* [PATCH v12 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
@ 2019-09-20  7:43 Tao Xu
  2019-09-20  7:43 ` [PATCH v12 01/11] util/cutils: Add qemu_strtotime_ps() Tao Xu
                   ` (12 more replies)
  0 siblings, 13 replies; 34+ messages in thread
From: Tao Xu @ 2019-09-20  7:43 UTC
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron,
	dan.j.williams

This patch series builds the ACPI Heterogeneous Memory Attribute Table (HMAT)
from command-line options. The ACPI HMAT describes memory attributes, such as
memory side cache attributes and bandwidth and latency details, related to
the Memory Proximity Domain. Software is expected to use HMAT information as
a hint for optimization.

In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses and reports
the platform's HMAT tables.

Link to the v11 patches:
https://patchwork.kernel.org/cover/11142287/

Changelog:
v12:
    - Fix a bug where a memory-only node without an initiator setting
      did not report an error. (reported by Danmei Wei)
    - Fix a bug where QEMU crashed if HMAT was enabled without any
      hmat-lb setting. (reported by Danmei Wei)
v11:
    - Move numa option patches forward.
    - Add num_initiator in Numa_state to record the number of
      initiators.
    - Simplify struct HMAT_LB_Info, use uint64_t array to store data.
    - Drop hmat_get_base().
    - Calculate base in build_hmat_lb().
v10:
    - Add qemu_strtotime_ps() to convert strings with time suffixes
      to numbers, and add some tests for it.
    - Add QAPI builtin type time, and add some tests for it.
    - Add machine option property "-machine hmat=on|off" for
      enabling or disabling HMAT in QEMU.
v9:
    - Change the CLI input format to make it more user-friendly
      (Daniel Black): use latency=NUM[p|n|u]s and
      bandwidth=NUM[M|G|P](B/s) as input and drop the base-lat and
      base-bw input.

Liu Jingqi (5):
  numa: Extend CLI to provide memory latency and bandwidth information
  numa: Extend CLI to provide memory side cache information
  hmat acpi: Build Memory Proximity Domain Attributes Structure(s)
  hmat acpi: Build System Locality Latency and Bandwidth Information
    Structure(s)
  hmat acpi: Build Memory Side Cache Information Structure(s)

Tao Xu (6):
  util/cutils: Add qemu_strtotime_ps()
  tests/cutils: Add test for qemu_strtotime_ps()
  qapi: Add builtin type time
  tests: Add test for QAPI builtin type time
  numa: Extend CLI to provide initiator information for numa nodes
  tests/bios-tables-test: add test cases for ACPI HMAT

 hw/acpi/Kconfig                    |   5 +
 hw/acpi/Makefile.objs              |   1 +
 hw/acpi/hmat.c                     | 287 +++++++++++++++++++++++++++++
 hw/acpi/hmat.h                     |  47 +++++
 hw/core/machine.c                  |  71 +++++++
 hw/core/numa.c                     | 212 +++++++++++++++++++++
 hw/i386/acpi-build.c               |   5 +
 include/qapi/visitor-impl.h        |   4 +
 include/qapi/visitor.h             |   9 +
 include/qemu/cutils.h              |   1 +
 include/sysemu/numa.h              |  81 ++++++++
 qapi/machine.json                  | 182 +++++++++++++++++-
 qapi/opts-visitor.c                |  22 +++
 qapi/qapi-visit-core.c             |  12 ++
 qapi/qobject-input-visitor.c       |  18 ++
 qapi/trace-events                  |   1 +
 qemu-options.hx                    |  96 +++++++++-
 scripts/qapi/common.py             |   2 +
 tests/bios-tables-test.c           |  44 +++++
 tests/test-cutils.c                | 199 ++++++++++++++++++++
 tests/test-keyval.c                | 125 +++++++++++++
 tests/test-qobject-input-visitor.c |  29 +++
 util/cutils.c                      |  82 +++++++++
 23 files changed, 1526 insertions(+), 9 deletions(-)
 create mode 100644 hw/acpi/hmat.c
 create mode 100644 hw/acpi/hmat.h

-- 
2.20.1




* [PATCH v12 01/11] util/cutils: Add qemu_strtotime_ps()
  2019-09-20  7:43 [PATCH v12 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
@ 2019-09-20  7:43 ` Tao Xu
  2019-09-20  7:43 ` [PATCH v12 02/11] tests/cutils: Add test for qemu_strtotime_ps() Tao Xu
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Tao Xu @ 2019-09-20  7:43 UTC
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron,
	dan.j.williams

Add qemu_strtotime_ps() to convert strings with time suffixes to numbers.
The supported time units are "ps" for picosecond, "ns" for nanosecond,
"us" for microsecond, "ms" for millisecond, and "s" for second.
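As a rough illustration of the conversion described above, here is a minimal
standalone sketch. It is not the code from this patch: unit_to_ps() and
parse_time_ps() are hypothetical names, plain strtod() stands in for
qemu_strtod_finite(), and a value without a suffix is assumed to be in
picoseconds.

```c
#include <errno.h>
#include <math.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: picoseconds per unit, or 0 for an unknown suffix. */
static uint64_t unit_to_ps(const char *unit)
{
    if (strcmp(unit, "ps") == 0) return 1ULL;
    if (strcmp(unit, "ns") == 0) return 1000ULL;
    if (strcmp(unit, "us") == 0) return 1000000ULL;
    if (strcmp(unit, "ms") == 0) return 1000000000ULL;
    if (strcmp(unit, "s") == 0)  return 1000000000000ULL;
    return 0;
}

/*
 * Hypothetical stand-in for the conversion: parse "<number><unit>" into
 * picoseconds. A missing unit is treated as "ps" (an assumption of this
 * sketch). Returns 0 on success, -EINVAL on malformed input, and -ERANGE
 * on negative or too-large values.
 */
static int parse_time_ps(const char *s, uint64_t *out)
{
    char *end;
    double val;
    uint64_t mul;

    errno = 0;
    val = strtod(s, &end);
    if (end == s || errno == ERANGE || !isfinite(val)) {
        return -EINVAL;
    }
    mul = (*end == '\0') ? 1 : unit_to_ps(end);
    if (mul == 0) {
        return -EINVAL;             /* unknown suffix or trailing text */
    }
    if (val < 0 || val * (double)mul >= 18446744073709551616.0) {
        return -ERANGE;             /* does not fit in uint64_t */
    }
    if (mul == 1 && (double)(uint64_t)val != val) {
        return -EINVAL;             /* no fractions of a picosecond */
    }
    *out = (uint64_t)(val * (double)mul);
    return 0;
}
```

With this sketch, "1ns" yields 1000 and "5us" yields 5000000, while "89ks"
is rejected with -EINVAL and "-1" with -ERANGE.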

Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v11 and v12.

New patch in v10.
---
 include/qemu/cutils.h |  1 +
 util/cutils.c         | 82 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
index 12301340a4..0e70a807e1 100644
--- a/include/qemu/cutils.h
+++ b/include/qemu/cutils.h
@@ -180,5 +180,6 @@ int uleb128_decode_small(const uint8_t *in, uint32_t *n);
  * *str1 is <, == or > than *str2.
  */
 int qemu_pstrcmp0(const char **str1, const char **str2);
+int qemu_strtotime_ps(const char *nptr, const char **end, uint64_t *result);
 
 #endif
diff --git a/util/cutils.c b/util/cutils.c
index fd591cadf0..a50c15f46a 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -847,3 +847,85 @@ int qemu_pstrcmp0(const char **str1, const char **str2)
 {
     return g_strcmp0(*str1, *str2);
 }
+
+static int64_t timeunit_mul(const char *unitstr)
+{
+    if (g_strcmp0(unitstr, "ps") == 0) {
+        return 1;
+    } else if (g_strcmp0(unitstr, "ns") == 0) {
+        return 1000;
+    } else if (g_strcmp0(unitstr, "us") == 0) {
+        return 1000000;
+    } else if (g_strcmp0(unitstr, "ms") == 0) {
+        return 1000000000LL;
+    } else if (g_strcmp0(unitstr, "s") == 0) {
+        return 1000000000000LL;
+    } else {
+        return -1;
+    }
+}
+
+
+/*
+ * Convert a string to a time in picoseconds. Supported time units are ps
+ * for picosecond, ns for nanosecond, us for microsecond, ms for millisecond
+ * and s for second. The end pointer is returned in *end, if not NULL.
+ * Return -ERANGE on overflow, and -EINVAL on other errors.
+ */
+static int do_strtotime(const char *nptr, const char **end,
+                      const char *default_unit, uint64_t *result)
+{
+    int retval;
+    const char *endptr;
+    int mul_required = 0;
+    int64_t mul;
+    double val, integral, fraction;
+
+    retval = qemu_strtod_finite(nptr, &endptr, &val);
+    if (retval) {
+        goto out;
+    }
+    fraction = modf(val, &integral);
+    if (fraction != 0) {
+        mul_required = 1;
+    }
+
+    mul = timeunit_mul(endptr);
+
+    if (mul == 1000000000000LL) {
+        endptr++;
+    } else if (mul != -1) {
+        endptr += 2;
+    } else {
+        mul = timeunit_mul(default_unit);
+        assert(mul >= 0);
+    }
+    if (mul == 1 && mul_required) {
+        retval = -EINVAL;
+        goto out;
+    }
+    /*
+     * Values >= 0xfffffffffffffc00 overflow uint64_t after their trip
+     * through double (53 bits of precision).
+     */
+    if ((val * (double)mul >= 0xfffffffffffffc00) || val < 0) {
+        retval = -ERANGE;
+        goto out;
+    }
+    *result = val * (double)mul;
+    retval = 0;
+
+out:
+    if (end) {
+        *end = endptr;
+    } else if (*endptr) {
+        retval = -EINVAL;
+    }
+
+    return retval;
+}
+
+int qemu_strtotime_ps(const char *nptr, const char **end, uint64_t *result)
+{
+    return do_strtotime(nptr, end, "ps", result);
+}
-- 
2.20.1




* [PATCH v12 02/11] tests/cutils: Add test for qemu_strtotime_ps()
  2019-09-20  7:43 [PATCH v12 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
  2019-09-20  7:43 ` [PATCH v12 01/11] util/cutils: Add qemu_strtotime_ps() Tao Xu
@ 2019-09-20  7:43 ` Tao Xu
  2019-09-20  7:43 ` [PATCH v12 03/11] qapi: Add builtin type time Tao Xu
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Tao Xu @ 2019-09-20  7:43 UTC
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron,
	dan.j.williams

Test basic input, time suffixes, fractional values, invalid input, trailing
characters, and overflow.
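The precision-limit cases in these tests follow from parsing through a
double, which carries 53 bits of precision. The sketch below shows the
rounding in isolation; via_double() is a hypothetical helper that uses
plain strtod(), not a function from this series.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a non-negative decimal string through a
 * double, the way strtod()-based parsers do, then convert to uint64_t.
 * The caller must pass a value that stays below 2^64 after rounding.
 */
static uint64_t via_double(const char *s)
{
    return (uint64_t)strtod(s, NULL);
}
```

Under this sketch, "9007199254740993" (2^53+1) comes back as 2^53, and
"18446744073709550591" (0xfffffffffffffbff) comes back as
0xfffffffffffff800, matching the values asserted in the tests.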

Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v11 and v12.

New patch in v10.
---
 tests/test-cutils.c | 199 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 199 insertions(+)

diff --git a/tests/test-cutils.c b/tests/test-cutils.c
index 1aa8351520..19c967d3d5 100644
--- a/tests/test-cutils.c
+++ b/tests/test-cutils.c
@@ -2179,6 +2179,193 @@ static void test_qemu_strtosz_metric(void)
     g_assert(endptr == str + 6);
 }
 
+static void test_qemu_strtotime_ps_simple(void)
+{
+    const char *str;
+    const char *endptr;
+    int err;
+    uint64_t res = 0xbaadf00d;
+
+    str = "0";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 0);
+    g_assert(endptr == str + 1);
+
+    str = "56789";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 56789);
+    g_assert(endptr == str + 5);
+
+    err = qemu_strtotime_ps(str, NULL, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 56789);
+
+    /* Note: precision is 53 bits since we're parsing with strtod() */
+
+    str = "9007199254740991"; /* 2^53-1 */
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 0x1fffffffffffff);
+    g_assert(endptr == str + 16);
+
+    str = "9007199254740992"; /* 2^53 */
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 0x20000000000000);
+    g_assert(endptr == str + 16);
+
+    str = "9007199254740993"; /* 2^53+1 */
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 0x20000000000000); /* rounded to 53 bits */
+    g_assert(endptr == str + 16);
+
+    str = "18446744073709549568"; /* 0xfffffffffffff800 (53 msbs set) */
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 0xfffffffffffff800);
+    g_assert(endptr == str + 20);
+
+    str = "18446744073709550591"; /* 0xfffffffffffffbff */
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 0xfffffffffffff800); /* rounded to 53 bits */
+    g_assert(endptr == str + 20);
+
+    /* 0xfffffffffffffc00..0xffffffffffffffff get rounded to
+     * 2^64, thus -ERANGE; see test_qemu_strtotime_ps_erange() */
+}
+
+static void test_qemu_strtotime_ps_units(void)
+{
+    const char *ps = "1ps";
+    const char *ns = "1ns";
+    const char *us = "1us";
+    const char *ms = "1ms";
+    const char *s = "1s";
+    int err;
+    const char *endptr;
+    uint64_t res = 0xbaadf00d;
+
+    err = qemu_strtotime_ps(ps, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 1);
+    g_assert(endptr == ps + 3);
+
+    err = qemu_strtotime_ps(ns, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 1000);
+    g_assert(endptr == ns + 3);
+
+    err = qemu_strtotime_ps(us, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 1000000);
+    g_assert(endptr == us + 3);
+
+    err = qemu_strtotime_ps(ms, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 1000000000LL);
+    g_assert(endptr == ms + 3);
+
+    err = qemu_strtotime_ps(s, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 1000000000000ULL);
+    g_assert(endptr == s + 2);
+}
+
+static void test_qemu_strtotime_ps_float(void)
+{
+    const char *str = "56.789ns";
+    int err;
+    const char *endptr;
+    uint64_t res = 0xbaadf00d;
+
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 56.789 * 1000);
+    g_assert(endptr == str + 8);
+}
+
+static void test_qemu_strtotime_ps_invalid(void)
+{
+    const char *str;
+    const char *endptr;
+    int err;
+    uint64_t res = 0xbaadf00d;
+
+    str = "";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+    g_assert(endptr == str);
+
+    str = " \t ";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+    g_assert(endptr == str);
+
+    str = "crap";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+    g_assert(endptr == str);
+
+    str = "inf";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+    g_assert(endptr == str);
+
+    str = "NaN";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+    g_assert(endptr == str);
+}
+
+static void test_qemu_strtotime_ps_trailing(void)
+{
+    const char *str;
+    int err;
+    uint64_t res = 0xbaadf00d;
+
+    str = "123xxx";
+
+    err = qemu_strtotime_ps(str, NULL, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+}
+
+static void test_qemu_strtotime_ps_erange(void)
+{
+    const char *str;
+    const char *endptr;
+    int err;
+    uint64_t res = 0xbaadf00d;
+
+    str = "-1";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -ERANGE);
+    g_assert(endptr == str + 2);
+
+    str = "18446744073709550592"; /* 0xfffffffffffffc00 */
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -ERANGE);
+    g_assert(endptr == str + 20);
+
+    str = "18446744073709551615"; /* 2^64-1 */
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -ERANGE);
+    g_assert(endptr == str + 20);
+
+    str = "18446744073709551616"; /* 2^64 */
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -ERANGE);
+    g_assert(endptr == str + 20);
+
+    str = "200000000000000s";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -ERANGE);
+    g_assert(endptr == str + 16);
+}
+
 int main(int argc, char **argv)
 {
     g_test_init(&argc, &argv, NULL);
@@ -2456,5 +2643,17 @@ int main(int argc, char **argv)
     g_test_add_func("/cutils/strtosz/metric",
                     test_qemu_strtosz_metric);
 
+    g_test_add_func("/cutils/strtotime/simple",
+                    test_qemu_strtotime_ps_simple);
+    g_test_add_func("/cutils/strtotime/units",
+                    test_qemu_strtotime_ps_units);
+    g_test_add_func("/cutils/strtotime/float",
+                    test_qemu_strtotime_ps_float);
+    g_test_add_func("/cutils/strtotime/invalid",
+                    test_qemu_strtotime_ps_invalid);
+    g_test_add_func("/cutils/strtotime/trailing",
+                    test_qemu_strtotime_ps_trailing);
+    g_test_add_func("/cutils/strtotime/erange",
+                    test_qemu_strtotime_ps_erange);
     return g_test_run();
 }
-- 
2.20.1




* [PATCH v12 03/11] qapi: Add builtin type time
  2019-09-20  7:43 [PATCH v12 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
  2019-09-20  7:43 ` [PATCH v12 01/11] util/cutils: Add qemu_strtotime_ps() Tao Xu
  2019-09-20  7:43 ` [PATCH v12 02/11] tests/cutils: Add test for qemu_strtotime_ps() Tao Xu
@ 2019-09-20  7:43 ` Tao Xu
  2019-10-15  6:22   ` Tao Xu
  2019-09-20  7:43 ` [PATCH v12 04/11] tests: Add test for QAPI " Tao Xu
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Tao Xu @ 2019-09-20  7:43 UTC
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron,
	dan.j.williams

Add an optional builtin type time, whose fallback is uint64. This type uses
qemu_strtotime_ps() to convert values with time suffixes to numbers.
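The "optional with fallback" behaviour can be pictured with a stripped-down
dispatcher. This is only a sketch: MiniVisitor and its hooks are hypothetical
stand-ins and omit the name and Error ** parameters of the real visitor
callbacks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down visitor: one required hook, one optional. */
typedef struct MiniVisitor MiniVisitor;
struct MiniVisitor {
    void (*type_uint64)(MiniVisitor *v, uint64_t *obj);  /* must be set */
    void (*type_time)(MiniVisitor *v, uint64_t *obj);    /* optional */
};

/* Use the time-aware hook when present, else fall back to uint64. */
static void mini_visit_type_time(MiniVisitor *v, uint64_t *obj)
{
    if (v->type_time) {
        v->type_time(v, obj);
    } else {
        v->type_uint64(v, obj);
    }
}

/* Toy hooks that only record which path was taken. */
static void plain_uint64(MiniVisitor *v, uint64_t *obj)
{
    (void)v;
    *obj = 1;
}

static void time_aware(MiniVisitor *v, uint64_t *obj)
{
    (void)v;
    *obj = 2;
}
```

A visitor that leaves type_time unset behaves as if the field were a plain
uint64, so only visitors that parse strings need to know about time suffixes.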

Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v11 and v12.

New patch in v10.
---
 include/qapi/visitor-impl.h  |  4 ++++
 include/qapi/visitor.h       |  9 +++++++++
 qapi/opts-visitor.c          | 22 ++++++++++++++++++++++
 qapi/qapi-visit-core.c       | 12 ++++++++++++
 qapi/qobject-input-visitor.c | 18 ++++++++++++++++++
 qapi/trace-events            |  1 +
 scripts/qapi/common.py       |  2 ++
 7 files changed, 68 insertions(+)

diff --git a/include/qapi/visitor-impl.h b/include/qapi/visitor-impl.h
index 8ccb3b6c20..e0979563c7 100644
--- a/include/qapi/visitor-impl.h
+++ b/include/qapi/visitor-impl.h
@@ -88,6 +88,10 @@ struct Visitor
     void (*type_size)(Visitor *v, const char *name, uint64_t *obj,
                       Error **errp);
 
+    /* Optional; fallback is type_uint64() */
+    void (*type_time)(Visitor *v, const char *name, uint64_t *obj,
+                      Error **errp);
+
     /* Must be set */
     void (*type_bool)(Visitor *v, const char *name, bool *obj, Error **errp);
 
diff --git a/include/qapi/visitor.h b/include/qapi/visitor.h
index 5b2ed3f202..4c3198b1c5 100644
--- a/include/qapi/visitor.h
+++ b/include/qapi/visitor.h
@@ -554,6 +554,15 @@ void visit_type_int64(Visitor *v, const char *name, int64_t *obj,
 void visit_type_size(Visitor *v, const char *name, uint64_t *obj,
                      Error **errp);
 
+/*
+ * Visit a uint64_t value.
+ * Like visit_type_uint64(), except that some visitors may choose to
+ * recognize numbers with timeunit suffix, such as "ps", "ns", "us"
+ * "ms" and "s".
+ */
+void visit_type_time(Visitor *v, const char *name, uint64_t *obj,
+                     Error **errp);
+
 /*
  * Visit a boolean value.
  *
diff --git a/qapi/opts-visitor.c b/qapi/opts-visitor.c
index 324b197495..d73b2e51a0 100644
--- a/qapi/opts-visitor.c
+++ b/qapi/opts-visitor.c
@@ -508,6 +508,27 @@ opts_type_size(Visitor *v, const char *name, uint64_t *obj, Error **errp)
     processed(ov, name);
 }
 
+static void
+opts_type_time(Visitor *v, const char *name, uint64_t *obj, Error **errp)
+{
+    OptsVisitor *ov = to_ov(v);
+    const QemuOpt *opt;
+    int err;
+
+    opt = lookup_scalar(ov, name, errp);
+    if (!opt) {
+        return;
+    }
+
+    err = qemu_strtotime_ps(opt->str ? opt->str : "", NULL, obj);
+    if (err < 0) {
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE, opt->name,
+                   "a time value");
+        return;
+    }
+
+    processed(ov, name);
+}
 
 static void
 opts_optional(Visitor *v, const char *name, bool *present)
@@ -555,6 +576,7 @@ opts_visitor_new(const QemuOpts *opts)
     ov->visitor.type_int64  = &opts_type_int64;
     ov->visitor.type_uint64 = &opts_type_uint64;
     ov->visitor.type_size   = &opts_type_size;
+    ov->visitor.type_time   = &opts_type_time;
     ov->visitor.type_bool   = &opts_type_bool;
     ov->visitor.type_str    = &opts_type_str;
 
diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c
index 5365561b07..ac8896455c 100644
--- a/qapi/qapi-visit-core.c
+++ b/qapi/qapi-visit-core.c
@@ -277,6 +277,18 @@ void visit_type_size(Visitor *v, const char *name, uint64_t *obj,
     }
 }
 
+void visit_type_time(Visitor *v, const char *name, uint64_t *obj,
+                     Error **errp)
+{
+    assert(obj);
+    trace_visit_type_time(v, name, obj);
+    if (v->type_time) {
+        v->type_time(v, name, obj, errp);
+    } else {
+        v->type_uint64(v, name, obj, errp);
+    }
+}
+
 void visit_type_bool(Visitor *v, const char *name, bool *obj, Error **errp)
 {
     assert(obj);
diff --git a/qapi/qobject-input-visitor.c b/qapi/qobject-input-visitor.c
index 32236cbcb1..9b66941d8a 100644
--- a/qapi/qobject-input-visitor.c
+++ b/qapi/qobject-input-visitor.c
@@ -627,6 +627,23 @@ static void qobject_input_type_size_keyval(Visitor *v, const char *name,
     }
 }
 
+static void qobject_input_type_time_keyval(Visitor *v, const char *name,
+                                           uint64_t *obj, Error **errp)
+{
+    QObjectInputVisitor *qiv = to_qiv(v);
+    const char *str = qobject_input_get_keyval(qiv, name, errp);
+
+    if (!str) {
+        return;
+    }
+
+    if (qemu_strtotime_ps(str, NULL, obj) < 0) {
+        /* TODO report -ERANGE more nicely */
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+                   full_name(qiv, name), "time");
+    }
+}
+
 static void qobject_input_optional(Visitor *v, const char *name, bool *present)
 {
     QObjectInputVisitor *qiv = to_qiv(v);
@@ -708,6 +725,7 @@ Visitor *qobject_input_visitor_new_keyval(QObject *obj)
     v->visitor.type_any = qobject_input_type_any;
     v->visitor.type_null = qobject_input_type_null;
     v->visitor.type_size = qobject_input_type_size_keyval;
+    v->visitor.type_time = qobject_input_type_time_keyval;
     v->keyval = true;
 
     return &v->visitor;
diff --git a/qapi/trace-events b/qapi/trace-events
index 5eb4afa110..c4605a7ccc 100644
--- a/qapi/trace-events
+++ b/qapi/trace-events
@@ -29,6 +29,7 @@ visit_type_int16(void *v, const char *name, int16_t *obj) "v=%p name=%s obj=%p"
 visit_type_int32(void *v, const char *name, int32_t *obj) "v=%p name=%s obj=%p"
 visit_type_int64(void *v, const char *name, int64_t *obj) "v=%p name=%s obj=%p"
 visit_type_size(void *v, const char *name, uint64_t *obj) "v=%p name=%s obj=%p"
+visit_type_time(void *v, const char *name, uint64_t *obj) "v=%p name=%s obj=%p"
 visit_type_bool(void *v, const char *name, bool *obj) "v=%p name=%s obj=%p"
 visit_type_str(void *v, const char *name, char **obj) "v=%p name=%s obj=%p"
 visit_type_number(void *v, const char *name, void *obj) "v=%p name=%s obj=%p"
diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
index d61bfdc526..3a6f108794 100644
--- a/scripts/qapi/common.py
+++ b/scripts/qapi/common.py
@@ -35,6 +35,7 @@ builtin_types = {
     'uint32':   'QTYPE_QNUM',
     'uint64':   'QTYPE_QNUM',
     'size':     'QTYPE_QNUM',
+    'time':     'QTYPE_QNUM',
     'any':      None,           # any QType possible, actually
     'QType':    'QTYPE_QSTRING',
 }
@@ -1834,6 +1835,7 @@ class QAPISchema(object):
                   ('uint32', 'int',     'uint32_t'),
                   ('uint64', 'int',     'uint64_t'),
                   ('size',   'int',     'uint64_t'),
+                  ('time',   'int',     'uint64_t'),
                   ('bool',   'boolean', 'bool'),
                   ('any',    'value',   'QObject' + pointer_suffix),
                   ('null',   'null',    'QNull' + pointer_suffix)]:
-- 
2.20.1




* [PATCH v12 04/11] tests: Add test for QAPI builtin type time
  2019-09-20  7:43 [PATCH v12 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (2 preceding siblings ...)
  2019-09-20  7:43 ` [PATCH v12 03/11] qapi: Add builtin type time Tao Xu
@ 2019-09-20  7:43 ` Tao Xu
  2019-09-20  7:43 ` [PATCH v12 05/11] numa: Extend CLI to provide initiator information for numa nodes Tao Xu
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Tao Xu @ 2019-09-20  7:43 UTC
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron,
	dan.j.williams

Add tests for time input, covering zero, values around the limit of
precision, the signed upper limit, the actual upper limit, values beyond
the limits, and time suffixes.

Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v11 and v12.

New patch in v10.
---
 tests/test-keyval.c                | 125 +++++++++++++++++++++++++++++
 tests/test-qobject-input-visitor.c |  29 +++++++
 2 files changed, 154 insertions(+)

diff --git a/tests/test-keyval.c b/tests/test-keyval.c
index 09b0ae3c68..b36914f0fc 100644
--- a/tests/test-keyval.c
+++ b/tests/test-keyval.c
@@ -490,6 +490,130 @@ static void test_keyval_visit_size(void)
     visit_free(v);
 }
 
+static void test_keyval_visit_time(void)
+{
+    Error *err = NULL;
+    Visitor *v;
+    QDict *qdict;
+    uint64_t time;
+
+    /* Lower limit zero */
+    qdict = keyval_parse("time1=0", NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &error_abort);
+    g_assert_cmpuint(time, ==, 0);
+    visit_check_struct(v, &error_abort);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Around limit of precision: 2^53-1, 2^53, 2^53+1 */
+    qdict = keyval_parse("time1=9007199254740991,"
+                         "time2=9007199254740992,"
+                         "time3=9007199254740993",
+                         NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &error_abort);
+    g_assert_cmphex(time, ==, 0x1fffffffffffff);
+    visit_type_time(v, "time2", &time, &error_abort);
+    g_assert_cmphex(time, ==, 0x20000000000000);
+    visit_type_time(v, "time3", &time, &error_abort);
+    g_assert_cmphex(time, ==, 0x20000000000000);
+    visit_check_struct(v, &error_abort);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Close to signed upper limit 0x7ffffffffffffc00 (53 msbs set) */
+    qdict = keyval_parse("time1=9223372036854774784," /* 7ffffffffffffc00 */
+                         "time2=9223372036854775295", /* 7ffffffffffffdff */
+                         NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &error_abort);
+    g_assert_cmphex(time, ==, 0x7ffffffffffffc00);
+    visit_type_time(v, "time2", &time, &error_abort);
+    g_assert_cmphex(time, ==, 0x7ffffffffffffc00);
+    visit_check_struct(v, &error_abort);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Close to actual upper limit 0xfffffffffffff800 (53 msbs set) */
+    qdict = keyval_parse("time1=18446744073709549568," /* fffffffffffff800 */
+                         "time2=18446744073709550591", /* fffffffffffffbff */
+                         NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &error_abort);
+    g_assert_cmphex(time, ==, 0xfffffffffffff800);
+    visit_type_time(v, "time2", &time, &error_abort);
+    g_assert_cmphex(time, ==, 0xfffffffffffff800);
+    visit_check_struct(v, &error_abort);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Beyond limits */
+    qdict = keyval_parse("time1=-1,"
+                         "time2=18446744073709550592", /* fffffffffffffc00 */
+                         NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &err);
+    error_free_or_abort(&err);
+    visit_type_time(v, "time2", &time, &err);
+    error_free_or_abort(&err);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Suffixes */
+    qdict = keyval_parse("time1=2ps,time2=3.4ns,time3=5us,"
+                         "time4=0.6ms,time5=700s",
+                         NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &error_abort);
+    g_assert_cmpuint(time, ==, 2);
+    visit_type_time(v, "time2", &time, &error_abort);
+    g_assert_cmpuint(time, ==, 3400);
+    visit_type_time(v, "time3", &time, &error_abort);
+    g_assert_cmphex(time, ==, 5 * 1000 * 1000);
+    visit_type_time(v, "time4", &time, &error_abort);
+    g_assert_cmphex(time, ==, 600 * 1000 * 1000);
+    visit_type_time(v, "time5", &time, &error_abort);
+    g_assert_cmphex(time, ==, 700 * 1000000000000ULL);
+    visit_check_struct(v, &error_abort);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Beyond limit with suffix */
+    qdict = keyval_parse("time1=18446745s", NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &err);
+    error_free_or_abort(&err);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Trailing crap */
+    qdict = keyval_parse("time1=89ks,time2=ns", NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &err);
+    error_free_or_abort(&err);
+    visit_type_time(v, "time2", &time, &err);
+    error_free_or_abort(&err);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+}
+
 static void test_keyval_visit_dict(void)
 {
     Error *err = NULL;
@@ -678,6 +802,7 @@ int main(int argc, char *argv[])
     g_test_add_func("/keyval/visit/bool", test_keyval_visit_bool);
     g_test_add_func("/keyval/visit/number", test_keyval_visit_number);
     g_test_add_func("/keyval/visit/size", test_keyval_visit_size);
+    g_test_add_func("/keyval/visit/time", test_keyval_visit_time);
     g_test_add_func("/keyval/visit/dict", test_keyval_visit_dict);
     g_test_add_func("/keyval/visit/list", test_keyval_visit_list);
     g_test_add_func("/keyval/visit/optional", test_keyval_visit_optional);
diff --git a/tests/test-qobject-input-visitor.c b/tests/test-qobject-input-visitor.c
index 6bacabf063..4b5820b744 100644
--- a/tests/test-qobject-input-visitor.c
+++ b/tests/test-qobject-input-visitor.c
@@ -366,6 +366,31 @@ static void test_visitor_in_size_str_fail(TestInputVisitorData *data,
     error_free_or_abort(&err);
 }
 
+static void test_visitor_in_time_str_keyval(TestInputVisitorData *data,
+                                            const void *unused)
+{
+    uint64_t res, value = 265 * 1000 * 1000;
+    Visitor *v;
+
+    v = visitor_input_test_init_full(data, true, "\"265us\"");
+
+    visit_type_time(v, NULL, &res, &error_abort);
+    g_assert_cmpuint(res, ==, value);
+}
+
+static void test_visitor_in_time_str_fail(TestInputVisitorData *data,
+                                          const void *unused)
+{
+    uint64_t res = 0;
+    Visitor *v;
+    Error *err = NULL;
+
+    v = visitor_input_test_init(data, "\"265us\"");
+
+    visit_type_time(v, NULL, &res, &err);
+    error_free_or_abort(&err);
+}
+
 static void test_visitor_in_string(TestInputVisitorData *data,
                                    const void *unused)
 {
@@ -1311,6 +1336,10 @@ int main(int argc, char **argv)
                            NULL, test_visitor_in_size_str_keyval);
     input_visitor_test_add("/visitor/input/size_str_fail",
                            NULL, test_visitor_in_size_str_fail);
+    input_visitor_test_add("/visitor/input/time_str_keyval",
+                           NULL, test_visitor_in_time_str_keyval);
+    input_visitor_test_add("/visitor/input/time_str_fail",
+                           NULL, test_visitor_in_time_str_fail);
     input_visitor_test_add("/visitor/input/string",
                            NULL, test_visitor_in_string);
     input_visitor_test_add("/visitor/input/enum",
-- 
2.20.1




* [PATCH v12 05/11] numa: Extend CLI to provide initiator information for numa nodes
From: Tao Xu @ 2019-09-20  7:43 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron,
	dan.j.williams

In ACPI 6.3 chapter 5.2.27, Heterogeneous Memory Attribute Table (HMAT),
an initiator is a processor that accesses memory. In 5.2.27.3, Memory
Proximity Domain Attributes Structure, the attached initiator is defined
as the proximity domain that contains the memory controller responsible
for a memory proximity domain. With the attached initiator information,
the topology of heterogeneous memory can be described.

Extend the "-numa node" CLI option with an "initiator" property that gives
the NUMA node ID of the attached initiator. In the Linux kernel, the code
in drivers/acpi/hmat/hmat.c parses and reports the platform's HMAT tables.

Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v12:
    - Fix the bug that a memory-only node without initiator setting
      doesn't report error. (reported by Danmei Wei)

No changes in v11.

Changes in v10:
    - Add the machine option property "-machine hmat=on|off" for enabling
      or disabling HMAT in QEMU.
    - Add more description for the initiator option.
    - Report an error when HMAT is enabled and the initiator option is
      missing. Invalid initiators are no longer allowed. (Igor)
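
For reviewers, the initiator rule that this patch enforces when HMAT is
enabled can be sketched outside QEMU roughly as below. This is only an
illustration, not code from the patch: SketchNode, SKETCH_MAX_NODES,
InitCheck and sketch_check_initiator() are hypothetical names, and the
extra bounds check against num_nodes is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for MAX_NODES; also used as the "not set" marker. */
#define SKETCH_MAX_NODES 128

/* Hypothetical, trimmed-down stand-in for NodeInfo. */
typedef struct {
    bool present;       /* node was declared with -numa node */
    bool has_cpu;       /* at least one CPU is assigned to the node */
    uint16_t initiator; /* SKETCH_MAX_NODES means "no initiator given" */
} SketchNode;

typedef enum {
    INIT_OK = 0,
    INIT_MISSING,     /* no initiator was given for this node */
    INIT_NODE_ABSENT, /* the initiator refers to an undeclared node */
    INIT_NO_CPU,      /* the initiator node has no CPU */
} InitCheck;

/* Check the initiator of node i against the rules described above. */
static InitCheck sketch_check_initiator(const SketchNode *nodes,
                                        int num_nodes, int i)
{
    assert(i >= 0 && i < num_nodes && num_nodes <= SKETCH_MAX_NODES);

    uint16_t init = nodes[i].initiator;

    if (init == SKETCH_MAX_NODES) {
        return INIT_MISSING;
    }
    /* Sketch-only guard: never index past the array we were given. */
    if (init >= num_nodes || !nodes[init].present) {
        return INIT_NODE_ABSENT;
    }
    if (!nodes[init].has_cpu) {
        return INIT_NO_CPU;
    }
    return INIT_OK;
}
```

With node 0 holding the CPUs and node 1 being memory-only, node 1 passes
only when its initiator is 0. Leaving it unset or pointing it at itself
is rejected.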
---
 hw/core/machine.c     | 71 +++++++++++++++++++++++++++++++++++++++++++
 hw/core/numa.c        | 24 +++++++++++++++
 include/sysemu/numa.h |  6 ++++
 qapi/machine.json     | 10 +++++-
 qemu-options.hx       | 35 ++++++++++++++++++---
 5 files changed, 140 insertions(+), 6 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 1689ad3bf8..087baaf571 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -518,6 +518,20 @@ static void machine_set_nvdimm(Object *obj, bool value, Error **errp)
     ms->nvdimms_state->is_enabled = value;
 }
 
+static bool machine_get_hmat(Object *obj, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    return ms->numa_state->hmat_enabled;
+}
+
+static void machine_set_hmat(Object *obj, bool value, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    ms->numa_state->hmat_enabled = value;
+}
+
 static char *machine_get_nvdimm_persistence(Object *obj, Error **errp)
 {
     MachineState *ms = MACHINE(obj);
@@ -645,6 +659,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
                                const CpuInstanceProperties *props, Error **errp)
 {
     MachineClass *mc = MACHINE_GET_CLASS(machine);
+    NodeInfo *numa_info = machine->numa_state->nodes;
     bool match = false;
     int i;
 
@@ -714,6 +729,16 @@ void machine_set_cpu_numa_node(MachineState *machine,
         match = true;
         slot->props.node_id = props->node_id;
         slot->props.has_node_id = props->has_node_id;
+
+        if (numa_info[props->node_id].initiator_valid &&
+            (props->node_id != numa_info[props->node_id].initiator)) {
+            error_setg(errp, "The initiator of CPU NUMA node %" PRId64
+                       " should be itself.", props->node_id);
+            return;
+        }
+        numa_info[props->node_id].initiator_valid = true;
+        numa_info[props->node_id].has_cpu = true;
+        numa_info[props->node_id].initiator = props->node_id;
     }
 
     if (!match) {
@@ -960,6 +985,13 @@ static void machine_initfn(Object *obj)
 
     if (mc->numa_mem_supported) {
         ms->numa_state = g_new0(NumaState, 1);
+        object_property_add_bool(obj, "hmat",
+                                 machine_get_hmat, machine_set_hmat,
+                                 &error_abort);
+        object_property_set_description(obj, "hmat",
+                                        "Set on/off to enable/disable "
+                                        "ACPI Heterogeneous Memory Attribute "
+                                        "Table (HMAT)", NULL);
     }
 
     /* Register notifier when init is done for sysbus sanity checks */
@@ -1048,6 +1080,40 @@ static char *cpu_slot_to_string(const CPUArchId *cpu)
     return g_string_free(s, false);
 }
 
+static void numa_validate_initiator(NumaState *nstat)
+{
+    int i;
+    NodeInfo *numa_info = nstat->nodes;
+
+    for (i = 0; i < nstat->num_nodes; i++) {
+        if (numa_info[i].initiator == MAX_NODES) {
+            error_report("The initiator of NUMA node %d is missing, use "
+                         "'-numa node,initiator' option to declare it.", i);
+            goto err;
+        }
+
+        if (!numa_info[numa_info[i].initiator].present) {
+            error_report("NUMA node %" PRIu16 " is missing, use "
+                         "'-numa node' option to declare it first.",
+                         numa_info[i].initiator);
+            goto err;
+        }
+
+        if (numa_info[numa_info[i].initiator].has_cpu) {
+            numa_info[i].initiator_valid = true;
+        } else {
+            error_report("The initiator of NUMA node %d is invalid.", i);
+            goto err;
+        }
+    }
+
+    return;
+
+err:
+    error_printf("\n");
+    exit(1);
+}
+
 static void machine_numa_finish_cpu_init(MachineState *machine)
 {
     int i;
@@ -1088,6 +1154,11 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
             machine_set_cpu_numa_node(machine, &props, &error_fatal);
         }
     }
+
+    if (machine->numa_state->hmat_enabled) {
+        numa_validate_initiator(machine->numa_state);
+    }
+
     if (s->len && !qtest_enabled()) {
         warn_report("CPU(s) not present in any NUMA nodes: %s",
                     s->str);
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 4dfec5c95b..eff5491f6f 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -133,6 +133,30 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
         numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
         numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
     }
+
+    if (node->has_initiator) {
+        if (!ms->numa_state->hmat_enabled) {
+            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
+                       "(HMAT) is disabled, enable it with -machine hmat=on "
+                       "before using the initiator option");
+            return;
+        }
+
+        if (node->initiator >= MAX_NODES) {
+            error_setg(errp, "The initiator id %" PRIu16 " expects an integer "
+                       "between 0 and %d", node->initiator,
+                       MAX_NODES - 1);
+            return;
+        }
+
+        numa_info[nodenr].initiator = node->initiator;
+    } else if (ms->numa_state->hmat_enabled) {
+        /*
+         * No initiator was given: mark it as unset with MAX_NODES. If HMAT
+         * is enabled and this node has no CPUs, QEMU reports an error later.
+         */
+        numa_info[nodenr].initiator = MAX_NODES;
+    }
     numa_info[nodenr].present = true;
     max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
     ms->numa_state->num_nodes++;
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index ae9c41d02b..a788c3b126 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -18,6 +18,9 @@ struct NodeInfo {
     uint64_t node_mem;
     struct HostMemoryBackend *node_memdev;
     bool present;
+    bool has_cpu;
+    bool initiator_valid;
+    uint16_t initiator;
     uint8_t distance[MAX_NODES];
 };
 
@@ -33,6 +36,9 @@ struct NumaState {
     /* Allow setting NUMA distance for different NUMA nodes */
     bool have_numa_distance;
 
+    /* Detect if HMAT support is enabled. */
+    bool hmat_enabled;
+
     /* NUMA nodes information */
     NodeInfo nodes[MAX_NODES];
 };
diff --git a/qapi/machine.json b/qapi/machine.json
index ca26779f1a..3c2914cd1c 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -463,6 +463,13 @@
 # @memdev: memory backend object.  If specified for one node,
 #          it must be specified for all nodes.
 #
+# @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145,
+#             indicates the nodeid which has the memory controller
+#             responsible for this NUMA node. This field provides
+#             additional information as to the initiator node that
+#             is closest (as in directly attached) to this node, and
+#             therefore has the best performance (since 4.2)
+#
 # Since: 2.1
 ##
 { 'struct': 'NumaNodeOptions',
@@ -470,7 +477,8 @@
    '*nodeid': 'uint16',
    '*cpus':   ['uint16'],
    '*mem':    'size',
-   '*memdev': 'str' }}
+   '*memdev': 'str',
+   '*initiator': 'uint16' }}
 
 ##
 # @NumaDistOptions:
diff --git a/qemu-options.hx b/qemu-options.hx
index bbfd936d29..74ccc4d782 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -43,7 +43,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "                suppress-vmdesc=on|off disables self-describing migration (default=off)\n"
     "                nvdimm=on|off controls NVDIMM support (default=off)\n"
     "                enforce-config-section=on|off enforce configuration section migration (default=off)\n"
-    "                memory-encryption=@var{} memory encryption object to use (default=none)\n",
+    "                memory-encryption=@var{} memory encryption object to use (default=none)\n"
+    "                hmat=on|off controls ACPI HMAT support (default=off)\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -machine [type=]@var{name}[,prop=@var{value}[,...]]
@@ -103,6 +104,9 @@ NOTE: this parameter is deprecated. Please use @option{-global}
 @option{migration.send-configuration}=@var{on|off} instead.
 @item memory-encryption=@var{}
 Memory encryption object to use. The default is none.
+@item hmat=on|off
+Enables or disables ACPI Heterogeneous Memory Attribute Table (HMAT) support.
+The default is off.
 @end table
 ETEXI
 
@@ -161,14 +165,14 @@ If any on the three values is given, the total number of CPUs @var{n} can be omi
 ETEXI
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
-    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
-    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
+    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
+    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa dist,src=source,dst=destination,val=distance\n"
     "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
     QEMU_ARCH_ALL)
 STEXI
-@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
-@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
+@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
+@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
 @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
 @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
 @findex -numa
@@ -215,6 +219,27 @@ split equally between them.
 @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore,
 if one node uses @samp{memdev}, all of them have to use it.
 
+@samp{initiator} is an additional option that names the @var{initiator}
+NUMA node with the best performance (the lowest latency or largest
+bandwidth) to this NUMA @var{node}. Note that this option can be set
+only when the machine property @samp{hmat} is on ("-machine hmat=on").
+
+The following example creates a machine with 2 NUMA nodes. Node 0 has
+CPUs; node 1 has only memory, and its initiator is node 0. Because node 0
+has CPUs, its initiator defaults to itself and cannot be set to any other
+node.
+@example
+-machine hmat=on \
+-m 2G,slots=2,maxmem=4G \
+-object memory-backend-ram,size=1G,id=m0 \
+-object memory-backend-ram,size=1G,id=m1 \
+-numa node,nodeid=0,memdev=m0 \
+-numa node,nodeid=1,memdev=m1,initiator=0 \
+-smp 2,sockets=2,maxcpus=2  \
+-numa cpu,node-id=0,socket-id=0 \
+-numa cpu,node-id=0,socket-id=1
+@end example
+
 @var{source} and @var{destination} are NUMA node IDs.
 @var{distance} is the NUMA distance from @var{source} to @var{destination}.
 The distance from a node to itself is always 10. If any pair of nodes is
-- 
2.20.1




* [PATCH v12 06/11] numa: Extend CLI to provide memory latency and bandwidth information
From: Tao Xu @ 2019-09-20  7:43 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron,
	dan.j.williams

From: Liu Jingqi <jingqi.liu@intel.com>

Add -numa hmat-lb option to provide System Locality Latency and
Bandwidth Information. These memory attributes help to build
System Locality Latency and Bandwidth Information Structure(s)
in ACPI Heterogeneous Memory Attribute Table (HMAT).

Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v12.

Changes in v11:
    - Move numa option patches forward.
    - Add num_initiator in Numa_state to record the number of
      initiators.
    - Simplify struct HMAT_LB_Info, use uint64_t array to store data.
    - Drop hmat_get_base().

Changes in v10:
    - use new builtin type 'time' as qapi input.
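
As a reading aid, the flat [initiator][target] indexing and the
byte-to-MiB conversion used for bandwidth can be sketched as below. This
is an illustration under stated assumptions, not the patch's code:
SketchLB and the sketch_lb_*() helpers are hypothetical names, and the
sketch sizes the array as nb_nodes * nb_nodes entries so that any valid
initiator index stays in bounds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for one bandwidth matrix; 0 means "not set". */
typedef struct {
    int nb_nodes;
    uint64_t *bandwidth_mib; /* nb_nodes * nb_nodes entries, row = initiator */
} SketchLB;

/* Allocate a zeroed nb_nodes x nb_nodes matrix. Returns 0 on success. */
static int sketch_lb_init(SketchLB *lb, int nb_nodes)
{
    assert(nb_nodes > 0);
    lb->nb_nodes = nb_nodes;
    lb->bandwidth_mib = calloc((size_t)nb_nodes * nb_nodes, sizeof(uint64_t));
    return lb->bandwidth_mib ? 0 : -1;
}

/*
 * Store a bandwidth given in bytes per second as MiB per second.
 * Returns 0 on success, -1 for an out-of-range node, -2 for a duplicate.
 * A value below 1 MiB/s rounds down to 0 and therefore reads as "not set".
 */
static int sketch_lb_set_bandwidth(SketchLB *lb, int init, int targ,
                                   uint64_t bytes_per_sec)
{
    if (init < 0 || init >= lb->nb_nodes ||
        targ < 0 || targ >= lb->nb_nodes) {
        return -1;
    }

    uint64_t *slot = &lb->bandwidth_mib[init * lb->nb_nodes + targ];

    if (*slot) {
        return -2;
    }
    *slot = bytes_per_sec / 1024 / 1024;
    return 0;
}

/* Release the matrix. */
static void sketch_lb_free(SketchLB *lb)
{
    free(lb->bandwidth_mib);
    lb->bandwidth_mib = NULL;
}
```

For two nodes, setting initiator=0,target=1 to 100 * 1024 * 1024 bytes/s
stores 100 at flat index 1. A second write to the same pair is reported
as a duplicate.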
---
 hw/core/numa.c        | 114 ++++++++++++++++++++++++++++++++++++++++++
 include/sysemu/numa.h |  44 ++++++++++++++++
 qapi/machine.json     |  95 ++++++++++++++++++++++++++++++++++-
 qemu-options.hx       |  49 +++++++++++++++++-
 4 files changed, 299 insertions(+), 3 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index eff5491f6f..f5a1c9e909 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -199,6 +199,100 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
     ms->numa_state->have_numa_distance = true;
 }
 
+void parse_numa_hmat_lb(NumaState *nstat, NumaHmatLBOptions *node,
+                        Error **errp)
+{
+    int i;
+    int init = node->initiator;
+    int targ = node->target;
+    int nb_nodes = nstat->num_nodes;
+    NodeInfo *numa_info = nstat->nodes;
+    HMAT_LB_Info *hmat_lb = nstat->hmat_lb[node->hierarchy][node->data_type];
+
+    /* Error checking */
+    if (init >= nb_nodes) {
+        error_setg(errp, "Invalid initiator=%d, it should be less than %d.",
+                   init, nb_nodes);
+        return;
+    }
+    if (targ >= nb_nodes) {
+        error_setg(errp, "Invalid target=%d, it should be less than %d.",
+                   targ, nb_nodes);
+        return;
+    }
+    if (!numa_info[init].has_cpu) {
+        error_setg(errp, "Invalid initiator=%d, it isn't an "
+                   "initiator proximity domain.", init);
+        return;
+    }
+    if (!numa_info[targ].present) {
+        error_setg(errp, "Invalid target=%d, it isn't a valid NUMA node.",
+                   targ);
+        return;
+    }
+
+    /* HMAT latency and bandwidth data initialization */
+    if (nstat->num_initiator == 0) {
+        for (i = 0; i < nstat->num_nodes; i++) {
+            if (numa_info[i].has_cpu) {
+                nstat->num_initiator++;
+            }
+        }
+    }
+
+    if (!hmat_lb) {
+        int size = nb_nodes * nb_nodes * sizeof(uint64_t);
+        hmat_lb = g_malloc0(sizeof(*hmat_lb));
+        nstat->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
+        hmat_lb->latency = g_malloc0(size);
+        hmat_lb->bandwidth = g_malloc0(size);
+    }
+    hmat_lb->hierarchy = node->hierarchy;
+    hmat_lb->data_type = node->data_type;
+
+    /* Input latency data */
+    if (node->data_type <= HMATLB_DATA_TYPE_WRITE_LATENCY) {
+        if (!node->has_latency) {
+            error_setg(errp, "Missing 'latency' option.");
+            return;
+        }
+        if (node->has_bandwidth) {
+            error_setg(errp, "Invalid option 'bandwidth' since "
+                       "the data type is latency.");
+            return;
+        }
+        if (hmat_lb->latency[init * nb_nodes + targ]) {
+            error_setg(errp, "Duplicate configuration of the latency for "
+                        "initiator=%d and target=%d.", init, targ);
+            return;
+        }
+
+        hmat_lb->latency[init * nb_nodes + targ] = node->latency;
+    }
+
+    /* Input bandwidth data */
+    if (node->data_type >= HMATLB_DATA_TYPE_ACCESS_BANDWIDTH) {
+        if (!node->has_bandwidth) {
+            error_setg(errp, "Missing 'bandwidth' option.");
+            return;
+        }
+        if (node->has_latency) {
+            error_setg(errp, "Invalid option 'latency' since "
+                       "the data type is bandwidth.");
+            return;
+        }
+        if (hmat_lb->bandwidth[init * nb_nodes + targ]) {
+            error_setg(errp, "Duplicate configuration of the bandwidth for "
+                        "initiator=%d and target=%d.", init, targ);
+            return;
+        }
+
+        /* Convert Byte to Megabyte */
+        hmat_lb->bandwidth[init * nb_nodes + targ] =
+            node->bandwidth / 1024 / 1024;
+    }
+}
+
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 {
     Error *err = NULL;
@@ -237,6 +331,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
         machine_set_cpu_numa_node(ms, qapi_NumaCpuOptions_base(&object->u.cpu),
                                   &err);
         break;
+    case NUMA_OPTIONS_TYPE_HMAT_LB:
+        if (!ms->numa_state->hmat_enabled) {
+            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
+                       "(HMAT) is disabled, enable it with -machine hmat=on "
+                       "before using -numa hmat-lb");
+            return;
+        }
+
+        parse_numa_hmat_lb(ms->numa_state, &object->u.hmat_lb, &err);
+        if (err) {
+            goto end;
+        }
+        break;
     default:
         abort();
     }
@@ -264,6 +371,13 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
         qemu_strtosz_MiB(mem_str, NULL, &object->u.node.mem);
     }
 
+    /* Set up suffix-less bandwidth as megabytes */
+    if ((object->type == NUMA_OPTIONS_TYPE_HMAT_LB) &&
+        object->u.hmat_lb.has_bandwidth) {
+        const char *bw_str = qemu_opt_get(opts, "bandwidth");
+        qemu_strtosz_MiB(bw_str, NULL, &object->u.hmat_lb.bandwidth);
+    }
+
     set_numa_options(ms, object, &err);
 
 end:
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index a788c3b126..876beaee22 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -14,6 +14,27 @@ struct CPUArchId;
 #define NUMA_DISTANCE_MAX         254
 #define NUMA_DISTANCE_UNREACHABLE 255
 
+/* the value of AcpiHmatLBInfo flags */
+enum {
+    HMAT_LB_MEM_MEMORY           = 0,
+    HMAT_LB_MEM_CACHE_1ST_LEVEL  = 1,
+    HMAT_LB_MEM_CACHE_2ND_LEVEL  = 2,
+    HMAT_LB_MEM_CACHE_3RD_LEVEL  = 3,
+};
+
+/* the value of AcpiHmatLBInfo data type */
+enum {
+    HMAT_LB_DATA_ACCESS_LATENCY   = 0,
+    HMAT_LB_DATA_READ_LATENCY     = 1,
+    HMAT_LB_DATA_WRITE_LATENCY    = 2,
+    HMAT_LB_DATA_ACCESS_BANDWIDTH = 3,
+    HMAT_LB_DATA_READ_BANDWIDTH   = 4,
+    HMAT_LB_DATA_WRITE_BANDWIDTH  = 5,
+};
+
+#define HMAT_LB_LEVELS    (HMAT_LB_MEM_CACHE_3RD_LEVEL + 1)
+#define HMAT_LB_TYPES     (HMAT_LB_DATA_WRITE_BANDWIDTH + 1)
+
 struct NodeInfo {
     uint64_t node_mem;
     struct HostMemoryBackend *node_memdev;
@@ -29,6 +50,21 @@ struct NumaNodeMem {
     uint64_t node_plugged_mem;
 };
 
+struct HMAT_LB_Info {
+    /* Whether this is memory or a given level of memory side cache. */
+    uint8_t     hierarchy;
+
+    /* The type of data: access/read/write latency or bandwidth. */
+    uint8_t     data_type;
+
+    /* Array to store the latencies */
+    uint64_t    *latency;
+
+    /* Array to store the bandwidths */
+    uint64_t    *bandwidth;
+};
+typedef struct HMAT_LB_Info HMAT_LB_Info;
+
 struct NumaState {
     /* Number of NUMA nodes */
     int num_nodes;
@@ -39,13 +75,21 @@ struct NumaState {
     /* Detect if HMAT support is enabled. */
     bool hmat_enabled;
 
+    /* Number of Proximity Domains that can initiate memory access requests. */
+    int num_initiator;
+
     /* NUMA nodes information */
     NodeInfo nodes[MAX_NODES];
+
+    /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
+    HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
 };
 typedef struct NumaState NumaState;
 
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
 void parse_numa_opts(MachineState *ms);
+void parse_numa_hmat_lb(NumaState *nstat, NumaHmatLBOptions *node,
+                        Error **errp);
 void numa_complete_configuration(MachineState *ms);
 void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
 extern QemuOptsList qemu_numa_opts;
diff --git a/qapi/machine.json b/qapi/machine.json
index 3c2914cd1c..b6019335e8 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -426,10 +426,12 @@
 #
 # @cpu: property based CPU(s) to node mapping (Since: 2.10)
 #
+# @hmat-lb: memory latency and bandwidth information (Since: 4.2)
+#
 # Since: 2.1
 ##
 { 'enum': 'NumaOptionsType',
-  'data': [ 'node', 'dist', 'cpu' ] }
+  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
 
 ##
 # @NumaOptions:
@@ -444,7 +446,8 @@
   'data': {
     'node': 'NumaNodeOptions',
     'dist': 'NumaDistOptions',
-    'cpu': 'NumaCpuOptions' }}
+    'cpu': 'NumaCpuOptions',
+    'hmat-lb': 'NumaHmatLBOptions' }}
 
 ##
 # @NumaNodeOptions:
@@ -557,6 +560,94 @@
    'base': 'CpuInstanceProperties',
    'data' : {} }
 
+##
+# @HmatLBMemoryHierarchy:
+#
+# The memory hierarchy in the System Locality Latency
+# and Bandwidth Information Structure of HMAT (Heterogeneous
+# Memory Attribute Table)
+#
+# For more information about @HmatLBMemoryHierarchy, see
+# chapter 5.2.27.4: Table 5-146: Field "Flags" of the ACPI 6.3 spec.
+#
+# @memory: the structure represents the memory performance
+#
+# @first-level: first level of memory side cache
+#
+# @second-level: second level of memory side cache
+#
+# @third-level: third level of memory side cache
+#
+# Since: 4.2
+##
+{ 'enum': 'HmatLBMemoryHierarchy',
+  'data': [ 'memory', 'first-level', 'second-level', 'third-level' ] }
+
+##
+# @HmatLBDataType:
+#
+# Data type in the System Locality Latency
+# and Bandwidth Information Structure of HMAT (Heterogeneous
+# Memory Attribute Table)
+#
+# For more information about @HmatLBDataType, see
+# chapter 5.2.27.4: Table 5-146: Field "Data Type" of the ACPI 6.3 spec.
+#
+# @access-latency: access latency (nanoseconds)
+#
+# @read-latency: read latency (nanoseconds)
+#
+# @write-latency: write latency (nanoseconds)
+#
+# @access-bandwidth: access bandwidth (MB/s)
+#
+# @read-bandwidth: read bandwidth (MB/s)
+#
+# @write-bandwidth: write bandwidth (MB/s)
+#
+# Since: 4.2
+##
+{ 'enum': 'HmatLBDataType',
+  'data': [ 'access-latency', 'read-latency', 'write-latency',
+            'access-bandwidth', 'read-bandwidth', 'write-bandwidth' ] }
+
+##
+# @NumaHmatLBOptions:
+#
+# Set the system locality latency and bandwidth information
+# between Initiator and Target proximity Domains.
+#
+# For more information about @NumaHmatLBOptions, see
+# chapter 5.2.27.4: Table 5-146 of the ACPI 6.3 spec.
+#
+# @initiator: the Initiator Proximity Domain.
+#
+# @target: the Target Proximity Domain.
+#
+# @hierarchy: the Memory Hierarchy. Indicates the performance
+#             of memory or side cache.
+#
+# @data-type: the type of data: access/read/write latency or
+#             access/read/write bandwidth.
+#
+# @latency: the latency from @initiator to @target proximity domain;
+#           the latency units are "ps" (picosecond), "ns" (nanosecond) or
+#           "us" (microsecond).
+#
+# @bandwidth: the bandwidth between @initiator and @target proximity
+#             domain; the bandwidth units are "MB(/s)", "GB(/s)" or "TB(/s)".
+#
+# Since: 4.2
+##
+{ 'struct': 'NumaHmatLBOptions',
+    'data': {
+    'initiator': 'uint16',
+    'target': 'uint16',
+    'hierarchy': 'HmatLBMemoryHierarchy',
+    'data-type': 'HmatLBDataType',
+    '*latency': 'time',
+    '*bandwidth': 'size' }}
+
 ##
 # @HostMemPolicy:
 #
diff --git a/qemu-options.hx b/qemu-options.hx
index 74ccc4d782..129da0cdc3 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -168,16 +168,19 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
     "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa dist,src=source,dst=destination,val=distance\n"
-    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
+    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
+    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency|access-bandwidth|read-bandwidth|write-bandwidth[,latency=lat][,bandwidth=bw]\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
 @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
 @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
 @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
+@itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,latency=@var{lat}][,bandwidth=@var{bw}]
 @findex -numa
 Define a NUMA node and assign RAM and VCPUs to it.
 Set the NUMA distance from a source node to a destination node.
+Set the ACPI Heterogeneous Memory Attributes for the given nodes.
 
 Legacy VCPU assignment uses @samp{cpus} option where
 @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
@@ -256,6 +259,50 @@ specified resources, it just assigns existing resources to NUMA
 nodes. This means that one still has to use the @option{-m},
 @option{-smp} options to allocate RAM and VCPUs respectively.
 
+Use @samp{hmat-lb} to set System Locality Latency and Bandwidth Information
+between initiator and target NUMA nodes in the ACPI Heterogeneous Memory
+Attribute Table (HMAT). An initiator NUMA node can issue memory requests and
+usually contains one or more processors. A target NUMA node contains addressable memory.
+
+In the @samp{hmat-lb} option, each @var{node} is a NUMA node ID. @var{str} of
+'hierarchy' is the memory hierarchy of the target NUMA node: if @var{str} is 'memory',
+the structure represents the memory performance; if @var{str} is one of
+'first-level|second-level|third-level', the structure represents the aggregated
+performance of the memory side caches for each domain. @var{str} of 'data-type' is
+the type of data represented by this structure instance: if 'hierarchy' is 'memory',
+'data-type' is the 'access|read|write' latency (nanoseconds) or bandwidth (MB/s) of the
+target memory; if 'hierarchy' is 'first-level|second-level|third-level', 'data-type' is
+the 'access|read|write' hit latency or hit bandwidth of the target memory side cache.
+
+@var{lat} of 'latency' is the latency value, written as NUM[ps|ns|us]
+(picosecond|nanosecond|microsecond); the recommended unit is 'ns'. @var{bw}
+is the bandwidth value, written as NUM[M|G|T], meaning
+NUM MB/s, GB/s or TB/s. Note that the maximum NUM is 65534;
+a NUM of 0 means the corresponding latency or bandwidth information is not provided.
+If a number is given without a unit, the latency is taken as 'ps' and the bandwidth
+as MB/s.
+
+For example, the following options define NUMA nodes 0 and 1. Node 0 has 2 CPUs and
+RAM; node 1 has only RAM. The processors in node 0 access memory in node
+0 with an access-latency of 5 nanoseconds and an access-bandwidth of 200 MB/s.
+The processors in NUMA node 0 access memory in NUMA node 1 with an access-latency of 10
+nanoseconds and an access-bandwidth of 100 MB/s.
+@example
+-machine hmat=on \
+-m 2G \
+-object memory-backend-ram,size=1G,id=m0 \
+-object memory-backend-ram,size=1G,id=m1 \
+-smp 2 \
+-numa node,nodeid=0,memdev=m0 \
+-numa node,nodeid=1,memdev=m1,initiator=0 \
+-numa cpu,node-id=0,socket-id=0 \
+-numa cpu,node-id=0,socket-id=1 \
+-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
+-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M
+@end example
+
 ETEXI
 
 DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,
-- 
2.20.1




* [PATCH v12 07/11] numa: Extend CLI to provide memory side cache information
From: Tao Xu @ 2019-09-20  7:43 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, Daniel Black,
	jonathan.cameron, dan.j.williams

From: Liu Jingqi <jingqi.liu@intel.com>

Add -numa hmat-cache option to provide Memory Side Cache Information.
These memory attributes help to build Memory Side Cache Information
Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT).

Reviewed-by: Daniel Black <daniel@linux.ibm.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v12.

Changes in v11:
    - Move numa option patches forward.
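
The per-node checks applied to each -numa hmat-cache entry (total level
bound, level <= total, duplicate detection, and the size comparison
against the previous level) can be sketched as below. This is an
illustration only: SketchCaches and sketch_cache_add() are hypothetical
names, and SKETCH_MAX_LEVEL = 3 is an assumed stand-in for
MAX_HMAT_CACHE_LEVEL.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for MAX_HMAT_CACHE_LEVEL. */
#define SKETCH_MAX_LEVEL 3

/* Hypothetical per-node record: cache size by level, 0 = not configured. */
typedef struct {
    uint64_t size[SKETCH_MAX_LEVEL + 1];
} SketchCaches;

/*
 * Try to register one cache level for a node.
 * Returns 0 on success, or a negative code naming the failed check:
 *   -1 total exceeds the maximum, -2 level exceeds total,
 *   -3 level already configured,
 *   -4 size is not smaller than the size of level - 1.
 */
static int sketch_cache_add(SketchCaches *c, uint8_t total, uint8_t level,
                            uint64_t size)
{
    assert(size > 0); /* the sketch uses 0 as the "not configured" marker */

    if (total > SKETCH_MAX_LEVEL) {
        return -1;
    }
    if (level > total) {
        return -2;
    }
    if (c->size[level]) {
        return -3;
    }
    if (level > 1 && c->size[level - 1] && size >= c->size[level - 1]) {
        return -4;
    }
    c->size[level] = size;
    return 0;
}
```

With total=2, adding level 1 at 1024 bytes succeeds, and a level 2 entry
is then accepted only if it is smaller than 1024 bytes.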
---
 hw/core/numa.c        | 74 +++++++++++++++++++++++++++++++++++++++
 include/sysemu/numa.h | 31 +++++++++++++++++
 qapi/machine.json     | 81 +++++++++++++++++++++++++++++++++++++++++--
 qemu-options.hx       | 16 +++++++--
 4 files changed, 198 insertions(+), 4 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index f5a1c9e909..182e4d9d62 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -293,6 +293,67 @@ void parse_numa_hmat_lb(NumaState *nstat, NumaHmatLBOptions *node,
     }
 }
 
+void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
+                           Error **errp)
+{
+    int nb_numa_nodes = ms->numa_state->num_nodes;
+    HMAT_Cache_Info *hmat_cache = NULL;
+
+    if (node->node_id >= nb_numa_nodes) {
+        error_setg(errp, "Invalid node-id=%" PRIu32
+                   ", it should be less than %d.",
+                   node->node_id, nb_numa_nodes);
+        return;
+    }
+
+    if (node->total > MAX_HMAT_CACHE_LEVEL) {
+        error_setg(errp, "Invalid total=%" PRIu8
+                   ", it should be less than or equal to %d.",
+                   node->total, MAX_HMAT_CACHE_LEVEL);
+        return;
+    }
+    if (node->level > node->total) {
+        error_setg(errp, "Invalid level=%" PRIu8
+                   ", it should be less than or equal to"
+                   " total=%" PRIu8 ".",
+                   node->level, node->total);
+        return;
+    }
+    if (ms->numa_state->hmat_cache[node->node_id][node->level]) {
+        error_setg(errp, "Duplicate configuration of the side cache for "
+                   "node-id=%" PRIu32 " and level=%" PRIu8 ".",
+                   node->node_id, node->level);
+        return;
+    }
+
+    if ((node->level > 1) &&
+        ms->numa_state->hmat_cache[node->node_id][node->level - 1] &&
+        (node->size >=
+            ms->numa_state->hmat_cache[node->node_id][node->level - 1]->size)) {
+        error_setg(errp, "Invalid size=0x%" PRIx64
+                   ", the size of level=%" PRIu8
+                   " should be less than the size(0x%" PRIx64
+                   ") of level=%" PRIu8 ".",
+                   node->size, node->level,
+                   ms->numa_state->hmat_cache[node->node_id]
+                                             [node->level - 1]->size,
+                   node->level - 1);
+        return;
+    }
+
+    hmat_cache = g_malloc0(sizeof(*hmat_cache));
+
+    hmat_cache->mem_proximity = node->node_id;
+    hmat_cache->size = node->size;
+    hmat_cache->total_levels = node->total;
+    hmat_cache->level = node->level;
+    hmat_cache->associativity = node->assoc;
+    hmat_cache->write_policy = node->policy;
+    hmat_cache->line_size = node->line;
+
+    ms->numa_state->hmat_cache[node->node_id][node->level] = hmat_cache;
+}
+
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 {
     Error *err = NULL;
@@ -344,6 +405,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
             goto end;
         }
         break;
+    case NUMA_OPTIONS_TYPE_HMAT_CACHE:
+        if (!ms->numa_state->hmat_enabled) {
+            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
+                       "(HMAT) is disabled, use -machine hmat=on before "
+                       "setting -numa hmat-cache");
+            return;
+        }
+
+        parse_numa_hmat_cache(ms, &object->u.hmat_cache, &err);
+        if (err) {
+            goto end;
+        }
+        break;
     default:
         abort();
     }
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 876beaee22..39312eefd4 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -35,6 +35,8 @@ enum {
 #define HMAT_LB_LEVELS    (HMAT_LB_MEM_CACHE_3RD_LEVEL + 1)
 #define HMAT_LB_TYPES     (HMAT_LB_DATA_WRITE_BANDWIDTH + 1)
 
+#define MAX_HMAT_CACHE_LEVEL        3
+
 struct NodeInfo {
     uint64_t node_mem;
     struct HostMemoryBackend *node_memdev;
@@ -65,6 +67,30 @@ struct HMAT_LB_Info {
 };
 typedef struct HMAT_LB_Info HMAT_LB_Info;
 
+struct HMAT_Cache_Info {
+    /* The memory proximity domain to which the memory belongs. */
+    uint32_t    mem_proximity;
+
+    /* Size of memory side cache in bytes. */
+    uint64_t    size;
+
+    /* Total cache levels for this memory proximity domain. */
+    uint8_t     total_levels;
+
+    /* Cache level described in this structure. */
+    uint8_t     level;
+
+    /* Cache Associativity: None/Direct Mapped/Complex Cache Indexing */
+    uint8_t     associativity;
+
+    /* Write Policy: None/Write Back(WB)/Write Through(WT) */
+    uint8_t     write_policy;
+
+    /* Cache Line size in bytes. */
+    uint16_t    line_size;
+};
+typedef struct HMAT_Cache_Info HMAT_Cache_Info;
+
 struct NumaState {
     /* Number of NUMA nodes */
     int num_nodes;
@@ -83,6 +109,9 @@ struct NumaState {
 
     /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
     HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
+
+    /* Memory Side Cache Information Structure */
+    HMAT_Cache_Info *hmat_cache[MAX_NODES][MAX_HMAT_CACHE_LEVEL + 1];
 };
 typedef struct NumaState NumaState;
 
@@ -90,6 +119,8 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
 void parse_numa_opts(MachineState *ms);
 void parse_numa_hmat_lb(NumaState *nstat, NumaHmatLBOptions *node,
                         Error **errp);
+void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
+                           Error **errp);
 void numa_complete_configuration(MachineState *ms);
 void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
 extern QemuOptsList qemu_numa_opts;
diff --git a/qapi/machine.json b/qapi/machine.json
index b6019335e8..088be81920 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -428,10 +428,12 @@
 #
 # @hmat-lb: memory latency and bandwidth information (Since: 4.2)
 #
+# @hmat-cache: memory side cache information (Since: 4.2)
+#
 # Since: 2.1
 ##
 { 'enum': 'NumaOptionsType',
-  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
+  'data': [ 'node', 'dist', 'cpu', 'hmat-lb', 'hmat-cache' ] }
 
 ##
 # @NumaOptions:
@@ -447,7 +449,8 @@
     'node': 'NumaNodeOptions',
     'dist': 'NumaDistOptions',
     'cpu': 'NumaCpuOptions',
-    'hmat-lb': 'NumaHmatLBOptions' }}
+    'hmat-lb': 'NumaHmatLBOptions',
+    'hmat-cache': 'NumaHmatCacheOptions' }}
 
 ##
 # @NumaNodeOptions:
@@ -648,6 +651,80 @@
     '*latency': 'time',
     '*bandwidth': 'size' }}
 
+##
+# @HmatCacheAssociativity:
+#
+# Cache associativity in the Memory Side Cache
+# Information Structure of HMAT
+#
+# For more information about @HmatCacheAssociativity, see
+# chapter 5.2.27.5: Table 5-147 of the ACPI 6.3 spec.
+#
+# @none: None
+#
+# @direct: Direct Mapped
+#
+# @complex: Complex Cache Indexing (implementation specific)
+#
+# Since: 4.2
+##
+{ 'enum': 'HmatCacheAssociativity',
+  'data': [ 'none', 'direct', 'complex' ] }
+
+##
+# @HmatCacheWritePolicy:
+#
+# Cache write policy in the Memory Side Cache
+# Information Structure of HMAT
+#
+# For more information about @HmatCacheWritePolicy, see
+# chapter 5.2.27.5: Table 5-147: Field "Cache Attributes" of the ACPI 6.3 spec.
+#
+# @none: None
+#
+# @write-back: Write Back (WB)
+#
+# @write-through: Write Through (WT)
+#
+# Since: 4.2
+##
+{ 'enum': 'HmatCacheWritePolicy',
+  'data': [ 'none', 'write-back', 'write-through' ] }
+
+##
+# @NumaHmatCacheOptions:
+#
+# Set the memory side cache information for a given memory domain.
+#
+# For more information about @NumaHmatCacheOptions, see
+# chapter 5.2.27.5: Table 5-147: Field "Cache Attributes" of the ACPI 6.3 spec.
+#
+# @node-id: the memory proximity domain to which the memory belongs.
+#
+# @size: the size of memory side cache in bytes.
+#
+# @total: the total cache levels for this memory proximity domain.
+#
+# @level: the cache level described in this structure.
+#
+# @assoc: the cache associativity, none/direct-mapped/complex(complex cache indexing).
+#
+# @policy: the write policy, none/write-back/write-through.
+#
+# @line: the cache line size in bytes.
+#
+# Since: 4.2
+##
+{ 'struct': 'NumaHmatCacheOptions',
+  'data': {
+   'node-id': 'uint32',
+   'size': 'size',
+   'total': 'uint8',
+   'level': 'uint8',
+   'assoc': 'HmatCacheAssociativity',
+   'policy': 'HmatCacheWritePolicy',
+   'line': 'uint16' }}
+
 ##
 # @HostMemPolicy:
 #
diff --git a/qemu-options.hx b/qemu-options.hx
index 129da0cdc3..7cf214a653 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -169,7 +169,8 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
     "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa dist,src=source,dst=destination,val=distance\n"
     "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
-    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
+    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
+    "-numa hmat-cache,node-id=node,size=size,total=total,level=level[,assoc=none|direct|complex][,policy=none|write-back|write-through][,line=size]\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
@@ -177,6 +178,7 @@ STEXI
 @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
 @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
 @itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,latency=@var{lat}][,bandwidth=@var{bw}]
+@itemx -numa hmat-cache,node-id=@var{node},size=@var{size},total=@var{total},level=@var{level}[,assoc=@var{str}][,policy=@var{str}][,line=@var{size}]
 @findex -numa
 Define a NUMA node and assign RAM and VCPUs to it.
 Set the NUMA distance from a source node to a destination node.
@@ -282,11 +284,19 @@ if NUM is 0, means the corresponding latency or bandwidth information is not pro
 And if input numbers without any unit, the latency unit will be 'ps' and the bandwidth
 will be MB/s.
 
+In the @samp{hmat-cache} option, @var{node-id} is the NUMA id of the node to which the memory belongs.
+@var{size} is the size of the memory side cache in bytes. @var{total} is the total number of cache levels.
+@var{level} is the cache level described by this option. @var{assoc} is the cache associativity;
+the possible values are 'none', 'direct' (direct-mapped) and 'complex' (complex cache indexing).
+@var{policy} is the write policy. @var{line} is the cache line size in bytes.
+
 For example, the following option assigns NUMA node 0 and 1. Node 0 has 2 cpus and
 a ram, node 1 has only a ram. The processors in node 0 access memory in node
 0 with access-latency 5 nanoseconds, access-bandwidth is 200 MB/s;
 The processors in NUMA node 0 access memory in NUMA node 1 with access-latency 10
 nanoseconds, access-bandwidth is 100 MB/s.
+For memory side cache information, NUMA nodes 0 and 1 each have a one-level memory side
+cache of 0x20000 bytes with write-back policy and a cache line size of 8 bytes:
 @example
 -machine hmat=on \
 -m 2G \
@@ -300,7 +310,9 @@ nanoseconds, access-bandwidth is 100 MB/s.
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
--numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
+-numa hmat-cache,node-id=0,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8 \
+-numa hmat-cache,node-id=1,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8
 @end example
 
 ETEXI
-- 
2.20.1




* [PATCH v12 08/11] hmat acpi: Build Memory Proximity Domain Attributes Structure(s)
  2019-09-20  7:43 [PATCH v12 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (6 preceding siblings ...)
  2019-09-20  7:43 ` [PATCH v12 07/11] numa: Extend CLI to provide memory side cache information Tao Xu
@ 2019-09-20  7:43 ` Tao Xu
  2019-10-03 13:44   ` Igor Mammedov
  2019-09-20  7:43 ` [PATCH v12 09/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao Xu
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Tao Xu @ 2019-09-20  7:43 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, Daniel Black,
	Jonathan Cameron, dan.j.williams

From: Liu Jingqi <jingqi.liu@intel.com>

HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
(HMAT). The specification is available at:
http://www.uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf

HMAT describes memory attributes, such as memory side cache
attributes and bandwidth and latency details, related to the
Memory Proximity Domain. Software is expected to use this
information as a hint for optimization.

The Memory Proximity Domain Attributes Structure describes the attributes
of a memory proximity domain: its association with a processor (initiator)
proximity domain, as well as hints for memory usage.

In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses and
reports the platform's HMAT tables.

Reviewed-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v12.

Changes in v11:
    - Move numa option patches forward.
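
Reviewer note: a standalone sketch of the 40-byte layout that
build_hmat_mpda() emits, written against a plain byte buffer instead of
a GArray. put_le() and encode_mpda() are hypothetical helpers for
illustration only. The field widths are copied from the
build_append_int_noprefix() calls in this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Store the low 'size' bytes of 'val' at buf[off..], little-endian. */
static size_t put_le(uint8_t *buf, size_t off, uint64_t val, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(val >> (8 * i));
    }
    return off + size;
}

/* Encode one Memory Proximity Domain Attributes Structure. */
static size_t encode_mpda(uint8_t out[40], uint16_t flags,
                          uint32_t initiator, uint32_t mem_node)
{
    size_t off = 0;

    off = put_le(out, off, 0, 2);          /* Type */
    off = put_le(out, off, 0, 2);          /* Reserved */
    off = put_le(out, off, 40, 4);         /* Length */
    off = put_le(out, off, flags, 2);      /* Flags */
    off = put_le(out, off, 0, 2);          /* Reserved */
    off = put_le(out, off, initiator, 4);  /* Initiator proximity domain */
    off = put_le(out, off, mem_node, 4);   /* Memory proximity domain */
    off = put_le(out, off, 0, 4);          /* Reserved */
    off = put_le(out, off, 0, 8);          /* Reserved (was start address) */
    off = put_le(out, off, 0, 8);          /* Reserved (was range length) */

    return off;                            /* bytes written */
}
```

The field widths add up to 2 + 2 + 4 + 2 + 2 + 4 + 4 + 4 + 8 + 8 = 40,
which matches the Length value written into the structure.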
---
 hw/acpi/Kconfig       |   5 +++
 hw/acpi/Makefile.objs |   1 +
 hw/acpi/hmat.c        | 101 ++++++++++++++++++++++++++++++++++++++++++
 hw/acpi/hmat.h        |  45 +++++++++++++++++++
 hw/i386/acpi-build.c  |   5 +++
 5 files changed, 157 insertions(+)
 create mode 100644 hw/acpi/hmat.c
 create mode 100644 hw/acpi/hmat.h

diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 7c59cf900b..039bb99efa 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -7,6 +7,7 @@ config ACPI_X86
     select ACPI_NVDIMM
     select ACPI_CPU_HOTPLUG
     select ACPI_MEMORY_HOTPLUG
+    select ACPI_HMAT
 
 config ACPI_X86_ICH
     bool
@@ -31,3 +32,7 @@ config ACPI_VMGENID
     bool
     default y
     depends on PC
+
+config ACPI_HMAT
+    bool
+    depends on ACPI
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 9bb2101e3b..c05019b059 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
 common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
+common-obj-$(CONFIG_ACPI_HMAT) += hmat.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
 
 common-obj-y += acpi_interface.o
diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
new file mode 100644
index 0000000000..1368fce7ee
--- /dev/null
+++ b/hw/acpi/hmat.c
@@ -0,0 +1,101 @@
+/*
+ * HMAT ACPI Implementation
+ *
+ * Copyright(C) 2019 Intel Corporation.
+ *
+ * Author:
+ *  Liu jingqi <jingqi.liu@linux.intel.com>
+ *  Tao Xu <tao3.xu@intel.com>
+ *
+ * HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
+ * (HMAT)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/numa.h"
+#include "hw/acpi/hmat.h"
+
+/*
+ * ACPI 6.3:
+ * 5.2.27.3 Memory Proximity Domain Attributes Structure: Table 5-145
+ */
+static void build_hmat_mpda(GArray *table_data, uint16_t flags, int initiator,
+                           int mem_node)
+{
+
+    /* Memory Proximity Domain Attributes Structure */
+    /* Type */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Length */
+    build_append_int_noprefix(table_data, 40, 4);
+    /* Flags */
+    build_append_int_noprefix(table_data, flags, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Proximity Domain for the Attached Initiator */
+    build_append_int_noprefix(table_data, initiator, 4);
+    /* Proximity Domain for the Memory */
+    build_append_int_noprefix(table_data, mem_node, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 4);
+    /*
+     * Reserved:
+     * Previously defined as the Start Address of the System Physical
+     * Address Range. Deprecated since ACPI Spec 6.3.
+     */
+    build_append_int_noprefix(table_data, 0, 8);
+    /*
+     * Reserved:
+     * Previously defined as the Range Length of the region in bytes.
+     * Deprecated since ACPI Spec 6.3.
+     */
+    build_append_int_noprefix(table_data, 0, 8);
+}
+
+/* Build HMAT sub table structures */
+static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
+{
+    uint16_t flags;
+    int i;
+
+    for (i = 0; i < nstat->num_nodes; i++) {
+        flags = 0;
+
+        if (nstat->nodes[i].initiator_valid) {
+            flags |= HMAT_PROX_INIT_VALID;
+        }
+
+        build_hmat_mpda(table_data, flags, nstat->nodes[i].initiator, i);
+    }
+}
+
+void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat)
+{
+    uint64_t hmat_start;
+
+    hmat_start = table_data->len;
+
+    /* Reserve space for the HMAT header */
+    acpi_data_push(table_data, 40);
+
+    hmat_build_table_structs(table_data, nstat);
+
+    build_header(linker, table_data,
+                 (void *)(table_data->data + hmat_start),
+                 "HMAT", table_data->len - hmat_start, 2, NULL, NULL);
+}
diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
new file mode 100644
index 0000000000..0c1839cf6f
--- /dev/null
+++ b/hw/acpi/hmat.h
@@ -0,0 +1,45 @@
+/*
+ * HMAT ACPI Implementation Header
+ *
+ * Copyright(C) 2019 Intel Corporation.
+ *
+ * Author:
+ *  Liu jingqi <jingqi.liu@linux.intel.com>
+ *  Tao Xu <tao3.xu@intel.com>
+ *
+ * HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
+ * (HMAT)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#ifndef HMAT_H
+#define HMAT_H
+
+#include "hw/acpi/acpi-defs.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "hw/acpi/aml-build.h"
+
+/*
+ * ACPI 6.3: 5.2.27.3 Memory Proximity Domain Attributes Structure,
+ * Table 5-145, Field "Flags", Bit [0]: set to 1 to indicate that data in
+ * the Proximity Domain for the Attached Initiator field is valid.
+ * Other bits reserved.
+ */
+#define HMAT_PROX_INIT_VALID 0x1
+
+void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat);
+
+#endif
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index e54e571a75..7f2e05f1a9 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -68,6 +68,7 @@
 #include "hw/i386/intel_iommu.h"
 
 #include "hw/acpi/ipmi.h"
+#include "hw/acpi/hmat.h"
 
 /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
  * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
@@ -2698,6 +2699,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
             acpi_add_table(table_offsets, tables_blob);
             build_slit(tables_blob, tables->linker, machine);
         }
+        if (machine->numa_state->hmat_enabled) {
+            acpi_add_table(table_offsets, tables_blob);
+            build_hmat(tables_blob, tables->linker, machine->numa_state);
+        }
     }
     if (acpi_get_mcfg(&mcfg)) {
         acpi_add_table(table_offsets, tables_blob);
-- 
2.20.1




* [PATCH v12 09/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)
  2019-09-20  7:43 [PATCH v12 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (7 preceding siblings ...)
  2019-09-20  7:43 ` [PATCH v12 08/11] hmat acpi: Build Memory Proximity Domain Attributes Structure(s) Tao Xu
@ 2019-09-20  7:43 ` Tao Xu
  2019-10-03 14:41   ` Igor Mammedov
  2019-09-20  7:43 ` [PATCH v12 10/11] hmat acpi: Build Memory Side Cache " Tao Xu
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Tao Xu @ 2019-09-20  7:43 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron,
	dan.j.williams

From: Liu Jingqi <jingqi.liu@intel.com>

This structure describes the memory access latency and bandwidth
information from various memory access initiator proximity domains.
The latency and bandwidth numbers represented in this structure
correspond to rated latency and bandwidth for the platform.
Software can use this information as a hint for optimization.

Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v12:
    - Fix a bug that if HMAT is enabled and without hmat-lb setting,
      QEMU will crash. (reported by Danmei Wei)

Changes in v11:
    - Calculate base in build_hmat_lb().
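
Reviewer note: a standalone sketch of how build_hmat_lb() derives the
Entry Base Unit and the structure length. pick_entry_base() and
lb_struct_length() are hypothetical names, and the sketch returns false
where the patch calls error_report() and exit(1). The unit is 1000 for
latencies and 1024 for bandwidths, as in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if any value divided by 'base' does not fit a 16-bit entry. */
static bool any_overflow(const uint64_t *data, size_t len, uint64_t base)
{
    for (size_t i = 0; i < len; i++) {
        if (data[i] / base >= UINT16_MAX) {
            return true;
        }
    }
    return false;
}

/*
 * Grow 'base' by factors of 'unit' until every entry fits in 16 bits.
 * Fails if scaling would lose precision, i.e. a value is not a
 * multiple of the next base.
 */
static bool pick_entry_base(const uint64_t *data, size_t len,
                            uint64_t unit, uint64_t *base_out)
{
    uint64_t base = 1;

    while (any_overflow(data, len, base)) {
        for (size_t i = 0; i < len; i++) {
            if (data[i] % (unit * base) != 0) {
                return false;
            }
        }
        base *= unit;
    }
    *base_out = base;
    return true;
}

/* Length field: 32-byte header, two 4-byte lists, s * t 2-byte entries. */
static uint32_t lb_struct_length(uint32_t s, uint32_t t)
{
    return 32 + 4 * s + 4 * t + 2 * s * t;
}
```

For latencies of 5000 ps and 10000 ps the base stays 1. For 5 us and
100 us (5000000 ps and 100000000 ps) the base becomes 1000000 and the
entries are 5 and 100. Mixing 5000 ps with 100000000 ps fails, which is
the "same units" error in the patch.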
---
 hw/acpi/hmat.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/acpi/hmat.h |   2 +
 2 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index 1368fce7ee..e7be849581 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -27,6 +27,7 @@
 #include "qemu/osdep.h"
 #include "sysemu/numa.h"
 #include "hw/acpi/hmat.h"
+#include "qemu/error-report.h"
 
 /*
  * ACPI 6.3:
@@ -67,11 +68,105 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags, int initiator,
     build_append_int_noprefix(table_data, 0, 8);
 }
 
+static bool entry_overflow(uint64_t *lb_data, uint64_t base, int len)
+{
+    int i;
+
+    for (i = 0; i < len; i++) {
+        if (lb_data[i] / base >= UINT16_MAX) {
+            return true;
+        }
+    }
+
+    return false;
+}
+/*
+ * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+ * Structure: Table 5-146
+ */
+static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
+                          uint32_t num_initiator, uint32_t num_target,
+                          uint32_t *initiator_list, int type)
+{
+    uint8_t mask = 0x0f;
+    uint32_t s = num_initiator;
+    uint32_t t = num_target;
+    uint64_t base = 1;
+    uint64_t *lb_data;
+    int i, unit;
+
+    /* Type */
+    build_append_int_noprefix(table_data, 1, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Length */
+    build_append_int_noprefix(table_data, 32 + 4 * s + 4 * t + 2 * s * t, 4);
+    /* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
+    build_append_int_noprefix(table_data, hmat_lb->hierarchy & mask, 1);
+    /* Data Type */
+    build_append_int_noprefix(table_data, hmat_lb->data_type, 1);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Number of Initiator Proximity Domains (s) */
+    build_append_int_noprefix(table_data, s, 4);
+    /* Number of Target Proximity Domains (t) */
+    build_append_int_noprefix(table_data, t, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 4);
+
+    if (HMAT_IS_LATENCY(type)) {
+        unit = 1000;
+        lb_data = hmat_lb->latency;
+    } else {
+        unit = 1024;
+        lb_data = hmat_lb->bandwidth;
+    }
+
+    while (entry_overflow(lb_data, base, s * t)) {
+        for (i = 0; i < s * t; i++) {
+            if (!QEMU_IS_ALIGNED(lb_data[i], unit * base)) {
+                error_report("Invalid latency/bandwidth input, all "
+                "latencies/bandwidths should be specified in the same units.");
+                exit(1);
+            }
+        }
+        base *= unit;
+    }
+
+    /* Entry Base Unit */
+    build_append_int_noprefix(table_data, base, 8);
+
+    /* Initiator Proximity Domain List */
+    for (i = 0; i < s; i++) {
+        build_append_int_noprefix(table_data, initiator_list[i], 4);
+    }
+
+    /* Target Proximity Domain List */
+    for (i = 0; i < t; i++) {
+        build_append_int_noprefix(table_data, i, 4);
+    }
+
+    /* Latency or Bandwidth Entries */
+    for (i = 0; i < s * t; i++) {
+        uint16_t entry;
+
+        if (HMAT_IS_LATENCY(type)) {
+            entry = hmat_lb->latency[i] / base;
+        } else {
+            entry = hmat_lb->bandwidth[i] / base;
+        }
+
+        build_append_int_noprefix(table_data, entry, 2);
+    }
+}
+
 /* Build HMAT sub table structures */
 static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
 {
     uint16_t flags;
-    int i;
+    uint32_t *initiator_list = NULL;
+    int i, j, hrchy, type;
+    HMAT_LB_Info *numa_hmat_lb;
 
     for (i = 0; i < nstat->num_nodes; i++) {
         flags = 0;
@@ -82,6 +177,35 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
 
         build_hmat_mpda(table_data, flags, nstat->nodes[i].initiator, i);
     }
+
+    if (nstat->num_initiator) {
+        initiator_list = g_malloc0(nstat->num_initiator * sizeof(uint32_t));
+        for (i = 0, j = 0; i < nstat->num_nodes; i++) {
+            if (nstat->nodes[i].has_cpu) {
+                initiator_list[j] = i;
+                j++;
+            }
+        }
+    }
+
+    /*
+     * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+     * Structure: Table 5-146
+     */
+    for (hrchy = HMAT_LB_MEM_MEMORY;
+         hrchy <= HMAT_LB_MEM_CACHE_3RD_LEVEL; hrchy++) {
+        for (type = HMAT_LB_DATA_ACCESS_LATENCY;
+             type <= HMAT_LB_DATA_WRITE_BANDWIDTH; type++) {
+            numa_hmat_lb = nstat->hmat_lb[hrchy][type];
+
+            if (numa_hmat_lb) {
+                build_hmat_lb(table_data, numa_hmat_lb, nstat->num_initiator,
+                              nstat->num_nodes, initiator_list, type);
+            }
+        }
+    }
+
+    g_free(initiator_list);
 }
 
 void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat)
diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
index 0c1839cf6f..1154dfb48e 100644
--- a/hw/acpi/hmat.h
+++ b/hw/acpi/hmat.h
@@ -40,6 +40,8 @@
  */
 #define HMAT_PROX_INIT_VALID 0x1
 
+#define HMAT_IS_LATENCY(type) ((type) <= HMAT_LB_DATA_WRITE_LATENCY)
+
 void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat);
 
 #endif
-- 
2.20.1




* [PATCH v12 10/11] hmat acpi: Build Memory Side Cache Information Structure(s)
  2019-09-20  7:43 [PATCH v12 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (8 preceding siblings ...)
  2019-09-20  7:43 ` [PATCH v12 09/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao Xu
@ 2019-09-20  7:43 ` Tao Xu
  2019-10-04  8:01   ` Igor Mammedov
  2019-09-20  7:43 ` [PATCH v12 11/11] tests/bios-tables-test: add test cases for ACPI HMAT Tao Xu
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Tao Xu @ 2019-09-20  7:43 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, Daniel Black,
	Jonathan Cameron, dan.j.williams

From: Liu Jingqi <jingqi.liu@intel.com>

This structure describes memory side cache information for memory
proximity domains when a memory side cache is present and a
physical device forms that cache.
Software can use this information to place data in memory
effectively and so maximize the performance of system memory
that uses the memory side cache.

Reviewed-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v12.

Changes in v11:
    - Move numa option patches forward.
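
Reviewer note: a standalone sketch of the Cache Attributes bit layout
that build_hmat_cache() assembles. pack_cache_attr() is a hypothetical
helper, not part of this patch. The enum values are assumed to follow
the QAPI declaration order (none = 0, direct or write-back = 1,
complex or write-through = 2).

```c
#include <stdint.h>

/* Pack the 32-bit Cache Attributes field of the structure. */
static uint32_t pack_cache_attr(uint8_t total_levels, uint8_t level,
                                uint8_t associativity, uint8_t write_policy,
                                uint16_t line_size)
{
    uint32_t attr = total_levels & 0xF;             /* Bits [3:0] */

    attr |= (uint32_t)(level & 0xF) << 4;           /* Bits [7:4] */
    attr |= (uint32_t)(associativity & 0xF) << 8;   /* Bits [11:8] */
    attr |= (uint32_t)(write_policy & 0xF) << 12;   /* Bits [15:12] */
    attr |= (uint32_t)line_size << 16;              /* Bits [31:16] */

    return attr;
}
```

For the documented example (total=1, level=1, assoc=direct,
policy=write-back, line=8), pack_cache_attr(1, 1, 1, 1, 8) gives
0x00081111.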
---
 hw/acpi/hmat.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 63 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index e7be849581..6b260eeef5 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -160,13 +160,62 @@ static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
     }
 }
 
+/* ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure: Table 5-147 */
+static void build_hmat_cache(GArray *table_data, HMAT_Cache_Info *hmat_cache)
+{
+    /*
+     * Cache Attributes: Bits [3:0] – Total Cache Levels
+     * for this Memory Proximity Domain
+     */
+    uint32_t cache_attr = hmat_cache->total_levels & 0xF;
+
+    /* Bits [7:4] : Cache Level described in this structure */
+    cache_attr |= (hmat_cache->level & 0xF) << 4;
+
+    /* Bits [11:8] - Cache Associativity */
+    cache_attr |= (hmat_cache->associativity & 0xF) << 8;
+
+    /* Bits [15:12] - Write Policy */
+    cache_attr |= (hmat_cache->write_policy & 0xF) << 12;
+
+    /* Bits [31:16] - Cache Line size in bytes */
+    cache_attr |= (hmat_cache->line_size & 0xFFFF) << 16;
+
+    /* No cpu_to_le32(): build_append_int_noprefix() emits little-endian. */
+
+    /* Type */
+    build_append_int_noprefix(table_data, 2, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Length */
+    build_append_int_noprefix(table_data, 32, 4);
+    /* Proximity Domain for the Memory */
+    build_append_int_noprefix(table_data, hmat_cache->mem_proximity, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 4);
+    /* Memory Side Cache Size */
+    build_append_int_noprefix(table_data, hmat_cache->size, 8);
+    /* Cache Attributes */
+    build_append_int_noprefix(table_data, cache_attr, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /*
+     * Number of SMBIOS handles (n)
+     * Linux kernel uses Memory Side Cache Information Structure
+     * without SMBIOS entries for now, so set Number of SMBIOS handles
+     * as 0.
+     */
+    build_append_int_noprefix(table_data, 0, 2);
+}
+
 /* Build HMAT sub table structures */
 static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
 {
     uint16_t flags;
     uint32_t *initiator_list = NULL;
-    int i, j, hrchy, type;
+    int i, j, hrchy, type, level;
     HMAT_LB_Info *numa_hmat_lb;
+    HMAT_Cache_Info *numa_hmat_cache;
 
     for (i = 0; i < nstat->num_nodes; i++) {
         flags = 0;
@@ -205,6 +254,19 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
         }
     }
 
+    /*
+     * ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure:
+     * Table 5-147
+     */
+    for (i = 0; i < nstat->num_nodes; i++) {
+        for (level = 0; level <= MAX_HMAT_CACHE_LEVEL; level++) {
+            numa_hmat_cache = nstat->hmat_cache[i][level];
+            if (numa_hmat_cache) {
+                build_hmat_cache(table_data, numa_hmat_cache);
+            }
+        }
+    }
+
     g_free(initiator_list);
 }
 
-- 
2.20.1




* [PATCH v12 11/11] tests/bios-tables-test: add test cases for ACPI HMAT
  2019-09-20  7:43 [PATCH v12 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (9 preceding siblings ...)
  2019-09-20  7:43 ` [PATCH v12 10/11] hmat acpi: Build Memory Side Cache " Tao Xu
@ 2019-09-20  7:43 ` Tao Xu
  2019-10-04  8:08   ` Igor Mammedov
  2019-09-21  1:39 ` [PATCH v12 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) no-reply
  2019-09-21  1:53 ` no-reply
  12 siblings, 1 reply; 34+ messages in thread
From: Tao Xu @ 2019-09-20  7:43 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: Jingqi Liu, tao3.xu, fan.du, qemu-devel, Daniel Black,
	jonathan.cameron, dan.j.williams

ACPI table HMAT has been introduced; QEMU now builds HMAT tables for
heterogeneous memory with the boot option '-numa node'.

Add test cases on PC and Q35 machines with 2 NUMA nodes.
Because HMAT is generated when the system enables NUMA, the
following tables need to be added for this test:
  tests/data/acpi/pc/*.acpihmat
  tests/data/acpi/pc/HMAT.*
  tests/data/acpi/q35/*.acpihmat
  tests/data/acpi/q35/HMAT.*

Reviewed-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Jingqi Liu <Jingqi.liu@intel.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v11 and v12.

Changes in v10:
    - Update test case, add "-machine hmat=on"
---
 tests/bios-tables-test.c | 44 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index 9b3d8b0d1b..976788b6fa 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -870,6 +870,48 @@ static void test_acpi_piix4_tcg_dimm_pxm(void)
     test_acpi_tcg_dimm_pxm(MACHINE_PC);
 }
 
+static void test_acpi_tcg_acpi_hmat(const char *machine)
+{
+    test_data data;
+
+    memset(&data, 0, sizeof(data));
+    data.machine = machine;
+    data.variant = ".acpihmat";
+    test_acpi_one(" -machine hmat=on"
+                  " -smp 2,sockets=2"
+                  " -m 128M,slots=2,maxmem=1G"
+                  " -object memory-backend-ram,size=64M,id=m0"
+                  " -object memory-backend-ram,size=64M,id=m1"
+                  " -numa node,nodeid=0,memdev=m0"
+                  " -numa node,nodeid=1,memdev=m1,initiator=0"
+                  " -numa cpu,node-id=0,socket-id=0"
+                  " -numa cpu,node-id=0,socket-id=1"
+                  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+                  "data-type=access-latency,latency=5ns"
+                  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=500M"
+                  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+                  "data-type=access-latency,latency=10ns"
+                  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=100M"
+                  " -numa hmat-cache,node-id=0,size=0x20000,total=1,level=1"
+                  ",assoc=direct,policy=write-back,line=8"
+                  " -numa hmat-cache,node-id=1,size=0x20000,total=1,level=1"
+                  ",assoc=direct,policy=write-back,line=8",
+                  &data);
+    free_test_data(&data);
+}
+
+static void test_acpi_q35_tcg_acpi_hmat(void)
+{
+    test_acpi_tcg_acpi_hmat(MACHINE_Q35);
+}
+
+static void test_acpi_piix4_tcg_acpi_hmat(void)
+{
+    test_acpi_tcg_acpi_hmat(MACHINE_PC);
+}
+
 static void test_acpi_virt_tcg(void)
 {
     test_data data = {
@@ -914,6 +956,8 @@ int main(int argc, char *argv[])
         qtest_add_func("acpi/q35/numamem", test_acpi_q35_tcg_numamem);
         qtest_add_func("acpi/piix4/dimmpxm", test_acpi_piix4_tcg_dimm_pxm);
         qtest_add_func("acpi/q35/dimmpxm", test_acpi_q35_tcg_dimm_pxm);
+        qtest_add_func("acpi/piix4/acpihmat", test_acpi_piix4_tcg_acpi_hmat);
+        qtest_add_func("acpi/q35/acpihmat", test_acpi_q35_tcg_acpi_hmat);
     } else if (strcmp(arch, "aarch64") == 0) {
         qtest_add_func("acpi/virt", test_acpi_virt_tcg);
     }
-- 
2.20.1




* Re: [PATCH v12 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
  2019-09-20  7:43 [PATCH v12 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (10 preceding siblings ...)
  2019-09-20  7:43 ` [PATCH v12 11/11] tests/bios-tables-test: add test cases for ACPI HMAT Tao Xu
@ 2019-09-21  1:39 ` no-reply
  2019-09-21  1:53 ` no-reply
  12 siblings, 0 replies; 34+ messages in thread
From: no-reply @ 2019-09-21  1:39 UTC (permalink / raw)
  To: tao3.xu
  Cc: ehabkost, jingqi.liu, tao3.xu, fan.du, qemu-devel,
	jonathan.cameron, imammedo, dan.j.williams

Patchew URL: https://patchew.org/QEMU/20190920074349.2616-1-tao3.xu@intel.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

libudev           no
default devices   yes

warning: Python 2 support is deprecated
warning: Python 3 will be required for building future versions of QEMU
cross containers  no

NOTE: guest cross-compilers enabled: cc
---
Looking for expected file 'tests/data/acpi/pc/SRAT.acpihmat'
Looking for expected file 'tests/data/acpi/pc/SRAT'
**
ERROR:/tmp/qemu-test/src/tests/bios-tables-test.c:327:load_expected_aml: assertion failed: (exp_sdt.aml_file)
ERROR - Bail out! ERROR:/tmp/qemu-test/src/tests/bios-tables-test.c:327:load_expected_aml: assertion failed: (exp_sdt.aml_file)
make: *** [check-qtest-x86_64] Error 1
make: *** Waiting for unfinished jobs....
  TEST    iotest-qcow2: 038


The full log is available at
http://patchew.org/logs/20190920074349.2616-1-tao3.xu@intel.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
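
The two "Looking for expected file" lines in the log above suggest that the
test first tries the variant-suffixed name and then the plain table name, and
that the assertion fires when neither file is present. A minimal sketch of
that lookup order, under those assumptions, follows. find_expected(), ExistsFn
and only_plain_srat() are hypothetical names for illustration and are not
taken from tests/bios-tables-test.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical predicate type: reports whether an expected file exists. */
typedef bool (*ExistsFn)(const char *path);

/*
 * Build "<dir>/<machine>/<table><variant>" and fall back to the name
 * without the variant. Returns true and leaves the match in `path`
 * if one of the two candidates exists.
 */
static bool find_expected(const char *dir, const char *machine,
                          const char *table, const char *variant,
                          ExistsFn exists, char *path, size_t path_len)
{
    const char *ext = variant;

    assert(exists != NULL);
    for (;;) {
        int n = snprintf(path, path_len, "%s/%s/%.4s%s",
                         dir, machine, table, ext);
        if (n < 0 || (size_t)n >= path_len) {
            return false;               /* path did not fit in the buffer */
        }
        if (exists(path)) {
            return true;
        }
        if (*ext == '\0') {
            return false;               /* both candidates are missing */
        }
        ext = "";                       /* retry without the variant */
    }
}

/* Example predicate: pretend only the variant-less SRAT blob is present. */
static bool only_plain_srat(const char *path)
{
    return strcmp(path, "tests/data/acpi/pc/SRAT") == 0;
}
```

With only_plain_srat() as the predicate, a lookup of "SRAT" with variant
".acpihmat" falls back to tests/data/acpi/pc/SRAT, whereas a lookup of "HMAT"
finds nothing. That second case would correspond to the failed assertion if
the HMAT blobs were not part of the tree.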


* Re: [PATCH v12 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
  2019-09-20  7:43 [PATCH v12 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (11 preceding siblings ...)
  2019-09-21  1:39 ` [PATCH v12 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) no-reply
@ 2019-09-21  1:53 ` no-reply
  12 siblings, 0 replies; 34+ messages in thread
From: no-reply @ 2019-09-21  1:53 UTC (permalink / raw)
  To: tao3.xu
  Cc: ehabkost, jingqi.liu, tao3.xu, fan.du, qemu-devel,
	jonathan.cameron, imammedo, dan.j.williams

Patchew URL: https://patchew.org/QEMU/20190920074349.2616-1-tao3.xu@intel.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

PASS 1 fdc-test /x86_64/fdc/cmos
PASS 2 fdc-test /x86_64/fdc/no_media_on_start
PASS 3 fdc-test /x86_64/fdc/read_without_media
==12418==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 fdc-test /x86_64/fdc/media_change
PASS 5 fdc-test /x86_64/fdc/sense_interrupt
PASS 6 fdc-test /x86_64/fdc/relative_seek
---
PASS 33 test-opts-visitor /visitor/opts/dict/unvisited
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-coroutine -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-coroutine" 
PASS 11 fdc-test /x86_64/fdc/read_no_dma_18
==12466==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==12466==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe003de000; bottom 0x7ff9e34f8000; size: 0x00041cee6000 (17665253376)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-coroutine /basic/no-dangling-access
---
PASS 13 test-aio /aio/event/wait/no-flush-cb
PASS 12 fdc-test /x86_64/fdc/read_no_dma_19
PASS 13 fdc-test /x86_64/fdc/fuzz-registers
==12485==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 14 test-aio /aio/timer/schedule
PASS 15 test-aio /aio/coroutine/queue-chaining
PASS 16 test-aio /aio-gsource/flush
---
PASS 26 test-aio /aio-gsource/event/flush
PASS 27 test-aio /aio-gsource/event/wait/no-flush-cb
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/ide-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="ide-test" 
==12494==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 ide-test /x86_64/ide/identify
PASS 28 test-aio /aio-gsource/timer/schedule
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-aio-multithread -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-aio-multithread" 
==12500==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==12506==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-aio-multithread /aio/multi/lifecycle
PASS 2 ide-test /x86_64/ide/flush
==12521==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 test-aio-multithread /aio/multi/schedule
PASS 3 ide-test /x86_64/ide/bmdma/simple_rw
==12532==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 test-aio-multithread /aio/multi/mutex/contended
PASS 4 ide-test /x86_64/ide/bmdma/trim
==12543==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 ide-test /x86_64/ide/bmdma/short_prdt
==12549==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 ide-test /x86_64/ide/bmdma/one_sector_short_prdt
==12555==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 test-aio-multithread /aio/multi/mutex/handoff
PASS 7 ide-test /x86_64/ide/bmdma/long_prdt
==12566==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==12566==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fff10f6d000; bottom 0x7f54b57fe000; size: 0x00aa5b76f000 (731678961664)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 5 test-aio-multithread /aio/multi/mutex/mcs
---
PASS 6 test-aio-multithread /aio/multi/mutex/pthread
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-throttle -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-throttle" 
PASS 9 ide-test /x86_64/ide/flush/nodev
==12584==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-throttle /throttle/leak_bucket
PASS 2 test-throttle /throttle/compute_wait
PASS 3 test-throttle /throttle/init
---
PASS 14 test-throttle /throttle/config/max
PASS 15 test-throttle /throttle/config/iops_size
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-thread-pool -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-thread-pool" 
==12590==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-thread-pool /thread-pool/submit
PASS 2 test-thread-pool /thread-pool/submit-aio
PASS 3 test-thread-pool /thread-pool/submit-co
PASS 4 test-thread-pool /thread-pool/submit-many
==12586==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 ide-test /x86_64/ide/flush/empty_drive
==12661==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 ide-test /x86_64/ide/flush/retry_pci
==12668==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 test-thread-pool /thread-pool/cancel
PASS 12 ide-test /x86_64/ide/flush/retry_isa
==12674==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 13 ide-test /x86_64/ide/cdrom/pio
PASS 6 test-thread-pool /thread-pool/cancel-async
==12680==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-hbitmap -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-hbitmap" 
PASS 1 test-hbitmap /hbitmap/granularity
PASS 2 test-hbitmap /hbitmap/size/0
---
PASS 14 test-hbitmap /hbitmap/set/twice
PASS 15 test-hbitmap /hbitmap/set/overlap
PASS 16 test-hbitmap /hbitmap/reset/empty
==12691==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 15 ide-test /x86_64/ide/cdrom/dma
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/ahci-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="ahci-test" 
PASS 17 test-hbitmap /hbitmap/reset/general
---
PASS 28 test-hbitmap /hbitmap/truncate/shrink/medium
PASS 29 test-hbitmap /hbitmap/truncate/shrink/large
PASS 30 test-hbitmap /hbitmap/meta/zero
==12705==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 ahci-test /x86_64/ahci/sanity
==12711==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 ahci-test /x86_64/ahci/pci_spec
==12717==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 ahci-test /x86_64/ahci/pci_enable
==12723==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 ahci-test /x86_64/ahci/hba_spec
PASS 31 test-hbitmap /hbitmap/meta/one
PASS 32 test-hbitmap /hbitmap/meta/byte
PASS 33 test-hbitmap /hbitmap/meta/word
==12729==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 ahci-test /x86_64/ahci/hba_enable
PASS 34 test-hbitmap /hbitmap/meta/sector
PASS 35 test-hbitmap /hbitmap/serialize/align
==12735==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 ahci-test /x86_64/ahci/identify
==12741==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 36 test-hbitmap /hbitmap/serialize/basic
PASS 7 ahci-test /x86_64/ahci/max
PASS 37 test-hbitmap /hbitmap/serialize/part
---
PASS 44 test-hbitmap /hbitmap/next_dirty_area/next_dirty_area_4
PASS 45 test-hbitmap /hbitmap/next_dirty_area/next_dirty_area_after_truncate
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bdrv-drain -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bdrv-drain" 
==12750==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==12747==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-bdrv-drain /bdrv-drain/nested
PASS 2 test-bdrv-drain /bdrv-drain/multiparent
PASS 3 test-bdrv-drain /bdrv-drain/set_aio_context
---
PASS 41 test-bdrv-drain /bdrv-drain/bdrv_drop_intermediate/poll
PASS 42 test-bdrv-drain /bdrv-drain/replace_child/mid-drain
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bdrv-graph-mod -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bdrv-graph-mod" 
==12795==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-bdrv-graph-mod /bdrv-graph-mod/update-perm-tree
PASS 2 test-bdrv-graph-mod /bdrv-graph-mod/should-update-child
PASS 8 ahci-test /x86_64/ahci/reset
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-blockjob -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-blockjob" 
==12801==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-blockjob /blockjob/ids
PASS 2 test-blockjob /blockjob/cancel/created
PASS 3 test-blockjob /blockjob/cancel/running
---
PASS 6 test-blockjob /blockjob/cancel/standby
PASS 7 test-blockjob /blockjob/cancel/pending
PASS 8 test-blockjob /blockjob/cancel/concluded
==12799==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-blockjob-txn -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-blockjob-txn" 
==12810==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-blockjob-txn /single/success
PASS 2 test-blockjob-txn /single/failure
PASS 3 test-blockjob-txn /single/cancel
PASS 4 test-blockjob-txn /pair/success
==12799==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fff4ff76000; bottom 0x7fcda33fe000; size: 0x0031acb78000 (213351104512)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 5 test-blockjob-txn /pair/failure
---
PASS 7 test-blockjob-txn /pair/fail-cancel-race
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-block-backend -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-block-backend" 
PASS 9 ahci-test /x86_64/ahci/io/pio/lba28/simple/zero
==12816==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-block-backend /block-backend/drain_aio_error
PASS 2 test-block-backend /block-backend/drain_all_aio_error
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-block-iothread -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-block-iothread" 
==12818==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==12822==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-block-iothread /sync-op/pread
PASS 2 test-block-iothread /sync-op/pwrite
PASS 3 test-block-iothread /sync-op/load_vmstate
---
PASS 14 test-block-iothread /propagate/basic
PASS 15 test-block-iothread /propagate/diamond
PASS 16 test-block-iothread /propagate/mirror
==12818==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fff92846000; bottom 0x7f040adfe000; size: 0x00fb87a48000 (1080312496128)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-image-locking -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-image-locking" 
==12848==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 ahci-test /x86_64/ahci/io/pio/lba28/simple/low
PASS 1 test-image-locking /image-locking/basic
PASS 2 test-image-locking /image-locking/set-perm-abort
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-x86-cpuid -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-x86-cpuid" 
PASS 1 test-x86-cpuid /cpuid/topology/basic
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-xbzrle -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-xbzrle" 
==12852==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-xbzrle /xbzrle/uleb
PASS 2 test-xbzrle /xbzrle/encode_decode_zero
PASS 3 test-xbzrle /xbzrle/encode_decode_unchanged
PASS 4 test-xbzrle /xbzrle/encode_decode_1_byte
PASS 5 test-xbzrle /xbzrle/encode_decode_overflow
==12852==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffd68126000; bottom 0x7f625c3fe000; size: 0x009b0bd28000 (665918275584)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 6 test-xbzrle /xbzrle/encode_decode
---
PASS 1 test-shift128 /host-utils/test_lshift
PASS 2 test-shift128 /host-utils/test_rshift
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-mul64 -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-mul64" 
==12871==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-mul64 /host-utils/mulu64
PASS 2 test-mul64 /host-utils/muls64
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-int128 -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-int128" 
---
PASS 9 test-int128 /int128/int128_gt
PASS 10 test-int128 /int128/int128_rshift
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/rcutorture -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="rcutorture" 
==12871==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe566b7000; bottom 0x7fd5fcffe000; size: 0x0028596b9000 (173298913280)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 12 ahci-test /x86_64/ahci/io/pio/lba28/double/zero
PASS 1 rcutorture /rcu/torture/1reader
==12904==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==12904==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffd7d0a7000; bottom 0x7fa65ebfe000; size: 0x00571e4a9000 (374170357760)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 13 ahci-test /x86_64/ahci/io/pio/lba28/double/low
PASS 2 rcutorture /rcu/torture/10readers
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-rcu-list -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-rcu-list" 
==12926==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==12926==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe90d24000; bottom 0x7f09449fe000; size: 0x00f54c326000 (1053545357312)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-rcu-list /rcu/qlist/single-threaded
PASS 14 ahci-test /x86_64/ahci/io/pio/lba28/double/high
==12945==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==12945==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffeed2ae000; bottom 0x7f98ef9fe000; size: 0x0065fd8b0000 (438045442048)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 2 test-rcu-list /rcu/qlist/short-few
PASS 15 ahci-test /x86_64/ahci/io/pio/lba28/long/zero
==12972==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==12972==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffcb20b6000; bottom 0x7f2ac7524000; size: 0x00d1eab92000 (901586165760)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 16 ahci-test /x86_64/ahci/io/pio/lba28/long/low
PASS 3 test-rcu-list /rcu/qlist/long-many
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-rcu-simpleq -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-rcu-simpleq" 
==12978==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==12978==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffd57d00000; bottom 0x7fed7f77c000; size: 0x000fd8584000 (68054171648)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-rcu-simpleq /rcu/qsimpleq/single-threaded
PASS 17 ahci-test /x86_64/ahci/io/pio/lba28/long/high
PASS 2 test-rcu-simpleq /rcu/qsimpleq/short-few
==12997==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 18 ahci-test /x86_64/ahci/io/pio/lba28/short/zero
==13024==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 19 ahci-test /x86_64/ahci/io/pio/lba28/short/low
PASS 3 test-rcu-simpleq /rcu/qsimpleq/long-many
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-rcu-tailq -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-rcu-tailq" 
==13030==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 20 ahci-test /x86_64/ahci/io/pio/lba28/short/high
PASS 1 test-rcu-tailq /rcu/qtailq/single-threaded
==13043==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13043==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fffb5dfa000; bottom 0x7ff66e5fe000; size: 0x0009477fc000 (39854260224)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 21 ahci-test /x86_64/ahci/io/pio/lba48/simple/zero
PASS 2 test-rcu-tailq /rcu/qtailq/short-few
==13055==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13055==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fff3403f000; bottom 0x7f705f5fe000; size: 0x008ed4a41000 (613452877824)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 22 ahci-test /x86_64/ahci/io/pio/lba48/simple/low
==13082==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13082==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe63b3c000; bottom 0x7f08f09fe000; size: 0x00f57313e000 (1054197669888)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 23 ahci-test /x86_64/ahci/io/pio/lba48/simple/high
PASS 3 test-rcu-tailq /rcu/qtailq/long-many
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qdist -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qdist" 
==13088==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-qdist /qdist/none
PASS 2 test-qdist /qdist/pr
PASS 3 test-qdist /qdist/single/empty
---
PASS 7 test-qdist /qdist/binning/expand
PASS 8 test-qdist /qdist/binning/shrink
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qht -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qht" 
==13088==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fff7407c000; bottom 0x7fe080dfe000; size: 0x001ef327e000 (132928495616)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 24 ahci-test /x86_64/ahci/io/pio/lba48/double/zero
==13103==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13103==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fff26f1d000; bottom 0x7fdeb7ffe000; size: 0x00206ef1f000 (139300302848)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 25 ahci-test /x86_64/ahci/io/pio/lba48/double/low
==13109==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13109==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc9740d000; bottom 0x7fe4609fe000; size: 0x001836a0f000 (103995731968)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 26 ahci-test /x86_64/ahci/io/pio/lba48/double/high
==13115==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13115==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe535e5000; bottom 0x7fd8dd77c000; size: 0x002575e69000 (160891834368)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 27 ahci-test /x86_64/ahci/io/pio/lba48/long/zero
==13121==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13121==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc88bb9000; bottom 0x7fc15477c000; size: 0x003b3443d000 (254279929856)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 28 ahci-test /x86_64/ahci/io/pio/lba48/long/low
==13127==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13127==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fff71e70000; bottom 0x7fcd3417c000; size: 0x00323dcf4000 (215785357312)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 29 ahci-test /x86_64/ahci/io/pio/lba48/long/high
==13133==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 30 ahci-test /x86_64/ahci/io/pio/lba48/short/zero
==13139==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 31 ahci-test /x86_64/ahci/io/pio/lba48/short/low
==13145==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 32 ahci-test /x86_64/ahci/io/pio/lba48/short/high
==13151==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 33 ahci-test /x86_64/ahci/io/dma/lba28/fragmented
==13157==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 34 ahci-test /x86_64/ahci/io/dma/lba28/retry
==13163==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-qht /qht/mode/default
PASS 2 test-qht /qht/mode/resize
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qht-par -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qht-par" 
PASS 35 ahci-test /x86_64/ahci/io/dma/lba28/simple/zero
==13179==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 36 ahci-test /x86_64/ahci/io/dma/lba28/simple/low
PASS 1 test-qht-par /qht/parallel/2threads-0%updates-1s
==13185==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 test-qht-par /qht/parallel/2threads-20%updates-1s
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bitops -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bitops" 
PASS 37 ahci-test /x86_64/ahci/io/dma/lba28/simple/high
---
PASS 3 test-bitcnt /bitcnt/ctpop32
PASS 4 test-bitcnt /bitcnt/ctpop64
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qdev-global-props -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qdev-global-props" 
==13203==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-qdev-global-props /qdev/properties/static/default
PASS 2 test-qdev-global-props /qdev/properties/static/global
PASS 3 test-qdev-global-props /qdev/properties/dynamic/global
---
PASS 4 test-crypto-hash /crypto/hash/digest
PASS 5 test-crypto-hash /crypto/hash/base64
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-crypto-hmac -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-hmac" 
==13244==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-crypto-hmac /crypto/hmac/iov
PASS 2 test-crypto-hmac /crypto/hmac/alloc
PASS 3 test-crypto-hmac /crypto/hmac/prealloc
---
PASS 39 ahci-test /x86_64/ahci/io/dma/lba28/double/low
PASS 1 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/perfectserver
PASS 2 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/perfectclient
==13276==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodca1
PASS 40 ahci-test /x86_64/ahci/io/dma/lba28/double/high
==13282==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 41 ahci-test /x86_64/ahci/io/dma/lba28/long/zero
==13288==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 42 ahci-test /x86_64/ahci/io/dma/lba28/long/low
PASS 4 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodca2
PASS 5 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodca3
PASS 6 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/badca1
PASS 7 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/badca2
PASS 8 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/badca3
==13294==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 9 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver1
PASS 43 ahci-test /x86_64/ahci/io/dma/lba28/long/high
PASS 10 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver2
==13300==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver3
PASS 12 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver4
PASS 44 ahci-test /x86_64/ahci/io/dma/lba28/short/zero
==13306==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 13 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver5
PASS 45 ahci-test /x86_64/ahci/io/dma/lba28/short/low
PASS 14 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver6
==13312==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 15 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver7
PASS 16 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/badserver1
PASS 17 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/badserver2
---
PASS 39 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/missingclient
PASS 46 ahci-test /x86_64/ahci/io/dma/lba28/short/high
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-crypto-tlssession -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-tlssession" 
==13319==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-crypto-tlssession /qcrypto/tlssession/psk
PASS 47 ahci-test /x86_64/ahci/io/dma/lba48/simple/zero
==13329==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 48 ahci-test /x86_64/ahci/io/dma/lba48/simple/low
==13335==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 49 ahci-test /x86_64/ahci/io/dma/lba48/simple/high
==13341==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 50 ahci-test /x86_64/ahci/io/dma/lba48/double/zero
==13347==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 test-crypto-tlssession /qcrypto/tlssession/basicca
PASS 51 ahci-test /x86_64/ahci/io/dma/lba48/double/low
PASS 3 test-crypto-tlssession /qcrypto/tlssession/differentca
==13353==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 test-crypto-tlssession /qcrypto/tlssession/altname1
PASS 52 ahci-test /x86_64/ahci/io/dma/lba48/double/high
==13359==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 53 ahci-test /x86_64/ahci/io/dma/lba48/long/zero
PASS 5 test-crypto-tlssession /qcrypto/tlssession/altname2
==13365==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 test-crypto-tlssession /qcrypto/tlssession/altname3
PASS 54 ahci-test /x86_64/ahci/io/dma/lba48/long/low
==13371==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 7 test-crypto-tlssession /qcrypto/tlssession/altname4
PASS 55 ahci-test /x86_64/ahci/io/dma/lba48/long/high
PASS 8 test-crypto-tlssession /qcrypto/tlssession/altname5
==13377==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 9 test-crypto-tlssession /qcrypto/tlssession/altname6
PASS 10 test-crypto-tlssession /qcrypto/tlssession/wildcard1
PASS 56 ahci-test /x86_64/ahci/io/dma/lba48/short/zero
==13383==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 57 ahci-test /x86_64/ahci/io/dma/lba48/short/low
PASS 11 test-crypto-tlssession /qcrypto/tlssession/wildcard2
==13389==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 12 test-crypto-tlssession /qcrypto/tlssession/wildcard3
PASS 13 test-crypto-tlssession /qcrypto/tlssession/wildcard4
PASS 58 ahci-test /x86_64/ahci/io/dma/lba48/short/high
PASS 14 test-crypto-tlssession /qcrypto/tlssession/wildcard5
==13395==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 15 test-crypto-tlssession /qcrypto/tlssession/wildcard6
PASS 59 ahci-test /x86_64/ahci/io/ncq/simple
PASS 16 test-crypto-tlssession /qcrypto/tlssession/cachain
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qga -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qga" 
==13401==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 60 ahci-test /x86_64/ahci/io/ncq/retry
PASS 1 test-qga /qga/sync-delimited
PASS 2 test-qga /qga/sync
---
PASS 6 test-qga /qga/get-vcpus
PASS 7 test-qga /qga/get-fsinfo
PASS 8 test-qga /qga/get-memory-block-info
==13413==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 9 test-qga /qga/get-memory-blocks
PASS 10 test-qga /qga/file-ops
PASS 11 test-qga /qga/file-write-read
---
PASS 16 test-qga /qga/invalid-args
PASS 17 test-qga /qga/fsfreeze-status
PASS 61 ahci-test /x86_64/ahci/flush/simple
==13420==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 18 test-qga /qga/blacklist
PASS 19 test-qga /qga/config
PASS 20 test-qga /qga/guest-exec
---
PASS 24 test-qga /qga/guest-get-timezone
PASS 25 test-qga /qga/guest-get-users
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-timed-average -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-timed-average" 
==13433==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-timed-average /timed-average/average
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-util-filemonitor -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-util-filemonitor" 
PASS 1 test-util-filemonitor /util/filemonitor
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-util-sockets -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-util-sockets" 
==13447==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-util-sockets /util/socket/is-socket/bad
PASS 2 test-util-sockets /util/socket/is-socket/good
PASS 3 test-util-sockets /socket/fd-pass/name/good
---
PASS 3 test-io-channel-command /io/channel/command/echo/sync
PASS 4 test-io-channel-command /io/channel/command/echo/async
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-io-channel-buffer -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-io-channel-buffer" 
==13545==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-io-channel-buffer /io/channel/buf
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-base64 -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-base64" 
PASS 1 test-base64 /util/base64/good
---
PASS 8 test-crypto-ivgen /crypto/ivgen/essiv/1f2e3d4c
PASS 9 test-crypto-ivgen /crypto/ivgen/essiv/1f2e3d4c5b6a7988
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-crypto-afsplit -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-afsplit" 
==13570==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-crypto-afsplit /crypto/afsplit/sha256/5
PASS 2 test-crypto-afsplit /crypto/afsplit/sha256/5000
PASS 3 test-crypto-afsplit /crypto/afsplit/sha256/big
---
PASS 1 test-logging /logging/parse_range
PASS 2 test-logging /logging/parse_path
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-replication -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-replication" 
==13602==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-replication /replication/primary/read
PASS 2 test-replication /replication/primary/write
PASS 64 ahci-test /x86_64/ahci/migrate/sanity
---
PASS 4 test-replication /replication/primary/stop
PASS 5 test-replication /replication/primary/do_checkpoint
PASS 6 test-replication /replication/primary/get_error_all
==13608==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 7 test-replication /replication/secondary/read
==13613==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 test-replication /replication/secondary/write
PASS 65 ahci-test /x86_64/ahci/migrate/dma/simple
==13622==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13627==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 66 ahci-test /x86_64/ahci/migrate/dma/halted
==13636==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13602==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe98ac5000; bottom 0x7f76087fc000; size: 0x0088902c9000 (586534391808)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
==13642==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 9 test-replication /replication/secondary/start
PASS 67 ahci-test /x86_64/ahci/migrate/ncq/simple
==13681==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13686==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 test-replication /replication/secondary/stop
PASS 68 ahci-test /x86_64/ahci/migrate/ncq/halted
==13695==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 69 ahci-test /x86_64/ahci/cdrom/eject
==13700==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 test-replication /replication/secondary/do_checkpoint
PASS 70 ahci-test /x86_64/ahci/cdrom/dma/single
==13706==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 12 test-replication /replication/secondary/get_error_all
PASS 71 ahci-test /x86_64/ahci/cdrom/dma/multi
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bufferiszero -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bufferiszero" 
==13713==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 72 ahci-test /x86_64/ahci/cdrom/pio/single
==13722==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13722==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffcce640000; bottom 0x7f9001ffe000; size: 0x006ccc642000 (467285581824)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 73 ahci-test /x86_64/ahci/cdrom/pio/multi
==13728==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 74 ahci-test /x86_64/ahci/cdrom/pio/bcl
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/hd-geo-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="hd-geo-test" 
PASS 1 hd-geo-test /x86_64/hd-geo/ide/none
==13742==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 hd-geo-test /x86_64/hd-geo/ide/drive/cd_0
==13748==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 hd-geo-test /x86_64/hd-geo/ide/drive/mbr/blank
==13754==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 hd-geo-test /x86_64/hd-geo/ide/drive/mbr/lba
==13760==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 hd-geo-test /x86_64/hd-geo/ide/drive/mbr/chs
==13766==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 hd-geo-test /x86_64/hd-geo/ide/device/mbr/blank
==13772==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 7 hd-geo-test /x86_64/hd-geo/ide/device/mbr/lba
==13778==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 hd-geo-test /x86_64/hd-geo/ide/device/mbr/chs
==13784==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 9 hd-geo-test /x86_64/hd-geo/ide/device/user/chs
==13789==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 hd-geo-test /x86_64/hd-geo/ide/device/user/chst
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/boot-order-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="boot-order-test" 
PASS 1 boot-order-test /x86_64/boot-order/pc
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==13857==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!

Looking for expected file 'tests/data/acpi/pc/FACP'
Using expected file 'tests/data/acpi/pc/FACP'
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==13863==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-bufferiszero /cutils/bufferiszero
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-uuid -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-uuid" 
PASS 1 test-uuid /uuid/is_null
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==13886==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!

Looking for expected file 'tests/data/acpi/pc/FACP.bridge'
Looking for expected file 'tests/data/acpi/pc/FACP'
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==13892==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!

Looking for expected file 'tests/data/acpi/pc/FACP.ipmikcs'
Looking for expected file 'tests/data/acpi/pc/FACP'
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==13898==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!

Looking for expected file 'tests/data/acpi/pc/FACP.cphp'
Looking for expected file 'tests/data/acpi/pc/FACP'
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==13905==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!

Looking for expected file 'tests/data/acpi/pc/FACP.memhp'
Looking for expected file 'tests/data/acpi/pc/FACP'
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==13911==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!

Looking for expected file 'tests/data/acpi/pc/FACP.numamem'
Looking for expected file 'tests/data/acpi/pc/FACP'
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==13917==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!

Looking for expected file 'tests/data/acpi/pc/FACP.dimmpxm'
Looking for expected file 'tests/data/acpi/pc/FACP'
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==13926==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!

Looking for expected file 'tests/data/acpi/pc/FACP.acpihmat'
Looking for expected file 'tests/data/acpi/pc/FACP'
---
Looking for expected file 'tests/data/acpi/pc/SRAT.acpihmat'
Looking for expected file 'tests/data/acpi/pc/SRAT'
**
ERROR:/tmp/qemu-test/src/tests/bios-tables-test.c:327:load_expected_aml: assertion failed: (exp_sdt.aml_file)
ERROR - Bail out! ERROR:/tmp/qemu-test/src/tests/bios-tables-test.c:327:load_expected_aml: assertion failed: (exp_sdt.aml_file)
make: *** [/tmp/qemu-test/src/tests/Makefile.include:900: check-qtest-x86_64] Error 1
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):


The full log is available at
http://patchew.org/logs/20190920074349.2616-1-tao3.xu@intel.com/testing.asan/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [PATCH v12 05/11] numa: Extend CLI to provide initiator information for numa nodes
  2019-09-20  7:43 ` [PATCH v12 05/11] numa: Extend CLI to provide initiator information for numa nodes Tao Xu
@ 2019-09-30 11:25   ` Igor Mammedov
  0 siblings, 0 replies; 34+ messages in thread
From: Igor Mammedov @ 2019-09-30 11:25 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, jonathan.cameron,
	dan.j.williams

On Fri, 20 Sep 2019 15:43:43 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> In ACPI 6.3 chapter 5.2.27, Heterogeneous Memory Attribute Table (HMAT),
> the initiator represents a processor that accesses memory. In 5.2.27.3,
> Memory Proximity Domain Attributes Structure, the attached initiator is
> defined as where the memory controller responsible for a memory proximity
> domain resides. With attached initiator information, the topology of
> heterogeneous memory can be described.
> 
> Extend the "-numa node" CLI option to indicate the initiator NUMA node ID.
> In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses and reports
> the platform's HMAT tables.
> 
> Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> Changes in v12:
>     - Fix the bug that a memory-only node without initiator setting
>       doesn't report error. (reported by Danmei Wei)
> 
> No changes in v11.
> 
> Changes in v10:
>     - Add machine option property "-machine hmat=on|off" for enabling
>       or disabling HMAT in QEMU.
>     - Add more description for initiator option.
>     - Report an error when HMAT is enabled and the initiator option is
>       missing. Do not allow an invalid initiator now. (Igor)
> ---
>  hw/core/machine.c     | 71 +++++++++++++++++++++++++++++++++++++++++++
>  hw/core/numa.c        | 24 +++++++++++++++
>  include/sysemu/numa.h |  6 ++++
>  qapi/machine.json     | 10 +++++-
>  qemu-options.hx       | 35 ++++++++++++++++++---
>  5 files changed, 140 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 1689ad3bf8..087baaf571 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -518,6 +518,20 @@ static void machine_set_nvdimm(Object *obj, bool value, Error **errp)
>      ms->nvdimms_state->is_enabled = value;
>  }
>  
> +static bool machine_get_hmat(Object *obj, Error **errp)
> +{
> +    MachineState *ms = MACHINE(obj);
> +
> +    return ms->numa_state->hmat_enabled;
> +}
> +
> +static void machine_set_hmat(Object *obj, bool value, Error **errp)
> +{
> +    MachineState *ms = MACHINE(obj);
> +
> +    ms->numa_state->hmat_enabled = value;
> +}
> +
>  static char *machine_get_nvdimm_persistence(Object *obj, Error **errp)
>  {
>      MachineState *ms = MACHINE(obj);
> @@ -645,6 +659,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>                                 const CpuInstanceProperties *props, Error **errp)
>  {
>      MachineClass *mc = MACHINE_GET_CLASS(machine);
> +    NodeInfo *numa_info = machine->numa_state->nodes;
>      bool match = false;
>      int i;
>  
> @@ -714,6 +729,16 @@ void machine_set_cpu_numa_node(MachineState *machine,
>          match = true;
>          slot->props.node_id = props->node_id;
>          slot->props.has_node_id = props->has_node_id;
> +
> +        if (numa_info[props->node_id].initiator_valid &&
> +            (props->node_id != numa_info[props->node_id].initiator)) {
> +            error_setg(errp, "The initiator of CPU NUMA node %" PRId64
> +                       " should be itself.", props->node_id);
> +            return;
> +        }
> +        numa_info[props->node_id].initiator_valid = true;
> +        numa_info[props->node_id].has_cpu = true;
> +        numa_info[props->node_id].initiator = props->node_id;
>      }
>  
>      if (!match) {
> @@ -960,6 +985,13 @@ static void machine_initfn(Object *obj)
>  
>      if (mc->numa_mem_supported) {
>          ms->numa_state = g_new0(NumaState, 1);
> +        object_property_add_bool(obj, "hmat",
> +                                 machine_get_hmat, machine_set_hmat,
> +                                 &error_abort);
> +        object_property_set_description(obj, "hmat",
> +                                        "Set on/off to enable/disable "
> +                                        "ACPI Heterogeneous Memory Attribute "
> +                                        "Table (HMAT)", NULL);
>      }
>  
>      /* Register notifier when init is done for sysbus sanity checks */
> @@ -1048,6 +1080,40 @@ static char *cpu_slot_to_string(const CPUArchId *cpu)
>      return g_string_free(s, false);
>  }
>  
> +static void numa_validate_initiator(NumaState *nstat)
> +{
> +    int i;
> +    NodeInfo *numa_info = nstat->nodes;
> +
> +    for (i = 0; i < nstat->num_nodes; i++) {
> +        if (numa_info[i].initiator == MAX_NODES) {
> +            error_report("The initiator of NUMA node %d is missing, use "
> +                         "'-numa node,initiator' option to declare it.", i);
> +            goto err;
> +        }
> +
> +        if (!numa_info[numa_info[i].initiator].present) {
> +            error_report("NUMA node %" PRIu16 " is missing, use "
> +                         "'-numa node' option to declare it first.",
> +                         numa_info[i].initiator);
> +            goto err;
> +        }
> +
> +        if (numa_info[numa_info[i].initiator].has_cpu) {
> +            numa_info[i].initiator_valid = true;
> +        } else {
> +            error_report("The initiator of NUMA node %d is invalid.", i);
> +            goto err;
> +        }
> +    }
> +
> +    return;
> +
> +err:
> +    error_printf("\n");
> +    exit(1);
> +}
> +
>  static void machine_numa_finish_cpu_init(MachineState *machine)
>  {
>      int i;
> @@ -1088,6 +1154,11 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
>              machine_set_cpu_numa_node(machine, &props, &error_fatal);
>          }
>      }
> +
> +    if (machine->numa_state->hmat_enabled) {
> +        numa_validate_initiator(machine->numa_state);
> +    }
> +
>      if (s->len && !qtest_enabled()) {
>          warn_report("CPU(s) not present in any NUMA nodes: %s",
>                      s->str);
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index 4dfec5c95b..eff5491f6f 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -133,6 +133,30 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>          numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
>          numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
>      }
> +
I'd put here
   /* ... */
   numa_info[nodenr].initiator = MAX_NODES;

and drop the "else if" below, so the initiator would always be invalid unless it was set.
If you do that, then you can probably get rid of the "initiator_valid" field,
since "initiator < MAX_NODES" can do the same.
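
For illustration only, here is a minimal standalone sketch of the sentinel idea,
not QEMU code. The Sketch* names, the return codes, and the MAX_NODES value of
128 are assumptions made for this example. The node array is assumed to hold
MAX_NODES entries, as NumaState.nodes does in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 128 /* illustrative bound; doubles as the "unset" sentinel */

/* Simplified stand-in for NodeInfo: only the fields this sketch needs. */
typedef struct SketchNodeInfo {
    bool present;
    bool has_cpu;
    uint16_t initiator; /* MAX_NODES means "no initiator set" */
} SketchNodeInfo;

/* Hypothetical helper: mark a node present with its initiator unset. */
static void sketch_node_init(SketchNodeInfo *n)
{
    n->present = true;
    n->has_cpu = false;
    n->initiator = MAX_NODES;
}

/* This comparison stands in for a separate "initiator_valid" flag. */
static bool sketch_initiator_is_set(const SketchNodeInfo *n)
{
    return n->initiator < MAX_NODES;
}

/*
 * Check the first num_nodes entries of a MAX_NODES-sized array.
 * Returns 0 on success, or a negative code:
 *   -1 initiator unset, -2 initiator node not present,
 *   -3 initiator node has no CPU.
 */
static int sketch_validate_initiators(const SketchNodeInfo *nodes,
                                      int num_nodes)
{
    assert(num_nodes >= 0 && num_nodes <= MAX_NODES);

    for (int i = 0; i < num_nodes; i++) {
        if (!sketch_initiator_is_set(&nodes[i])) {
            return -1;
        }

        const SketchNodeInfo *init = &nodes[nodes[i].initiator];

        if (!init->present) {
            return -2;
        }
        if (!init->has_cpu) {
            return -3;
        }
    }
    return 0;
}
```

With this shape, a node that has CPUs would set its initiator to its own id, and
a memory-only node stays at MAX_NODES until "-numa node,initiator=..." fills it in.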


> +    if (node->has_initiator) {
> +        if (!ms->numa_state->hmat_enabled) {
> +            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
> +                       "(HMAT) is disabled, use -machine hmat=on before "
> +                       "set initiator of NUMA");
> +            return;
> +        }
> +
> +        if (node->initiator >= MAX_NODES) {
> +            error_report("The initiator id %" PRIu16 " expects an integer "
> +                         "between 0 and %d", node->initiator,
> +                         MAX_NODES - 1);
> +            return;
> +        }
> +
> +        numa_info[nodenr].initiator = node->initiator;
> +    } else if (ms->numa_state->hmat_enabled) {
> +        /*
> +         * If not set the initiator, set it to MAX_NODES. And if
> +         * HMAT is enabled and this node has no cpus, QEMU will raise error.
> +         */
> +        numa_info[nodenr].initiator = MAX_NODES;
> +    }
>      numa_info[nodenr].present = true;
>      max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
>      ms->numa_state->num_nodes++;
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index ae9c41d02b..a788c3b126 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -18,6 +18,9 @@ struct NodeInfo {
>      uint64_t node_mem;
>      struct HostMemoryBackend *node_memdev;
>      bool present;
> +    bool has_cpu;
> +    bool initiator_valid;
> +    uint16_t initiator;
>      uint8_t distance[MAX_NODES];
>  };
>  
> @@ -33,6 +36,9 @@ struct NumaState {
>      /* Allow setting NUMA distance for different NUMA nodes */
>      bool have_numa_distance;
>  
> +    /* Detect if HMAT support is enabled. */
> +    bool hmat_enabled;
> +
>      /* NUMA nodes information */
>      NodeInfo nodes[MAX_NODES];
>  };
> diff --git a/qapi/machine.json b/qapi/machine.json
> index ca26779f1a..3c2914cd1c 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -463,6 +463,13 @@
>  # @memdev: memory backend object.  If specified for one node,
>  #          it must be specified for all nodes.
>  #
> +# @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145,
> +#             indicate the nodeid which has the memory controller
s/indicate/indicates/
or even better 'points to'


> +#             responsible for this NUMA node. This field provides
> +#             additional information as to the initiator node that
> +#             is closest (as in directly attached) to this node, and
> +#             therefore has the best performance (since 4.2)
> +#
>  # Since: 2.1
>  ##
>  { 'struct': 'NumaNodeOptions',
> @@ -470,7 +477,8 @@
>     '*nodeid': 'uint16',
>     '*cpus':   ['uint16'],
>     '*mem':    'size',
> -   '*memdev': 'str' }}
> +   '*memdev': 'str',
> +   '*initiator': 'uint16' }}
>  
>  ##
>  # @NumaDistOptions:
> diff --git a/qemu-options.hx b/qemu-options.hx
> index bbfd936d29..74ccc4d782 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -43,7 +43,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>      "                suppress-vmdesc=on|off disables self-describing migration (default=off)\n"
>      "                nvdimm=on|off controls NVDIMM support (default=off)\n"
>      "                enforce-config-section=on|off enforce configuration section migration (default=off)\n"
> -    "                memory-encryption=@var{} memory encryption object to use (default=none)\n",
> +    "                memory-encryption=@var{} memory encryption object to use (default=none)\n"
> +    "                hmat=on|off controls ACPI HMAT support (default=off)\n",
>      QEMU_ARCH_ALL)
>  STEXI
>  @item -machine [type=]@var{name}[,prop=@var{value}[,...]]
> @@ -103,6 +104,9 @@ NOTE: this parameter is deprecated. Please use @option{-global}
>  @option{migration.send-configuration}=@var{on|off} instead.
>  @item memory-encryption=@var{}
>  Memory encryption object to use. The default is none.
> +@item hmat=on|off
> +Enables or disables ACPI Heterogeneous Memory Attribute Table (HMAT) support.
> +The default is off.
>  @end table
>  ETEXI
>  
> @@ -161,14 +165,14 @@ If any on the three values is given, the total number of CPUs @var{n} can be omi
>  ETEXI
>  
>  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> -    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> +    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
> +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>      "-numa dist,src=source,dst=destination,val=distance\n"
>      "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
>      QEMU_ARCH_ALL)
>  STEXI
> -@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
> -@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
> +@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
> +@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>  @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>  @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
>  @findex -numa
> @@ -215,6 +219,27 @@ split equally between them.
>  @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore,
>  if one node uses @samp{memdev}, all of them have to use it.
>  
> +@samp{initiator} is an additional option indicate the @var{initiator}
s/indicate/that points to/

s/the/an/


> +NUMA that has best performance (the lowest latency or largest bandwidth)
s/NUMA/NUMA node/

> +to this NUMA @var{node}. Note that this option can be set only when
> +the machine oprion properties "-machine hmat=on".
I'd write it as:
 the machine property 'hmat' is set to 'on'

> +
> +Following example creates a machine with 2 NUMA nodes, node 0 has CPU.
> +node 1 has only memory, and its' initiator is node 0. Note that because
s/its' initiator/its initiator node/

> +node 0 has CPU, by default the initiator of node 0 is itself and must be
> +itself.
> +@example
> +-machine hmat=on \
> +-m 2G,slots=2,maxmem=4G \
> +-object memory-backend-ram,size=1G,id=m0 \
> +-object memory-backend-ram,size=1G,id=m1 \
> +-numa node,nodeid=0,memdev=m0 \
> +-numa node,nodeid=1,memdev=m1,initiator=0 \
> +-smp 2,sockets=2,maxcpus=2  \
> +-numa cpu,node-id=0,socket-id=0 \
> +-numa cpu,node-id=0,socket-id=1
> +@end example
> +
>  @var{source} and @var{destination} are NUMA node IDs.
>  @var{distance} is the NUMA distance from @var{source} to @var{destination}.
>  The distance from a node to itself is always 10. If any pair of nodes is




* Re: [PATCH v12 06/11] numa: Extend CLI to provide memory latency and bandwidth information
  2019-09-20  7:43 ` [PATCH v12 06/11] numa: Extend CLI to provide memory latency and bandwidth information Tao Xu
@ 2019-10-02 15:16   ` Igor Mammedov
  2019-10-09  6:39     ` Tao Xu
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2019-10-02 15:16 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, jonathan.cameron,
	dan.j.williams

On Fri, 20 Sep 2019 15:43:44 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> From: Liu Jingqi <jingqi.liu@intel.com>
> 
> Add -numa hmat-lb option to provide System Locality Latency and
> Bandwidth Information. These memory attributes help to build
> System Locality Latency and Bandwidth Information Structure(s)
> in ACPI Heterogeneous Memory Attribute Table (HMAT).
> 
> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> No changes in v12.
> 
> Changes in v11:
>     - Move numa option patches forward.
>     - Add num_initiator in Numa_state to record the number of
>       initiators.
>     - Simplify struct HMAT_LB_Info, use uint64_t array to store data.
>     - Drop hmat_get_base().
> 
> Changes in v10:
>     - use new builtin type 'time' as qapi input.
> ---
>  hw/core/numa.c        | 114 ++++++++++++++++++++++++++++++++++++++++++
>  include/sysemu/numa.h |  44 ++++++++++++++++
>  qapi/machine.json     |  95 ++++++++++++++++++++++++++++++++++-
>  qemu-options.hx       |  49 +++++++++++++++++-
>  4 files changed, 299 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index eff5491f6f..f5a1c9e909 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -199,6 +199,100 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
>      ms->numa_state->have_numa_distance = true;
>  }
>  
> +void parse_numa_hmat_lb(NumaState *nstat, NumaHmatLBOptions *node,
> +                        Error **errp)
> +{
> +    int i;

> +    int init = node->initiator;
> +    int targ = node->target;
The abbreviations above are vague; it would be better to use node->initiator
and node->target as is within this function.

> +    int nb_nodes = nstat->num_nodes;
You are using both the local variable and the argument (nstat->num_nodes)
within the same function; please be consistent and use only one of them.

> +    NodeInfo *numa_info = nstat->nodes;
> +    HMAT_LB_Info *hmat_lb = nstat->hmat_lb[node->hierarchy][node->data_type];
> +
> +    /* Error checking */
> +    if (init >= nb_nodes) {
> +        error_setg(errp, "Invalid initiator=%d, it should be less than %d.",
> +                   init, nb_nodes);
> +        return;
> +    }
> +    if (targ >= nb_nodes) {
> +        error_setg(errp, "Invalid target=%d, it should be less than %d.",
> +                   targ, nb_nodes);
> +        return;
> +    }
> +    if (!numa_info[init].has_cpu) {
> +        error_setg(errp, "Invalid initiator=%d, it isn't an "
> +                   "initiator proximity domain.", init);
> +        return;
> +    }
> +    if (!numa_info[targ].present) {
> +        error_setg(errp, "Invalid target=%d, it hasn't a valid NUMA node.",
> +                   targ);
> +        return;
> +    }
> +
> +    /* HMAT latency and bandwidth data initialization */
> +    if (nstat->num_initiator == 0) {
> +        for (i = 0; i < nstat->num_nodes; i++) {
> +            if (numa_info[i].has_cpu) {
> +                nstat->num_initiator++;
> +            }
> +        }
> +    }
> +
> +    if (!hmat_lb) {
> +        int size = nstat->num_initiator * nb_nodes * sizeof(uint64_t);
> +        hmat_lb = g_malloc0(sizeof(*hmat_lb));
> +        nstat->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
> +        hmat_lb->latency = g_malloc0(size);
> +        hmat_lb->bandwidth = g_malloc0(size);

During CLI parsing, nb_nodes and nstat->num_initiator would be a moving target.
An example of a possible CLI that would break your code:
 -numa node,nodeid=0 -numa hmat-lb,initiator=0... -numa node,nodeid=1

And I don't see a nice way to enforce option order on the CLI in this case.

Instead of manually calculating sizes and keeping num_initiator in the state,
you could drop the num_initiator field and reuse the GArray structure, which
will do allocation management for you and won't be affected by option ordering.

Then later, in 9/11, in hmat_build_table_structs(), you can calculate
num_initiator at the same time you fill in initiator_list[], or if
initiator_list were a GArray, it would be calculated for you automatically.
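
To illustrate the order-independent storage idea, here is a minimal sketch that
records each value as an (initiator, target, value) triple in a growable list,
so nothing depends on how many nodes have been parsed so far. LBEntry, LBList
and lb_list_add() are hypothetical names, not part of the patch, and a
hand-rolled list stands in for GArray only so that the sketch compiles without
GLib.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: one value for an (initiator, target) pair. */
typedef struct {
    uint16_t initiator;
    uint16_t target;
    uint64_t value;
} LBEntry;

/* Hypothetical growable list; in QEMU a GArray would play this role. */
typedef struct {
    LBEntry *items;
    size_t len;
    size_t cap;
} LBList;

/*
 * Append one entry. Returns 0 on success, -1 if the pair was already
 * configured or if memory allocation failed.
 */
int lb_list_add(LBList *list, uint16_t initiator, uint16_t target,
                uint64_t value)
{
    assert(list);

    /* Reject a second value for the same pair. */
    for (size_t i = 0; i < list->len; i++) {
        if (list->items[i].initiator == initiator &&
            list->items[i].target == target) {
            return -1;
        }
    }

    /* Grow the backing storage on demand, independent of node count. */
    if (list->len == list->cap) {
        size_t new_cap = list->cap ? list->cap * 2 : 8;
        LBEntry *p = realloc(list->items, new_cap * sizeof(*p));
        if (!p) {
            return -1;
        }
        list->items = p;
        list->cap = new_cap;
    }

    list->items[list->len++] = (LBEntry){ initiator, target, value };
    return 0;
}
```

With storage like this, the number of initiators and the final matrix layout
can be derived once, after all -numa options have been parsed.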


> +    }
> +    hmat_lb->hierarchy = node->hierarchy;
> +    hmat_lb->data_type = node->data_type;
> +
> +    /* Input latency data */
> +    if (node->data_type <= HMATLB_DATA_TYPE_WRITE_LATENCY) {
> +        if (!node->has_latency) {
> +            error_setg(errp, "Missing 'latency' option.");
> +            return;
> +        }
> +        if (node->has_bandwidth) {
> +            error_setg(errp, "Invalid option 'bandwidth' since "
> +                       "the data type is latency.");
> +            return;
> +        }
> +        if (hmat_lb->latency[init * nb_nodes + targ]) {
> +            error_setg(errp, "Duplicate configuration of the latency for "
> +                        "initiator=%d and target=%d.", init, targ);
> +            return;
> +        }
> +
> +        hmat_lb->latency[init * nb_nodes + targ] = node->latency;
> +    }
> +
> +    /* Input bandwidth data */
> +    if (node->data_type >= HMATLB_DATA_TYPE_ACCESS_BANDWIDTH) {
> +        if (!node->has_bandwidth) {
> +            error_setg(errp, "Missing 'bandwidth' option.");
> +            return;
> +        }
> +        if (node->has_latency) {
> +            error_setg(errp, "Invalid option 'latency' since "
> +                       "the data type is bandwidth.");
> +            return;
> +        }
> +        if (hmat_lb->bandwidth[init * nb_nodes + targ]) {
> +            error_setg(errp, "Duplicate configuration of the bandwidth for "
> +                        "initiator=%d and target=%d.", init, targ);
> +            return;
> +        }
> +
> +        /* Convert Byte to Megabyte */
> +        hmat_lb->bandwidth[init * nb_nodes + targ] =
> +            node->bandwidth / 1024 / 1024;

If node->bandwidth (size type) is not a multiple of 1MB,
you will lose the user-provided value here.

Check for that and error out instead of ignoring an unsuitable value.

Also replace "1024 / 1024" with MiB.
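
A minimal sketch of such a check, under the assumption that the conversion is
factored into a helper: bandwidth_bytes_to_mib() is a hypothetical name, and
MiB is defined locally only to keep the sketch self-contained (in the tree it
would come from the existing units header). The caller would turn a false
return into an error_setg() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/*
 * Hypothetical helper: convert a bandwidth given in bytes/s to MiB/s.
 * Values that are not a whole number of MiB are refused instead of
 * being silently truncated.
 * Returns true and stores the result in *mib_out on success.
 */
bool bandwidth_bytes_to_mib(uint64_t bytes, uint64_t *mib_out)
{
    assert(mib_out);

    if (bytes % MiB != 0) {
        return false;   /* caller reports the error to the user */
    }
    *mib_out = bytes / MiB;
    return true;
}
```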

> +    }
> +}
> +
>  void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>  {
>      Error *err = NULL;
> @@ -237,6 +331,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>          machine_set_cpu_numa_node(ms, qapi_NumaCpuOptions_base(&object->u.cpu),
>                                    &err);
>          break;
> +    case NUMA_OPTIONS_TYPE_HMAT_LB:
> +        if (!ms->numa_state->hmat_enabled) {
> +            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
> +                       "(HMAT) is disabled, use -machine hmat=on before "
s/use/enable it with/

> +                       "set initiator of NUMA");
s/set initiator of NUMA/using any of hmat specific options/

> +            return;
> +        }
> +
> +        parse_numa_hmat_lb(ms->numa_state, &object->u.hmat_lb, &err);
> +        if (err) {
> +            goto end;
> +        }
> +        break;
>      default:
>          abort();
>      }
> @@ -264,6 +371,13 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
>          qemu_strtosz_MiB(mem_str, NULL, &object->u.node.mem);
>      }
>  
> +    /* Set up suffix-less bandwidth as megabytes */
> +    if ((object->type == NUMA_OPTIONS_TYPE_HMAT_LB) &&
> +        object->u.hmat_lb.has_bandwidth) {
> +        const char *bw_str = qemu_opt_get(opts, "bandwidth");
> +        qemu_strtosz_MiB(bw_str, NULL, &object->u.hmat_lb.bandwidth);
> +    }
Don't do fixups on behalf of the user; if the user provided nonsense values,
error out instead.

>      set_numa_options(ms, object, &err);
>  
>  end:
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index a788c3b126..876beaee22 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -14,6 +14,27 @@ struct CPUArchId;
>  #define NUMA_DISTANCE_MAX         254
>  #define NUMA_DISTANCE_UNREACHABLE 255
>  
> +/* the value of AcpiHmatLBInfo flags */
> +enum {
> +    HMAT_LB_MEM_MEMORY           = 0,
> +    HMAT_LB_MEM_CACHE_1ST_LEVEL  = 1,
> +    HMAT_LB_MEM_CACHE_2ND_LEVEL  = 2,
> +    HMAT_LB_MEM_CACHE_3RD_LEVEL  = 3,
> +};
> +
> +/* the value of AcpiHmatLBInfo data type */
> +enum {
> +    HMAT_LB_DATA_ACCESS_LATENCY   = 0,
> +    HMAT_LB_DATA_READ_LATENCY     = 1,
> +    HMAT_LB_DATA_WRITE_LATENCY    = 2,
> +    HMAT_LB_DATA_ACCESS_BANDWIDTH = 3,
> +    HMAT_LB_DATA_READ_BANDWIDTH   = 4,
> +    HMAT_LB_DATA_WRITE_BANDWIDTH  = 5,
> +};
> +
> +#define HMAT_LB_LEVELS    (HMAT_LB_MEM_CACHE_3RD_LEVEL + 1)
> +#define HMAT_LB_TYPES     (HMAT_LB_DATA_WRITE_BANDWIDTH + 1)
> +
>  struct NodeInfo {
>      uint64_t node_mem;
>      struct HostMemoryBackend *node_memdev;
> @@ -29,6 +50,21 @@ struct NumaNodeMem {
>      uint64_t node_plugged_mem;
>  };
>  
> +struct HMAT_LB_Info {
> +    /* Indicates it's memory or the specified level memory side cache. */
> +    uint8_t     hierarchy;
> +
> +    /* Present the type of data, access/read/write latency or bandwidth. */
> +    uint8_t     data_type;
> +
> +    /* Array to store the latencies */
specify the units it's stored in

> +    uint64_t    *latency;
> +
> +    /* Array to store the bandwidthes */
ditto

> +    uint64_t    *bandwidth;
btw:

What was the reason for picking uint64_t for storing the above values?

It seems that in this patch you dumb down bandwidth to MB/s above but
store latency as is.

And then in 9/11, build_hmat_lb() divides that by 'base' units;
where is the guarantee that the value stored here will fit into the 2 bytes
used in HMAT to store it in the table?

If this structure should store values in terms of the HMAT table, it should
probably use uint16_t and check that the user-provided value won't overflow
at the time of CLI parsing.
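
A rough sketch of what a parse-time range check could look like, assuming the
value is scaled by a known 'base' before being stored. The helper name
hmat_entry_from_value() is hypothetical, and the 65534 limit is taken from the
option documentation later in this patch ("max NUM is 65534"), not from any
existing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest entry value accepted, per the option text in this patch. */
#define HMAT_ENTRY_MAX 65534u

/*
 * Hypothetical helper: scale a user-provided value by 'base' and check
 * that the result fits into the 2-byte table entry.
 * Returns true and stores the scaled value in *entry when it fits.
 */
bool hmat_entry_from_value(uint64_t value, uint64_t base, uint16_t *entry)
{
    assert(entry && base != 0);

    if (value / base > HMAT_ENTRY_MAX) {
        return false;   /* would overflow the 16-bit entry */
    }
    *entry = (uint16_t)(value / base);
    return true;
}
```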

> +};
> +typedef struct HMAT_LB_Info HMAT_LB_Info;
> +
>  struct NumaState {
>      /* Number of NUMA nodes */
>      int num_nodes;
> @@ -39,13 +75,21 @@ struct NumaState {
>      /* Detect if HMAT support is enabled. */
>      bool hmat_enabled;
>  
> +    /* Number of Proximity Domains that can initiate memory access requests. */
> +    int num_initiator;
> +
>      /* NUMA nodes information */
>      NodeInfo nodes[MAX_NODES];
> +
> +    /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
> +    HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
>  };
>  typedef struct NumaState NumaState;
>  
>  void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
>  void parse_numa_opts(MachineState *ms);
> +void parse_numa_hmat_lb(NumaState *nstat, NumaHmatLBOptions *node,
> +                        Error **errp);
>  void numa_complete_configuration(MachineState *ms);
>  void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
>  extern QemuOptsList qemu_numa_opts;
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 3c2914cd1c..b6019335e8 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -426,10 +426,12 @@
>  #
>  # @cpu: property based CPU(s) to node mapping (Since: 2.10)
>  #
> +# @hmat-lb: memory latency and bandwidth information (Since: 4.2)
> +#
>  # Since: 2.1
>  ##
>  { 'enum': 'NumaOptionsType',
> -  'data': [ 'node', 'dist', 'cpu' ] }
> +  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
>  
>  ##
>  # @NumaOptions:
> @@ -444,7 +446,8 @@
>    'data': {
>      'node': 'NumaNodeOptions',
>      'dist': 'NumaDistOptions',
> -    'cpu': 'NumaCpuOptions' }}
> +    'cpu': 'NumaCpuOptions',
> +    'hmat-lb': 'NumaHmatLBOptions' }}
>  
>  ##
>  # @NumaNodeOptions:
> @@ -557,6 +560,94 @@
>     'base': 'CpuInstanceProperties',
>     'data' : {} }
>  
> +##
> +# @HmatLBMemoryHierarchy:
> +#
> +# The memory hierarchy in the System Locality Latency
> +# and Bandwidth Information Structure of HMAT (Heterogeneous
> +# Memory Attribute Table)
> +#
> +# For more information of @HmatLBMemoryHierarchy see
> +# the chapter 5.2.27.4: Table 5-142: Field "Flags" of ACPI 6.3 spec.
> +#
> +# @memory: the structure represents the memory performance
> +#
> +# @first-level: first level memory of memory side cached memory
> +#
> +# @second-level: second level memory of memory side cached memory
> +#
> +# @third-level: third level memory of memory side cached memory
> +#
> +# Since: 4.2
> +##
> +{ 'enum': 'HmatLBMemoryHierarchy',
> +  'data': [ 'memory', 'first-level', 'second-level', 'third-level' ] }
> +
> +##
> +# @HmatLBDataType:
> +#
> +# Data type in the System Locality Latency
> +# and Bandwidth Information Structure of HMAT (Heterogeneous
> +# Memory Attribute Table)
> +#
> +# For more information of @HmatLBDataType see
> +# the chapter 5.2.27.4: Table 5-142:  Field "Data Type" of ACPI 6.3 spec.
> +#
> +# @access-latency: access latency (nanoseconds)
> +#
> +# @read-latency: read latency (nanoseconds)
> +#
> +# @write-latency: write latency (nanoseconds)
> +#
> +# @access-bandwidth: access bandwidth (MB/s)
> +#
> +# @read-bandwidth: read bandwidth (MB/s)
> +#
> +# @write-bandwidth: write bandwidth (MB/s)
> +#
> +# Since: 4.2
> +##
> +{ 'enum': 'HmatLBDataType',
> +  'data': [ 'access-latency', 'read-latency', 'write-latency',
> +            'access-bandwidth', 'read-bandwidth', 'write-bandwidth' ] }
> +
> +##
> +# @NumaHmatLBOptions:
> +#
> +# Set the system locality latency and bandwidth information
> +# between Initiator and Target proximity Domains.
> +#
> +# For more information of @NumaHmatLBOptions see
> +# the chapter 5.2.27.4: Table 5-142 of ACPI 6.3 spec.
> +#
> +# @initiator: the Initiator Proximity Domain.
> +#
> +# @target: the Target Proximity Domain.
> +#
> +# @hierarchy: the Memory Hierarchy. Indicates the performance
> +#             of memory or side cache.
> +#
> +# @data-type: presents the type of data, access/read/write
> +#             latency or hit latency.
> +#
> +# @latency: the value of latency from @initiator to @target proximity domain,
> +#           the latency units are "ps(picosecond)", "ns(nanosecond)" or
> +#           "us(microsecond)".
> +#
> +# @bandwidth: the value of bandwidth between @initiator and @target proximity
> +#             domain, the bandwidth units are "MB(/s)","GB(/s)" or "TB(/s)".
> +#
> +# Since: 4.2
> +##
> +{ 'struct': 'NumaHmatLBOptions',
> +    'data': {
> +    'initiator': 'uint16',
> +    'target': 'uint16',
> +    'hierarchy': 'HmatLBMemoryHierarchy',
> +    'data-type': 'HmatLBDataType',
> +    '*latency': 'time',
> +    '*bandwidth': 'size' }}
> +
>  ##
>  # @HostMemPolicy:
>  #
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 74ccc4d782..129da0cdc3 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -168,16 +168,19 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>      "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>      "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>      "-numa dist,src=source,dst=destination,val=distance\n"
> -    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
> +    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
> +    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
>      QEMU_ARCH_ALL)
>  STEXI
>  @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>  @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>  @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>  @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
> +@itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,latency=@var{lat}][,bandwidth=@var{bw}]
>  @findex -numa
>  Define a NUMA node and assign RAM and VCPUs to it.
>  Set the NUMA distance from a source node to a destination node.
> +Set the ACPI Heterogeneous Memory Attributes for the given nodes.
>  
>  Legacy VCPU assignment uses @samp{cpus} option where
>  @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
> @@ -256,6 +259,50 @@ specified resources, it just assigns existing resources to NUMA
>  nodes. This means that one still has to use the @option{-m},
>  @option{-smp} options to allocate RAM and VCPUs respectively.
>  
> +Use @samp{hmat-lb} to set System Locality Latency and Bandwidth Information
> +between initiator and target NUMA nodes in ACPI Heterogeneous Attribute Memory Table (HMAT).
> +Initiator NUMA node can create memory requests, usually including one or more processors.
> +Target NUMA node contains addressable memory.
> +
> +In @samp{hmat-lb} option, @var{node} are NUMA node IDs. @var{str} of 'hierarchy'
> +is the memory hierarchy of the target NUMA node: if @var{str} is 'memory', the structure
> +represents the memory performance; if @var{str} is 'first-level|second-level|third-level',
> +this structure represents aggregated performance of memory side caches for each domain.
> +@var{str} of 'data-type' is type of data represented by this structure instance:
> +if 'hierarchy' is 'memory', 'data-type' is 'access|read|write' latency(nanoseconds)
> +or 'access|read|write' bandwidth(MB/s) of the target memory; if 'hierarchy' is
> +'first-level|second-level|third-level', 'data-type' is 'access|read|write' hit latency
> +or 'access|read|write' hit bandwidth of the target memory side cache.
> +
> +@var{lat} of 'latency' is latency value, the possible value and units are
> +NUM[ps|ns|us] (picosecond|nanosecond|microsecond), the recommended unit is 'ns'. @var{bw}
> +is bandwidth value, the possible value and units are NUM[M|G|T], mean that
> +the bandwidth value are NUM MB/s, GB/s or TB/s. Note that max NUM is 65534,
> +if NUM is 0, means the corresponding latency or bandwidth information is not provided.
> +And if input numbers without any unit, the latency unit will be 'ps' and the bandwidth
> +will be MB/s.
> +
> +For example, the following option assigns NUMA node 0 and 1. Node 0 has 2 cpus and
> +a ram, node 1 has only a ram. The processors in node 0 access memory in node
> +0 with access-latency 5 nanoseconds, access-bandwidth is 200 MB/s;
> +The processors in NUMA node 0 access memory in NUMA node 1 with access-latency 10
> +nanoseconds, access-bandwidth is 100 MB/s.
> +@example
> +-machine hmat=on \
> +-m 2G \
> +-object memory-backend-ram,size=1G,id=m0 \
> +-object memory-backend-ram,size=1G,id=m1 \
> +-smp 2 \
> +-numa node,nodeid=0,memdev=m0 \
> +-numa node,nodeid=1,memdev=m1,initiator=0 \
> +-numa cpu,node-id=0,socket-id=0 \
> +-numa cpu,node-id=0,socket-id=1 \
> +-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
> +-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M
> +@end example
> +
>  ETEXI
>  
>  DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,




* Re: [PATCH v12 07/11] numa: Extend CLI to provide memory side cache information
  2019-09-20  7:43 ` [PATCH v12 07/11] numa: Extend CLI to provide memory side cache information Tao Xu
@ 2019-10-03 11:19   ` Igor Mammedov
  2019-10-09  7:54     ` Tao Xu
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2019-10-03 11:19 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, Daniel Black,
	jonathan.cameron, dan.j.williams

On Fri, 20 Sep 2019 15:43:45 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> From: Liu Jingqi <jingqi.liu@intel.com>
> 
> Add -numa hmat-cache option to provide Memory Side Cache Information.
> These memory attributes help to build Memory Side Cache Information
> Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT).
> 
> Reviewed-by: Daniel Black <daniel@linux.ibm.com>
> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> No changes in v12.
> 
> Changes in v11:
>     - Move numa option patches forward.
> ---
>  hw/core/numa.c        | 74 +++++++++++++++++++++++++++++++++++++++
>  include/sysemu/numa.h | 31 +++++++++++++++++
>  qapi/machine.json     | 81 +++++++++++++++++++++++++++++++++++++++++--
>  qemu-options.hx       | 16 +++++++--
>  4 files changed, 198 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index f5a1c9e909..182e4d9d62 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -293,6 +293,67 @@ void parse_numa_hmat_lb(NumaState *nstat, NumaHmatLBOptions *node,
>      }
>  }
>  
> +void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
> +                           Error **errp)
> +{
> +    int nb_numa_nodes = ms->numa_state->num_nodes;
> +    HMAT_Cache_Info *hmat_cache = NULL;
> +
> +    if (node->node_id >= nb_numa_nodes) {
> +        error_setg(errp, "Invalid node-id=%" PRIu32
> +                   ", it should be less than %d.",
> +                   node->node_id, nb_numa_nodes);
> +        return;
> +    }
> +
> +    if (node->total > MAX_HMAT_CACHE_LEVEL) {
> +        error_setg(errp, "Invalid total=%" PRIu8
> +                   ", it should be less than or equal to %d.",
> +                   node->total, MAX_HMAT_CACHE_LEVEL);
> +        return;
> +    }
> +    if (node->level > node->total) {
> +        error_setg(errp, "Invalid level=%" PRIu8
> +                   ", it should be less than or equal to"
> +                   " total=%" PRIu8 ".",
> +                   node->level, node->total);
> +        return;
> +    }
> +    if (ms->numa_state->hmat_cache[node->node_id][node->level]) {
> +        error_setg(errp, "Duplicate configuration of the side cache for "
> +                   "node-id=%" PRIu32 " and level=%" PRIu8 ".",
> +                   node->node_id, node->level);
> +        return;
> +    }
> +
> +    if ((node->level > 1) &&
> +        ms->numa_state->hmat_cache[node->node_id][node->level - 1] &&
> +        (node->size >=
> +            ms->numa_state->hmat_cache[node->node_id][node->level - 1]->size)) {
> +        error_setg(errp, "Invalid size=0x%" PRIx64
> +                   ", the size of level=%" PRIu8
> +                   " should be less than the size(0x%" PRIx64
> +                   ") of level=%" PRIu8 ".",
> +                   node->size, node->level,
> +                   ms->numa_state->hmat_cache[node->node_id]
> +                                             [node->level - 1]->size,
> +                   node->level - 1);
> +        return;
> +    }
> +
> +    hmat_cache = g_malloc0(sizeof(*hmat_cache));
> +
> +    hmat_cache->mem_proximity = node->node_id;
> +    hmat_cache->size = node->size;
> +    hmat_cache->total_levels = node->total;
> +    hmat_cache->level = node->level;
> +    hmat_cache->associativity = node->assoc;
> +    hmat_cache->write_policy = node->policy;
> +    hmat_cache->line_size = node->line;
> +
> +    ms->numa_state->hmat_cache[node->node_id][node->level] = hmat_cache;
> +}
> +
>  void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>  {
>      Error *err = NULL;
> @@ -344,6 +405,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>              goto end;
>          }
>          break;
> +    case NUMA_OPTIONS_TYPE_HMAT_CACHE:
> +        if (!ms->numa_state->hmat_enabled) {
> +            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
> +                       "(HMAT) is disabled, use -machine hmat=on before "
> +                       "set initiator of NUMA");
> +            return;
the same as in 6/11 at similar place

> +        }
> +
> +        parse_numa_hmat_cache(ms, &object->u.hmat_cache, &err);
> +        if (err) {
> +            goto end;
> +        }
> +        break;
>      default:
>          abort();
>      }
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 876beaee22..39312eefd4 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -35,6 +35,8 @@ enum {
>  #define HMAT_LB_LEVELS    (HMAT_LB_MEM_CACHE_3RD_LEVEL + 1)
>  #define HMAT_LB_TYPES     (HMAT_LB_DATA_WRITE_BANDWIDTH + 1)
>  
> +#define MAX_HMAT_CACHE_LEVEL        3

s/3/HMAT_LB_MEM_CACHE_3RD_LEVEL/


>  struct NodeInfo {
>      uint64_t node_mem;
>      struct HostMemoryBackend *node_memdev;
> @@ -65,6 +67,30 @@ struct HMAT_LB_Info {
>  };
>  typedef struct HMAT_LB_Info HMAT_LB_Info;
>  
> +struct HMAT_Cache_Info {
> +    /* The memory proximity domain to which the memory belongs. */
> +    uint32_t    mem_proximity;
mem prefix here is redundant

> +    /* Size of memory side cache in bytes. */
> +    uint64_t    size;
> +
> +    /* Total cache levels for this memory proximity domain. */
> +    uint8_t     total_levels;
> +
> +    /* Cache level described in this structure. */
> +    uint8_t     level;
> +
> +    /* Cache Associativity: None/Direct Mapped/Comple Cache Indexing */
> +    uint8_t     associativity;
> +
> +    /* Write Policy: None/Write Back(WB)/Write Through(WT) */
> +    uint8_t     write_policy;
> +
> +    /* Cache Line size in bytes. */
> +    uint16_t    line_size;
> +};
> +typedef struct HMAT_Cache_Info HMAT_Cache_Info;
> +
>  struct NumaState {
>      /* Number of NUMA nodes */
>      int num_nodes;
> @@ -83,6 +109,9 @@ struct NumaState {
>  
>      /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
>      HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
> +
> +    /* Memory Side Cache Information Structure */
> +    HMAT_Cache_Info *hmat_cache[MAX_NODES][MAX_HMAT_CACHE_LEVEL + 1];
>  };
>  typedef struct NumaState NumaState;
>  
> @@ -90,6 +119,8 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
>  void parse_numa_opts(MachineState *ms);
>  void parse_numa_hmat_lb(NumaState *nstat, NumaHmatLBOptions *node,
>                          Error **errp);
> +void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
> +                           Error **errp);
>  void numa_complete_configuration(MachineState *ms);
>  void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
>  extern QemuOptsList qemu_numa_opts;
> diff --git a/qapi/machine.json b/qapi/machine.json
> index b6019335e8..088be81920 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -428,10 +428,12 @@
>  #
>  # @hmat-lb: memory latency and bandwidth information (Since: 4.2)
>  #
> +# @hmat-cache: memory side cache information (Since: 4.2)
> +#
>  # Since: 2.1
>  ##
>  { 'enum': 'NumaOptionsType',
> -  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
> +  'data': [ 'node', 'dist', 'cpu', 'hmat-lb', 'hmat-cache' ] }
>  
>  ##
>  # @NumaOptions:
> @@ -447,7 +449,8 @@
>      'node': 'NumaNodeOptions',
>      'dist': 'NumaDistOptions',
>      'cpu': 'NumaCpuOptions',
> -    'hmat-lb': 'NumaHmatLBOptions' }}
> +    'hmat-lb': 'NumaHmatLBOptions',
> +    'hmat-cache': 'NumaHmatCacheOptions' }}
>  
>  ##
>  # @NumaNodeOptions:
> @@ -648,6 +651,80 @@
>      '*latency': 'time',
>      '*bandwidth': 'size' }}
>  
> +##
> +# @HmatCacheAssociativity:
> +#
> +# Cache associativity in the Memory Side Cache
> +# Information Structure of HMAT
> +#
> +# For more information of @HmatCacheAssociativity see
> +# the chapter 5.2.27.5: Table 5-143 of ACPI 6.3 spec.
> +#
> +# @none: None
> +#
> +# @direct: Direct Mapped
> +#
> +# @complex: Complex Cache Indexing (implementation specific)
> +#
> +# Since: 4.2
> +##
> +{ 'enum': 'HmatCacheAssociativity',
> +  'data': [ 'none', 'direct', 'complex' ] }
> +
> +##
> +# @HmatCacheWritePolicy:
> +#
> +# Cache write policy in the Memory Side Cache
> +# Information Structure of HMAT
> +#
> +# For more information of @HmatCacheWritePolicy see
> +# the chapter 5.2.27.5: Table 5-143: Field "Cache Attributes" of ACPI 6.3 spec.
> +#
> +# @none: None
> +#
> +# @write-back: Write Back (WB)
> +#
> +# @write-through: Write Through (WT)
> +#
> +# Since: 4.2
> +##
> +{ 'enum': 'HmatCacheWritePolicy',
> +  'data': [ 'none', 'write-back', 'write-through' ] }
> +
> +##
> +# @NumaHmatCacheOptions:
> +#
> +# Set the memory side cache information for a given memory domain.
> +#
> +# For more information of @NumaHmatCacheOptions see
> +# the chapter 5.2.27.5: Table 5-143: Field "Cache Attributes" of ACPI 6.3 spec.
> +#
> +# @node-id: the memory proximity domain to which the memory belongs.
> +#
> +# @size: the size of memory side cache in bytes.
> +#
> +# @total: the total cache levels for this memory proximity domain.

Can we calculate this without making the user do it?
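
One way this could be derived, as a minimal sketch only: take the highest level configured for a node as its total. The struct and function names below are hypothetical and not from the patch, and the sketch assumes levels are numbered contiguously from 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one "-numa hmat-cache" entry. */
struct cache_opt {
    uint32_t node_id;
    uint8_t level;
};

/*
 * Derive the total number of cache levels for a node as the highest
 * level configured for it. Returns 0 if the node has no cache entries.
 */
static uint8_t total_cache_levels(const struct cache_opt *opts, size_t n,
                                  uint32_t node_id)
{
    uint8_t total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (opts[i].node_id == node_id && opts[i].level > total) {
            total = opts[i].level;
        }
    }
    return total;
}
```

With that, 'total' would only need to be known once all hmat-cache options have been parsed.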

> +# @level: the cache level described in this structure.
> +#
> +# @assoc: the cache associativity, none/direct-mapped/complex(complex cache indexing).
> +#
> +# @policy: the write policy, none/write-back/write-through.
> +#
> +# @line: the cache Line size in bytes.
> +#
> +# Since: 4.2
> +##
> +{ 'struct': 'NumaHmatCacheOptions',
> +  'data': {
> +   'node-id': 'uint32',
> +   'size': 'size',
> +   'total': 'uint8',
> +   'level': 'uint8',
> +   'assoc': 'HmatCacheAssociativity',
> +   'policy': 'HmatCacheWritePolicy',
> +   'line': 'uint16' }}
> +
>  ##
>  # @HostMemPolicy:
>  #
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 129da0cdc3..7cf214a653 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -169,7 +169,8 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>      "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>      "-numa dist,src=source,dst=destination,val=distance\n"
>      "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
> -    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
> +    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
> +    "-numa hmat-cache,node-id=node,size=size,total=total,level=level[,assoc=none|direct|complex][,policy=none|write-back|write-through][,line=size]\n",
>      QEMU_ARCH_ALL)
>  STEXI
>  @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
> @@ -177,6 +178,7 @@ STEXI
>  @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>  @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
>  @itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,latency=@var{lat}][,bandwidth=@var{bw}]
> +@itemx -numa hmat-cache,node-id=@var{node},size=@var{size},total=@var{total},level=@var{level}[,assoc=@var{str}][,policy=@var{str}][,line=@var{size}]
>  @findex -numa
>  Define a NUMA node and assign RAM and VCPUs to it.
>  Set the NUMA distance from a source node to a destination node.
> @@ -282,11 +284,19 @@ if NUM is 0, means the corresponding latency or bandwidth information is not pro
>  And if input numbers without any unit, the latency unit will be 'ps' and the bandwidth
>  will be MB/s.
>  
> +In @samp{hmat-cache} option, @var{node-id} is the NUMA-id of the memory belongs.
> +@var{size} is the size of memory side cache in bytes. @var{total} is the total cache levels.
> +@var{level} is the cache level described in this structure. @var{assoc} is the cache associativity,
> +the possible value is 'none/direct(direct-mapped)/complex(complex cache indexing)'.
> +@var{policy} is the write policy. @var{line} is the cache Line size in bytes.
> +
>  For example, the following option assigns NUMA node 0 and 1. Node 0 has 2 cpus and
>  a ram, node 1 has only a ram. The processors in node 0 access memory in node
>  0 with access-latency 5 nanoseconds, access-bandwidth is 200 MB/s;
>  The processors in NUMA node 0 access memory in NUMA node 1 with access-latency 10
>  nanoseconds, access-bandwidth is 100 MB/s.
> +And for memory side cache information, NUMA node 0 and 1 both have 1 level memory
> +cache, size is 0x20000 bytes, policy is write-back, the cache Line size is 8 bytes:
Hex is not a particularly user-readable format; use decimal with size suffixes
here and in the example below.
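
For reference, 0x20000 bytes is 131072 = 128 * 1024, so the example could read size=128K, assuming the 'size' type accepts the usual K/M/G suffixes. A minimal sketch of the conversion follows; format_size() is a hypothetical helper for illustration, not a QEMU function.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a byte count using the largest binary suffix (G, M, K) that
 * divides it exactly; fall back to plain decimal bytes otherwise.
 */
static void format_size(uint64_t bytes, char *buf, size_t len)
{
    static const char suffixes[] = { 'G', 'M', 'K' };
    uint64_t unit = UINT64_C(1) << 30;
    size_t i;

    for (i = 0; i < sizeof(suffixes); i++, unit >>= 10) {
        if (bytes != 0 && bytes % unit == 0) {
            snprintf(buf, len, "%" PRIu64 "%c", bytes / unit, suffixes[i]);
            return;
        }
    }
    snprintf(buf, len, "%" PRIu64, bytes);
}
```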

>  @example
>  -machine hmat=on \
>  -m 2G \
> @@ -300,7 +310,9 @@ nanoseconds, access-bandwidth is 100 MB/s.
>  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
>  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
>  -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
> --numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M
> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
> +-numa hmat-cache,node-id=0,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8 \
> +-numa hmat-cache,node-id=1,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8
>  @end example
>  
>  ETEXI




* Re: [PATCH v12 08/11] hmat acpi: Build Memory Proximity Domain Attributes Structure(s)
  2019-09-20  7:43 ` [PATCH v12 08/11] hmat acpi: Build Memory Proximity Domain Attributes Structure(s) Tao Xu
@ 2019-10-03 13:44   ` Igor Mammedov
  0 siblings, 0 replies; 34+ messages in thread
From: Igor Mammedov @ 2019-10-03 13:44 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, Daniel Black,
	Jonathan Cameron, dan.j.williams

On Fri, 20 Sep 2019 15:43:46 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> From: Liu Jingqi <jingqi.liu@intel.com>
> 
> HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
> (HMAT). The specification references below link:
> http://www.uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf
> 
> It describes the memory attributes, such as memory side cache
> attributes and bandwidth and latency details, related to the
> Memory Proximity Domain. The software is
> expected to use this information as hint for optimization.
> 
> This structure describes Memory Proximity Domain Attributes by memory
> subsystem and its associativity with processor proximity domain as well as
> hint for memory usage.
> 
> In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report
> the platform's HMAT tables.
> 
> Reviewed-by: Daniel Black <daniel@linux.ibm.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> No changes in v12.
> 
> Changes in v11:
>     - Move numa option patches forward.
> ---
>  hw/acpi/Kconfig       |   5 +++
>  hw/acpi/Makefile.objs |   1 +
>  hw/acpi/hmat.c        | 101 ++++++++++++++++++++++++++++++++++++++++++
>  hw/acpi/hmat.h        |  45 +++++++++++++++++++
>  hw/i386/acpi-build.c  |   5 +++
>  5 files changed, 157 insertions(+)
>  create mode 100644 hw/acpi/hmat.c
>  create mode 100644 hw/acpi/hmat.h
> 
> diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> index 7c59cf900b..039bb99efa 100644
> --- a/hw/acpi/Kconfig
> +++ b/hw/acpi/Kconfig
> @@ -7,6 +7,7 @@ config ACPI_X86
>      select ACPI_NVDIMM
>      select ACPI_CPU_HOTPLUG
>      select ACPI_MEMORY_HOTPLUG
> +    select ACPI_HMAT
>  
>  config ACPI_X86_ICH
>      bool
> @@ -31,3 +32,7 @@ config ACPI_VMGENID
>      bool
>      default y
>      depends on PC
> +
> +config ACPI_HMAT
> +    bool
> +    depends on ACPI
> diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
> index 9bb2101e3b..c05019b059 100644
> --- a/hw/acpi/Makefile.objs
> +++ b/hw/acpi/Makefile.objs
> @@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
>  common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
>  common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
>  common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
> +common-obj-$(CONFIG_ACPI_HMAT) += hmat.o
>  common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
>  
>  common-obj-y += acpi_interface.o
> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
> new file mode 100644
> index 0000000000..1368fce7ee
> --- /dev/null
> +++ b/hw/acpi/hmat.c
> @@ -0,0 +1,101 @@
> +/*
> + * HMAT ACPI Implementation
> + *
> + * Copyright(C) 2019 Intel Corporation.
> + *
> + * Author:
> + *  Liu jingqi <jingqi.liu@linux.intel.com>
> + *  Tao Xu <tao3.xu@intel.com>
> + *
> + * HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
> + * (HMAT)
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
> + */
> +
> +#include "qemu/osdep.h"
> +#include "sysemu/numa.h"
> +#include "hw/acpi/hmat.h"
> +
> +/*
> + * ACPI 6.3:
> + * 5.2.27.3 Memory Proximity Domain Attributes Structure: Table 5-145
> + */
> +static void build_hmat_mpda(GArray *table_data, uint16_t flags, int initiator,
> +                           int mem_node)
> +{
> +
> +    /* Memory Proximity Domain Attributes Structure */
> +    /* Type */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /* Length */
> +    build_append_int_noprefix(table_data, 40, 4);
> +    /* Flags */
> +    build_append_int_noprefix(table_data, flags, 2);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /* Proximity Domain for the Attached Initiator */
> +    build_append_int_noprefix(table_data, initiator, 4);
                                             ^^^ make this argument uint16_t

> +    /* Proximity Domain for the Memory */
> +    build_append_int_noprefix(table_data, mem_node, 4);
                                             ^^^ ditto
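
As a cross-check of the field widths against the 40-byte Length, here is a minimal standalone sketch of the same layout. append_le() is a hypothetical stand-in for build_append_int_noprefix(), assumed to emit the value least significant byte first. The uint32_t parameters are chosen only to mirror the 4-byte fields in this sketch and are not a statement about what the final signature should be.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for build_append_int_noprefix(): store 'size'
 * bytes of 'value' at 'off', least significant byte first, and return
 * the new offset.
 */
static size_t append_le(uint8_t *buf, size_t off, uint64_t value, int size)
{
    int i;

    for (i = 0; i < size; i++) {
        buf[off++] = value & 0xff;
        value >>= 8;
    }
    return off;
}

/* Serialize one Memory Proximity Domain Attributes Structure. */
static size_t build_mpda(uint8_t *buf, uint16_t flags,
                         uint32_t initiator, uint32_t mem_node)
{
    size_t off = 0;

    off = append_le(buf, off, 0, 2);          /* Type */
    off = append_le(buf, off, 0, 2);          /* Reserved */
    off = append_le(buf, off, 40, 4);         /* Length */
    off = append_le(buf, off, flags, 2);      /* Flags */
    off = append_le(buf, off, 0, 2);          /* Reserved */
    off = append_le(buf, off, initiator, 4);  /* Initiator domain */
    off = append_le(buf, off, mem_node, 4);   /* Memory domain */
    off = append_le(buf, off, 0, 4);          /* Reserved */
    off = append_le(buf, off, 0, 8);          /* Reserved (was SPA base) */
    off = append_le(buf, off, 0, 8);          /* Reserved (was SPA length) */
    return off;                               /* 2+2+4+2+2+4+4+4+8+8 */
}
```

The returned offset is the sum of the field widths, which is what the hard-coded Length has to match.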

> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 4);
> +    /*
> +     * Reserved:
> +     * Previously defined as the Start Address of the System Physical
> +     * Address Range. Deprecated since ACPI Spec 6.3.
> +     */
> +    build_append_int_noprefix(table_data, 0, 8);
> +    /*
> +     * Reserved:
> +     * Previously defined as the Range Length of the region in bytes.
> +     * Deprecated since ACPI Spec 6.3.
> +     */
> +    build_append_int_noprefix(table_data, 0, 8);
> +}
> +
> +/* Build HMAT sub table structures */
> +static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)

'nstat' is rather obscure; I suggest replacing it with numa_state throughout this patch/series.

> +{
> +    uint16_t flags;
> +    int i;
> +
> +    for (i = 0; i < nstat->num_nodes; i++) {
> +        flags = 0;
> +
> +        if (nstat->nodes[i].initiator_valid) {
> +            flags |= HMAT_PROX_INIT_VALID;
> +        }
> +
> +        build_hmat_mpda(table_data, flags, nstat->nodes[i].initiator, i);
> +    }
> +}
> +
> +void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat)
> +{
> +    uint64_t hmat_start;
s/uint64_t/int/ and initialize it at the declaration site 

> +
> +    hmat_start = table_data->len;
> +
> +    /* reserve space for HMAT header  */
> +    acpi_data_push(table_data, 40);
> +
> +    hmat_build_table_structs(table_data, nstat);
> +
> +    build_header(linker, table_data,
> +                 (void *)(table_data->data + hmat_start),
> +                 "HMAT", table_data->len - hmat_start, 2, NULL, NULL);
> +}
> diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
> new file mode 100644
> index 0000000000..0c1839cf6f
> --- /dev/null
> +++ b/hw/acpi/hmat.h
> @@ -0,0 +1,45 @@
> +/*
> + * HMAT ACPI Implementation Header
> + *
> + * Copyright(C) 2019 Intel Corporation.
> + *
> + * Author:
> + *  Liu jingqi <jingqi.liu@linux.intel.com>
> + *  Tao Xu <tao3.xu@intel.com>
> + *
> + * HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
> + * (HMAT)
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
> + */
> +
> +#ifndef HMAT_H
> +#define HMAT_H
> +
> +#include "hw/acpi/acpi-defs.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/bios-linker-loader.h"
> +#include "hw/acpi/aml-build.h"
Do you really need all of these headers for declaring build_hmat()?

> +
> +/*
> + * ACPI 6.3: 5.2.27.3 Memory Proximity Domain Attributes Structure,
> + * Table 5-145, Field "flag", Bit [0]: set to 1 to indicate that data in
> + * the Proximity Domain for the Attached Initiator field is valid.
> + * Other bits reserved.
> + */
> +#define HMAT_PROX_INIT_VALID 0x1

Shortening INITIATOR to INIT makes it too vague; just use INITIATOR.
The same goes for PROX.
(This applies to other places in this series with similar names.)


> +void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat);
> +
> +#endif
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index e54e571a75..7f2e05f1a9 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -68,6 +68,7 @@
>  #include "hw/i386/intel_iommu.h"
>  
>  #include "hw/acpi/ipmi.h"
> +#include "hw/acpi/hmat.h"
>  
>  /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
>   * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
> @@ -2698,6 +2699,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>              acpi_add_table(table_offsets, tables_blob);
>              build_slit(tables_blob, tables->linker, machine);
>          }
> +        if (machine->numa_state->hmat_enabled) {
> +            acpi_add_table(table_offsets, tables_blob);
> +            build_hmat(tables_blob, tables->linker, machine->numa_state);
> +        }
>      }
>      if (acpi_get_mcfg(&mcfg)) {
>          acpi_add_table(table_offsets, tables_blob);




* Re: [PATCH v12 09/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)
  2019-09-20  7:43 ` [PATCH v12 09/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao Xu
@ 2019-10-03 14:41   ` Igor Mammedov
  2019-10-10  6:53     ` Tao Xu
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2019-10-03 14:41 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, jonathan.cameron,
	dan.j.williams

On Fri, 20 Sep 2019 15:43:47 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> From: Liu Jingqi <jingqi.liu@intel.com>
> 
> This structure describes the memory access latency and bandwidth
> information from various memory access initiator proximity domains.
> The latency and bandwidth numbers represented in this structure
> correspond to rated latency and bandwidth for the platform.
> The software could use this information as hint for optimization.
> 
> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> Changes in v12:
>     - Fix a bug that if HMAT is enabled and without hmat-lb setting,
>       QEMU will crash. (reported by Danmei Wei)
> 
> Changes in v11:
>     - Calculate base in build_hmat_lb().
> ---
>  hw/acpi/hmat.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  hw/acpi/hmat.h |   2 +
>  2 files changed, 127 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
> index 1368fce7ee..e7be849581 100644
> --- a/hw/acpi/hmat.c
> +++ b/hw/acpi/hmat.c
> @@ -27,6 +27,7 @@
>  #include "qemu/osdep.h"
>  #include "sysemu/numa.h"
>  #include "hw/acpi/hmat.h"
> +#include "qemu/error-report.h"
>  
>  /*
>   * ACPI 6.3:
> @@ -67,11 +68,105 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags, int initiator,
>      build_append_int_noprefix(table_data, 0, 8);
>  }
>  
> +static bool entry_overflow(uint64_t *lb_data, uint64_t base, int len)
> +{
> +    int i;
> +
> +    for (i = 0; i < len; i++) {
> +        if (lb_data[i] / base >= UINT16_MAX) {
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
I suggest doing this check at CLI parsing time.

> +/*
> + * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
> + * Structure: Table 5-146
> + */
> +static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
> +                          uint32_t num_initiator, uint32_t num_target,
> +                          uint32_t *initiator_list, int type)
> +{
> +    uint8_t mask = 0x0f; 
> +    uint32_t s = num_initiator;
> +    uint32_t t = num_target;
Drop these locals and use the arguments directly.

> +    uint64_t base = 1;
> +    uint64_t *lb_data;
> +    int i, unit;
> +
> +    /* Type */
> +    build_append_int_noprefix(table_data, 1, 2);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /* Length */
> +    build_append_int_noprefix(table_data, 32 + 4 * s + 4 * t + 2 * s * t, 4);
                                             ^^^^
To me the above looks like /dev/random output, absolutely unreadable.
I suggest using a local variable (e.g. lb_length) for the expression, with comments
beside the magic numbers.
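
Something along these lines, as a minimal sketch. hmat_lb_length() is a hypothetical helper name, and the 32-byte fixed part is taken from the field sizes appended in this function (2+2+4+1+1+2+4+4+4+8).

```c
#include <stdint.h>

/*
 * Length of one System Locality Latency and Bandwidth Information
 * Structure for the given number of initiator and target domains.
 */
static uint32_t hmat_lb_length(uint32_t num_initiator, uint32_t num_target)
{
    /* Fixed fields, Type through Entry Base Unit */
    uint32_t lb_length = 32;

    /* Initiator Proximity Domain List, 4 bytes per entry */
    lb_length += 4 * num_initiator;
    /* Target Proximity Domain List, 4 bytes per entry */
    lb_length += 4 * num_target;
    /* Latency or bandwidth matrix, 2 bytes per initiator/target pair */
    lb_length += 2 * num_initiator * num_target;

    return lb_length;
}
```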

> +    /* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
> +    build_append_int_noprefix(table_data, hmat_lb->hierarchy & mask, 1);

why do you need to use mask here?

> +    /* Data Type */
> +    build_append_int_noprefix(table_data, hmat_lb->data_type, 1);

Isn't hmat_lb->data_type and passed argument 'type' the same?


> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /* Number of Initiator Proximity Domains (s) */
> +    build_append_int_noprefix(table_data, s, 4);
> +    /* Number of Target Proximity Domains (t) */
> +    build_append_int_noprefix(table_data, t, 4);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 4);
> +
> +    if (HMAT_IS_LATENCY(type)) {
> +        unit = 1000;
> +        lb_data = hmat_lb->latency;
> +    } else {
> +        unit = 1024;
> +        lb_data = hmat_lb->bandwidth;
> +    }
> +
> +    while (entry_overflow(lb_data, base, s * t)) {
> +        for (i = 0; i < s * t; i++) {
> +            if (!QEMU_IS_ALIGNED(lb_data[i], unit * base)) {
> +                error_report("Invalid latency/bandwidth input, all "
> +                "latencies/bandwidths should be specified in the same units.");
> +                exit(1);
> +            }
> +        }
> +        base *= unit;
> +    }
Can you clarify what you are trying to check here?

> +
> +    /* Entry Base Unit */
> +    build_append_int_noprefix(table_data, base, 8);
> +
> +    /* Initiator Proximity Domain List */
> +    for (i = 0; i < s; i++) {
> +        build_append_int_noprefix(table_data, initiator_list[i], 4);
> +    }
> +
> +    /* Target Proximity Domain List */
> +    for (i = 0; i < t; i++) {
> +        build_append_int_noprefix(table_data, i, 4);
> +    }
> +
> +    /* Latency or Bandwidth Entries */
> +    for (i = 0; i < s * t; i++) {
> +        uint16_t entry;
> +
> +        if (HMAT_IS_LATENCY(type)) {
Drop the if condition and reuse lb_data, which you've just initialized above.


> +            entry = hmat_lb->latency[i] / base;
...
> +            entry = hmat_lb->bandwidth[i] / base;
I'm not sure that the above is correct.
Please clarify the math behind the above two expressions.

> +        }
> +
> +        build_append_int_noprefix(table_data, entry, 2);
> +    }
> +}
> +
>  /* Build HMAT sub table structures */
>  static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
>  {
>      uint16_t flags;
> -    int i;
> +    uint32_t *initiator_list = NULL;
> +    int i, j, hrchy, type;
s/hrchy/hierarchy/

> +    HMAT_LB_Info *numa_hmat_lb;
>  
>      for (i = 0; i < nstat->num_nodes; i++) {
>          flags = 0;
> @@ -82,6 +177,35 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
>  
>          build_hmat_mpda(table_data, flags, nstat->nodes[i].initiator, i);
>      }
> +
> +    if (nstat->num_initiator) {
> +        initiator_list = g_malloc0(nstat->num_initiator * sizeof(uint32_t));
> +        for (i = 0, j = 0; i < nstat->num_nodes; i++) {
> +            if (nstat->nodes[i].has_cpu) {
> +                initiator_list[j] = i;
> +                j++;
> +            }
> +        }
> +    }
> +
> +    /*
> +     * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
> +     * Structure: Table 5-146
> +     */
> +    for (hrchy = HMAT_LB_MEM_MEMORY;
> +         hrchy <= HMAT_LB_MEM_CACHE_3RD_LEVEL; hrchy++) {
> +        for (type = HMAT_LB_DATA_ACCESS_LATENCY;
> +             type <= HMAT_LB_DATA_WRITE_BANDWIDTH; type++) {
> +            numa_hmat_lb = nstat->hmat_lb[hrchy][type];
> +
> +            if (numa_hmat_lb) {
> +                build_hmat_lb(table_data, numa_hmat_lb, nstat->num_initiator,
> +                              nstat->num_nodes, initiator_list, type);
> +            }
> +        }
> +    }
> +
> +    g_free(initiator_list);
>  }
>  
>  void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat)
> diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
> index 0c1839cf6f..1154dfb48e 100644
> --- a/hw/acpi/hmat.h
> +++ b/hw/acpi/hmat.h
> @@ -40,6 +40,8 @@
>   */
>  #define HMAT_PROX_INIT_VALID 0x1
>  
> +#define HMAT_IS_LATENCY(type) (type <= HMAT_LB_DATA_WRITE_LATENCY)

It's not worth creating a macro for a one-off calculation; just drop it
and s/if (HMAT_IS_LATENCY(type))/if (type <= HMAT_LB_DATA_WRITE_LATENCY)/

> +
>  void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat);
>  
>  #endif




* Re: [PATCH v12 10/11] hmat acpi: Build Memory Side Cache Information Structure(s)
  2019-09-20  7:43 ` [PATCH v12 10/11] hmat acpi: Build Memory Side Cache " Tao Xu
@ 2019-10-04  8:01   ` Igor Mammedov
  0 siblings, 0 replies; 34+ messages in thread
From: Igor Mammedov @ 2019-10-04  8:01 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, Daniel Black,
	Jonathan Cameron, dan.j.williams

On Fri, 20 Sep 2019 15:43:48 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> From: Liu Jingqi <jingqi.liu@intel.com>
> 
> This structure describes memory side cache information for memory
> proximity domains if the memory side cache is present and the
> physical device forms the memory side cache.
> The software could use this information to effectively place
> the data in memory to maximize the performance of the system
> memory that use the memory side cache.
> 
> Reviewed-by: Daniel Black <daniel@linux.ibm.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> No changes in v12.
> 
> Changes in v11:
>     - Move numa option patches forward.
> ---
>  hw/acpi/hmat.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 63 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
> index e7be849581..6b260eeef5 100644
> --- a/hw/acpi/hmat.c
> +++ b/hw/acpi/hmat.c
> @@ -160,13 +160,62 @@ static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
>      }
>  }
>  
> +/* ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure: Table 5-147 */
> +static void build_hmat_cache(GArray *table_data, HMAT_Cache_Info *hmat_cache)
> +{
> +    /*
> +     * Cache Attributes: Bits [3:0] – Total Cache Levels
> +     * for this Memory Proximity Domain
> +     */
> +    uint32_t cache_attr = hmat_cache->total_levels & 0xF;
> +
> +    /* Bits [7:4] : Cache Level described in this structure */
> +    cache_attr |= (hmat_cache->level & 0xF) << 4;


> +    /* Bits [11:8] - Cache Associativity */
> +    cache_attr |= (hmat_cache->associativity & 0xF) << 8;
> +
> +    /* Bits [15:12] - Write Policy */
> +    cache_attr |= (hmat_cache->write_policy & 0xF) << 12;

s/0xF/0x7/ for  Cache Associativity /  Write Policy
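
For reference, a minimal standalone sketch of the packing as the patch comments describe it. hmat_cache_attr() is a hypothetical helper name. The 0xF masks here follow the 4-bit ranges given in the patch comments ([11:8] and [15:12]); applying the substitution above would narrow those two masks to 0x7.

```c
#include <stdint.h>

/* Pack the 32-bit Cache Attributes field from its sub-fields. */
static uint32_t hmat_cache_attr(uint8_t total_levels, uint8_t level,
                                uint8_t associativity, uint8_t write_policy,
                                uint16_t line_size)
{
    uint32_t attr = 0;

    attr |= (uint32_t)(total_levels & 0xF);         /* Bits [3:0] */
    attr |= (uint32_t)(level & 0xF) << 4;           /* Bits [7:4] */
    attr |= (uint32_t)(associativity & 0xF) << 8;   /* Bits [11:8] */
    attr |= (uint32_t)(write_policy & 0xF) << 12;   /* Bits [15:12] */
    attr |= (uint32_t)line_size << 16;              /* Bits [31:16] */

    return attr;
}
```

With the values from the documentation example (total=1, level=1, assoc=direct, policy=write-back, line=8), and taking direct and write-back as 1 per the enum order in qapi/machine.json, this gives 0x00081111.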

> +
> +    /* Bits [31:16] - Cache Line size in bytes */
> +    cache_attr |= (hmat_cache->line_size & 0xFFFF) << 16;
> +
> +    cache_attr = cpu_to_le32(cache_attr);
> +
> +    /* Type */
> +    build_append_int_noprefix(table_data, 2, 2);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /* Length */
> +    build_append_int_noprefix(table_data, 32, 4);
> +    /* Proximity Domain for the Memory */
> +    build_append_int_noprefix(table_data, hmat_cache->mem_proximity, 4);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 4);
> +    /* Memory Side Cache Size */
> +    build_append_int_noprefix(table_data, hmat_cache->size, 8);
> +    /* Cache Attributes */
> +    build_append_int_noprefix(table_data, cache_attr, 4);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /*
> +     * Number of SMBIOS handles (n)
> +     * Linux kernel uses Memory Side Cache Information Structure
> +     * without SMBIOS entries for now, so set Number of SMBIOS handles
> +     * as 0.
> +     */
> +    build_append_int_noprefix(table_data, 0, 2);
> +}
> +
>  /* Build HMAT sub table structures */
>  static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
>  {
>      uint16_t flags;
>      uint32_t *initiator_list = NULL;
> -    int i, j, hrchy, type;
> +    int i, j, hrchy, type, level;

s/level/cache_level/

>      HMAT_LB_Info *numa_hmat_lb;
> +    HMAT_Cache_Info *numa_hmat_cache;
>  
>      for (i = 0; i < nstat->num_nodes; i++) {
>          flags = 0;
> @@ -205,6 +254,19 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
>          }
>      }
>  
> +    /*
> +     * ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure:
> +     * Table 5-147
> +     */
> +    for (i = 0; i < nstat->num_nodes; i++) {
> +        for (level = 0; level <= MAX_HMAT_CACHE_LEVEL; level++) {
> +            numa_hmat_cache = nstat->hmat_cache[i][level];
> +            if (numa_hmat_cache) {
> +                build_hmat_cache(table_data, numa_hmat_cache);
> +            }
> +        }
> +    }
> +
>      g_free(initiator_list);
>  }
>  




* Re: [PATCH v12 11/11] tests/bios-tables-test: add test cases for ACPI HMAT
  2019-09-20  7:43 ` [PATCH v12 11/11] tests/bios-tables-test: add test cases for ACPI HMAT Tao Xu
@ 2019-10-04  8:08   ` Igor Mammedov
  0 siblings, 0 replies; 34+ messages in thread
From: Igor Mammedov @ 2019-10-04  8:08 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, Jingqi Liu, fan.du, qemu-devel, Daniel Black,
	jonathan.cameron, dan.j.williams

On Fri, 20 Sep 2019 15:43:49 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> ACPI table HMAT has been introduced, QEMU now builds HMAT tables for
> Heterogeneous Memory with boot option '-numa node'.
> 
> Add test cases on PC and Q35 machines with 2 numa nodes.
> Because HMAT is generated when system enable numa, the
> following tables need to be added for this test:
>   tests/acpi-test-data/pc/*.acpihmat
>   tests/acpi-test-data/pc/HMAT.*
>   tests/acpi-test-data/q35/*.acpihmat
>   tests/acpi-test-data/q35/HMAT.*
> 
> Reviewed-by: Daniel Black <daniel@linux.ibm.com>
> Reviewed-by: Jingqi Liu <Jingqi.liu@intel.com>
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> No changes in V11 and v12.
> 
> Changes in v10:
>     - Update test case, add "-machine hmat=on"
> ---
>  tests/bios-tables-test.c | 44 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
> index 9b3d8b0d1b..976788b6fa 100644
> --- a/tests/bios-tables-test.c
> +++ b/tests/bios-tables-test.c
> @@ -870,6 +870,48 @@ static void test_acpi_piix4_tcg_dimm_pxm(void)
>      test_acpi_tcg_dimm_pxm(MACHINE_PC);
>  }
>  
> +static void test_acpi_tcg_acpi_hmat(const char *machine)
> +{
> +    test_data data;
> +
> +    memset(&data, 0, sizeof(data));
> +    data.machine = machine;
> +    data.variant = ".acpihmat";
> +    test_acpi_one(" -machine hmat=on"
> +                  " -smp 2,sockets=2"
> +                  " -m 128M,slots=2,maxmem=1G"
> +                  " -object memory-backend-ram,size=64M,id=m0"
> +                  " -object memory-backend-ram,size=64M,id=m1"
> +                  " -numa node,nodeid=0,memdev=m0"
> +                  " -numa node,nodeid=1,memdev=m1,initiator=0"
> +                  " -numa cpu,node-id=0,socket-id=0"
> +                  " -numa cpu,node-id=0,socket-id=1"
> +                  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
> +                  "data-type=access-latency,latency=5ns"
> +                  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
> +                  "data-type=access-bandwidth,bandwidth=500M"
> +                  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
> +                  "data-type=access-latency,latency=10ns"
> +                  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
> +                  "data-type=access-bandwidth,bandwidth=100M"
> +                  " -numa hmat-cache,node-id=0,size=0x20000,total=1,level=1"
> +                  ",assoc=direct,policy=write-back,line=8"
> +                  " -numa hmat-cache,node-id=1,size=0x20000,total=1,level=1"

Use decimal notation with an appropriate size suffix for CLI args.

Other than that it looks good to me, so with the above fixed:

Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> +                  ",assoc=direct,policy=write-back,line=8",
> +                  &data);
> +    free_test_data(&data);
> +}
> +
> +static void test_acpi_q35_tcg_acpi_hmat(void)
> +{
> +    test_acpi_tcg_acpi_hmat(MACHINE_Q35);
> +}
> +
> +static void test_acpi_piix4_tcg_acpi_hmat(void)
> +{
> +    test_acpi_tcg_acpi_hmat(MACHINE_PC);
> +}
> +
>  static void test_acpi_virt_tcg(void)
>  {
>      test_data data = {
> @@ -914,6 +956,8 @@ int main(int argc, char *argv[])
>          qtest_add_func("acpi/q35/numamem", test_acpi_q35_tcg_numamem);
>          qtest_add_func("acpi/piix4/dimmpxm", test_acpi_piix4_tcg_dimm_pxm);
>          qtest_add_func("acpi/q35/dimmpxm", test_acpi_q35_tcg_dimm_pxm);
> +        qtest_add_func("acpi/piix4/acpihmat", test_acpi_piix4_tcg_acpi_hmat);
> +        qtest_add_func("acpi/q35/acpihmat", test_acpi_q35_tcg_acpi_hmat);
>      } else if (strcmp(arch, "aarch64") == 0) {
>          qtest_add_func("acpi/virt", test_acpi_virt_tcg);
>      }




* Re: [PATCH v12 06/11] numa: Extend CLI to provide memory latency and bandwidth information
  2019-10-02 15:16   ` Igor Mammedov
@ 2019-10-09  6:39     ` Tao Xu
  2019-10-11 13:56       ` Igor Mammedov
  0 siblings, 1 reply; 34+ messages in thread
From: Tao Xu @ 2019-10-09  6:39 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron,
	Williams, Dan J

On 10/2/2019 11:16 PM, Igor Mammedov wrote:
> On Fri, 20 Sep 2019 15:43:44 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
[...]
>> +struct HMAT_LB_Info {
>> +    /* Indicates it's memory or the specified level memory side cache. */
>> +    uint8_t     hierarchy;
>> +
>> +    /* Present the type of data, access/read/write latency or bandwidth. */
>> +    uint8_t     data_type;
>> +
>> +    /* Array to store the latencies */
> specify units it's stored in
> 
>> +    uint64_t    *latency;
>> +
>> +    /* Array to store the bandwidthes */
> ditto
> 
>> +    uint64_t    *bandwidth;
> btw:
> 
> what was the reason for picking uint64_t for storing above values?
> 
> it seems in this patch you dumb down bandwidth to MB/s above but
> store latency as is.

Because I want to store the bandwidth or latency value that the user
input, expressed in the smallest unit. In HMAT the smallest unit of
bandwidth is MB/s, but in QAPI the smallest unit of size is the byte, so I
convert the size into MB/s. The time unit is already "ps", so latency
needs no conversion.
> 
> and then in 9/11 build_hmat_lb you divide that on 'base' units,
> where are guaranties that value stored here will fit into 2 bytes
> used in HMAT to store it in the table?
> 
In the HMAT spec, a matrix of bandwidth or latency entries has only one
base (in order to save space in the ACPI tables). We need to extract the
base for the whole matrix, but the user inputs bandwidth or latency line
by line. So we can only extract the base after all the data has been
input (as in 9/11).

There is another benefit. If the user inputs different but similar units,
such as "10ns" and "100ps", we can still store them. Only if the user
inputs units that are too far apart, such as "1ps" and "1000ms", can we
not store them, and we raise an error.

> if this structure should store values in terms on HMAT table it should
> probably use uint16_t and check that user provided value won't overflow
> at the time of CLI parsing.
> 




* Re: [PATCH v12 07/11] numa: Extend CLI to provide memory side cache information
  2019-10-03 11:19   ` Igor Mammedov
@ 2019-10-09  7:54     ` Tao Xu
  2019-10-11 14:10       ` Igor Mammedov
  0 siblings, 1 reply; 34+ messages in thread
From: Tao Xu @ 2019-10-09  7:54 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, Daniel Black,
	jonathan.cameron, Williams, Dan J

On 10/3/2019 7:19 PM, Igor Mammedov wrote:
> On Fri, 20 Sep 2019 15:43:45 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> From: Liu Jingqi <jingqi.liu@intel.com>
>>
>> Add -numa hmat-cache option to provide Memory Side Cache Information.
>> These memory attributes help to build Memory Side Cache Information
>> Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT).
>>
>> Reviewed-by: Daniel Black <daniel@linux.ibm.com>
>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>
>> No changes in v12.
>>
>> Changes in v11:
>>      - Move numa option patches forward.
>> ---
>>   hw/core/numa.c        | 74 +++++++++++++++++++++++++++++++++++++++
>>   include/sysemu/numa.h | 31 +++++++++++++++++
>>   qapi/machine.json     | 81 +++++++++++++++++++++++++++++++++++++++++--
>>   qemu-options.hx       | 16 +++++++--
>>   4 files changed, 198 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/core/numa.c b/hw/core/numa.c
>> index f5a1c9e909..182e4d9d62 100644
>> --- a/hw/core/numa.c
>> +++ b/hw/core/numa.c
>> @@ -293,6 +293,67 @@ void parse_numa_hmat_lb(NumaState *nstat, NumaHmatLBOptions *node,
>>       }
>>   }
>>   
>> +void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
>> +                           Error **errp)
>> +{
>> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>> +    HMAT_Cache_Info *hmat_cache = NULL;
>> +
>> +    if (node->node_id >= nb_numa_nodes) {
>> +        error_setg(errp, "Invalid node-id=%" PRIu32
>> +                   ", it should be less than %d.",
>> +                   node->node_id, nb_numa_nodes);
>> +        return;
>> +    }
>> +
>> +    if (node->total > MAX_HMAT_CACHE_LEVEL) {
>> +        error_setg(errp, "Invalid total=%" PRIu8
>> +                   ", it should be less than or equal to %d.",
>> +                   node->total, MAX_HMAT_CACHE_LEVEL);
>> +        return;
>> +    }
>> +    if (node->level > node->total) {
>> +        error_setg(errp, "Invalid level=%" PRIu8
>> +                   ", it should be less than or equal to"
>> +                   " total=%" PRIu8 ".",
>> +                   node->level, node->total);
>> +        return;
>> +    }
>> +    if (ms->numa_state->hmat_cache[node->node_id][node->level]) {
>> +        error_setg(errp, "Duplicate configuration of the side cache for "
>> +                   "node-id=%" PRIu32 " and level=%" PRIu8 ".",
>> +                   node->node_id, node->level);
>> +        return;
>> +    }
>> +
>> +    if ((node->level > 1) &&
>> +        ms->numa_state->hmat_cache[node->node_id][node->level - 1] &&
>> +        (node->size >=
>> +            ms->numa_state->hmat_cache[node->node_id][node->level - 1]->size)) {
>> +        error_setg(errp, "Invalid size=0x%" PRIx64
>> +                   ", the size of level=%" PRIu8
>> +                   " should be less than the size(0x%" PRIx64
>> +                   ") of level=%" PRIu8 ".",
>> +                   node->size, node->level,
>> +                   ms->numa_state->hmat_cache[node->node_id]
>> +                                             [node->level - 1]->size,
>> +                   node->level - 1);
>> +        return;
>> +    }
>> +
>> +    hmat_cache = g_malloc0(sizeof(*hmat_cache));
>> +
>> +    hmat_cache->mem_proximity = node->node_id;
>> +    hmat_cache->size = node->size;
>> +    hmat_cache->total_levels = node->total;
>> +    hmat_cache->level = node->level;
>> +    hmat_cache->associativity = node->assoc;
>> +    hmat_cache->write_policy = node->policy;
>> +    hmat_cache->line_size = node->line;
>> +
>> +    ms->numa_state->hmat_cache[node->node_id][node->level] = hmat_cache;
>> +}
>> +
>>   void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>>   {
>>       Error *err = NULL;
>> @@ -344,6 +405,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>>               goto end;
>>           }
>>           break;
>> +    case NUMA_OPTIONS_TYPE_HMAT_CACHE:
>> +        if (!ms->numa_state->hmat_enabled) {
>> +            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
>> +                       "(HMAT) is disabled, use -machine hmat=on before "
>> +                       "set initiator of NUMA");
>> +            return;
> the same as in 6/11 at similar place
> 
>> +        }
>> +
>> +        parse_numa_hmat_cache(ms, &object->u.hmat_cache, &err);
>> +        if (err) {
>> +            goto end;
>> +        }
>> +        break;
>>       default:
>>           abort();
>>       }
>> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
>> index 876beaee22..39312eefd4 100644
>> --- a/include/sysemu/numa.h
>> +++ b/include/sysemu/numa.h
>> @@ -35,6 +35,8 @@ enum {
>>   #define HMAT_LB_LEVELS    (HMAT_LB_MEM_CACHE_3RD_LEVEL + 1)
>>   #define HMAT_LB_TYPES     (HMAT_LB_DATA_WRITE_BANDWIDTH + 1)
>>   
>> +#define MAX_HMAT_CACHE_LEVEL        3
> 
> s/3/HMAT_LB_MEM_CACHE_3RD_LEVEL/
> 
> 
>>   struct NodeInfo {
>>       uint64_t node_mem;
>>       struct HostMemoryBackend *node_memdev;
>> @@ -65,6 +67,30 @@ struct HMAT_LB_Info {
>>   };
>>   typedef struct HMAT_LB_Info HMAT_LB_Info;
>>   
>> +struct HMAT_Cache_Info {
>> +    /* The memory proximity domain to which the memory belongs. */
>> +    uint32_t    mem_proximity;
> mem prefix here is redundant
> 
>> +    /* Size of memory side cache in bytes. */
>> +    uint64_t    size;
>> +
>> +    /* Total cache levels for this memory proximity domain. */
>> +    uint8_t     total_levels;
>> +
>> +    /* Cache level described in this structure. */
>> +    uint8_t     level;
>> +
>> +    /* Cache Associativity: None/Direct Mapped/Complex Cache Indexing */
>> +    uint8_t     associativity;
>> +
>> +    /* Write Policy: None/Write Back(WB)/Write Through(WT) */
>> +    uint8_t     write_policy;
>> +
>> +    /* Cache Line size in bytes. */
>> +    uint16_t    line_size;
>> +};
>> +typedef struct HMAT_Cache_Info HMAT_Cache_Info;
>> +
>>   struct NumaState {
>>       /* Number of NUMA nodes */
>>       int num_nodes;
>> @@ -83,6 +109,9 @@ struct NumaState {
>>   
>>       /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
>>       HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
>> +
>> +    /* Memory Side Cache Information Structure */
>> +    HMAT_Cache_Info *hmat_cache[MAX_NODES][MAX_HMAT_CACHE_LEVEL + 1];
>>   };
>>   typedef struct NumaState NumaState;
>>   
>> @@ -90,6 +119,8 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
>>   void parse_numa_opts(MachineState *ms);
>>   void parse_numa_hmat_lb(NumaState *nstat, NumaHmatLBOptions *node,
>>                           Error **errp);
>> +void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
>> +                           Error **errp);
>>   void numa_complete_configuration(MachineState *ms);
>>   void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
>>   extern QemuOptsList qemu_numa_opts;
>> diff --git a/qapi/machine.json b/qapi/machine.json
>> index b6019335e8..088be81920 100644
>> --- a/qapi/machine.json
>> +++ b/qapi/machine.json
>> @@ -428,10 +428,12 @@
>>   #
>>   # @hmat-lb: memory latency and bandwidth information (Since: 4.2)
>>   #
>> +# @hmat-cache: memory side cache information (Since: 4.2)
>> +#
>>   # Since: 2.1
>>   ##
>>   { 'enum': 'NumaOptionsType',
>> -  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
>> +  'data': [ 'node', 'dist', 'cpu', 'hmat-lb', 'hmat-cache' ] }
>>   
>>   ##
>>   # @NumaOptions:
>> @@ -447,7 +449,8 @@
>>       'node': 'NumaNodeOptions',
>>       'dist': 'NumaDistOptions',
>>       'cpu': 'NumaCpuOptions',
>> -    'hmat-lb': 'NumaHmatLBOptions' }}
>> +    'hmat-lb': 'NumaHmatLBOptions',
>> +    'hmat-cache': 'NumaHmatCacheOptions' }}
>>   
>>   ##
>>   # @NumaNodeOptions:
>> @@ -648,6 +651,80 @@
>>       '*latency': 'time',
>>       '*bandwidth': 'size' }}
>>   
>> +##
>> +# @HmatCacheAssociativity:
>> +#
>> +# Cache associativity in the Memory Side Cache
>> +# Information Structure of HMAT
>> +#
>> +# For more information of @HmatCacheAssociativity see
>> +# the chapter 5.2.27.5: Table 5-143 of ACPI 6.3 spec.
>> +#
>> +# @none: None
>> +#
>> +# @direct: Direct Mapped
>> +#
>> +# @complex: Complex Cache Indexing (implementation specific)
>> +#
>> +# Since: 4.2
>> +##
>> +{ 'enum': 'HmatCacheAssociativity',
>> +  'data': [ 'none', 'direct', 'complex' ] }
>> +
>> +##
>> +# @HmatCacheWritePolicy:
>> +#
>> +# Cache write policy in the Memory Side Cache
>> +# Information Structure of HMAT
>> +#
>> +# For more information of @HmatCacheWritePolicy see
>> +# the chapter 5.2.27.5: Table 5-143: Field "Cache Attributes" of ACPI 6.3 spec.
>> +#
>> +# @none: None
>> +#
>> +# @write-back: Write Back (WB)
>> +#
>> +# @write-through: Write Through (WT)
>> +#
>> +# Since: 4.2
>> +##
>> +{ 'enum': 'HmatCacheWritePolicy',
>> +  'data': [ 'none', 'write-back', 'write-through' ] }
>> +
>> +##
>> +# @NumaHmatCacheOptions:
>> +#
>> +# Set the memory side cache information for a given memory domain.
>> +#
>> +# For more information of @NumaHmatCacheOptions see
>> +# the chapter 5.2.27.5: Table 5-143: Field "Cache Attributes" of ACPI 6.3 spec.
>> +#
>> +# @node-id: the memory proximity domain to which the memory belongs.
>> +#
>> +# @size: the size of memory side cache in bytes.
>> +#
>> +# @total: the total cache levels for this memory proximity domain.
> 
> Can we calculate this without making user to do it?
> 

Yes, we can. For example, if the user inputs levels 1, 2 and 3, total is 3.

>> +# @level: the cache level described in this structure.
>> +#
>> +# @assoc: the cache associativity, none/direct-mapped/complex(complex cache indexing).
>> +#
>> +# @policy: the write policy, none/write-back/write-through.
>> +#
>> +# @line: the cache Line size in bytes.
>> +#
>> +# Since: 4.2
>> +##
>> +{ 'struct': 'NumaHmatCacheOptions',
>> +  'data': {
>> +   'node-id': 'uint32',
>> +   'size': 'size',
>> +   'total': 'uint8',
>> +   'level': 'uint8',
>> +   'assoc': 'HmatCacheAssociativity',
>> +   'policy': 'HmatCacheWritePolicy',
>> +   'line': 'uint16' }}
>> +
>>   ##
>>   # @HostMemPolicy:
>>   #
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 129da0cdc3..7cf214a653 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -169,7 +169,8 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>>       "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>>       "-numa dist,src=source,dst=destination,val=distance\n"
>>       "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
>> -    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
>> +    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
>> +    "-numa hmat-cache,node-id=node,size=size,total=total,level=level[,assoc=none|direct|complex][,policy=none|write-back|write-through][,line=size]\n",
>>       QEMU_ARCH_ALL)
>>   STEXI
>>   @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>> @@ -177,6 +178,7 @@ STEXI
>>   @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>>   @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
>>   @itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,latency=@var{lat}][,bandwidth=@var{bw}]
>> +@itemx -numa hmat-cache,node-id=@var{node},size=@var{size},total=@var{total},level=@var{level}[,assoc=@var{str}][,policy=@var{str}][,line=@var{size}]
>>   @findex -numa
>>   Define a NUMA node and assign RAM and VCPUs to it.
>>   Set the NUMA distance from a source node to a destination node.
>> @@ -282,11 +284,19 @@ if NUM is 0, means the corresponding latency or bandwidth information is not pro
>>   And if input numbers without any unit, the latency unit will be 'ps' and the bandwidth
>>   will be MB/s.
>>   
>> +In @samp{hmat-cache} option, @var{node-id} is the NUMA-id of the memory belongs.
>> +@var{size} is the size of memory side cache in bytes. @var{total} is the total cache levels.
>> +@var{level} is the cache level described in this structure. @var{assoc} is the cache associativity,
>> +the possible value is 'none/direct(direct-mapped)/complex(complex cache indexing)'.
>> +@var{policy} is the write policy. @var{line} is the cache Line size in bytes.
>> +
>>   For example, the following option assigns NUMA node 0 and 1. Node 0 has 2 cpus and
>>   a ram, node 1 has only a ram. The processors in node 0 access memory in node
>>   0 with access-latency 5 nanoseconds, access-bandwidth is 200 MB/s;
>>   The processors in NUMA node 0 access memory in NUMA node 1 with access-latency 10
>>   nanoseconds, access-bandwidth is 100 MB/s.
>> +And for memory side cache information, NUMA node 0 and 1 both have 1 level memory
>> +cache, size is 0x20000 bytes, policy is write-back, the cache Line size is 8 bytes:
> hex is not particularly user readable format, use decimal here and size suffixes
> here and in the example below.
> 
>>   @example
>>   -machine hmat=on \
>>   -m 2G \
>> @@ -300,7 +310,9 @@ nanoseconds, access-bandwidth is 100 MB/s.
>>   -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
>>   -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
>>   -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
>> --numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M
>> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
>> +-numa hmat-cache,node-id=0,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8 \
>> +-numa hmat-cache,node-id=1,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8
>>   @end example
>>   
>>   ETEXI
> 




* Re: [PATCH v12 09/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)
  2019-10-03 14:41   ` Igor Mammedov
@ 2019-10-10  6:53     ` Tao Xu
  2019-10-11 14:08       ` Igor Mammedov
  0 siblings, 1 reply; 34+ messages in thread
From: Tao Xu @ 2019-10-10  6:53 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron,
	Williams, Dan J

On 10/3/2019 10:41 PM, Igor Mammedov wrote:
> On Fri, 20 Sep 2019 15:43:47 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> From: Liu Jingqi <jingqi.liu@intel.com>
>>
>> This structure describes the memory access latency and bandwidth
>> information from various memory access initiator proximity domains.
>> The latency and bandwidth numbers represented in this structure
>> correspond to rated latency and bandwidth for the platform.
>> The software could use this information as hint for optimization.
>>
>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>
>> Changes in v12:
>>      - Fix a bug that if HMAT is enabled and without hmat-lb setting,
>>        QEMU will crash. (reported by Danmei Wei)
>>
>> Changes in v11:
>>      - Calculate base in build_hmat_lb().
>> ---
>>   hw/acpi/hmat.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++++-
>>   hw/acpi/hmat.h |   2 +
>>   2 files changed, 127 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
>> index 1368fce7ee..e7be849581 100644
>> --- a/hw/acpi/hmat.c
>> +++ b/hw/acpi/hmat.c
>> @@ -27,6 +27,7 @@
>>   #include "qemu/osdep.h"
>>   #include "sysemu/numa.h"
>>   #include "hw/acpi/hmat.h"
>> +#include "qemu/error-report.h"
>>   
>>   /*
>>    * ACPI 6.3:
>> @@ -67,11 +68,105 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags, int initiator,
>>       build_append_int_noprefix(table_data, 0, 8);
>>   }
>>   
>> +static bool entry_overflow(uint64_t *lb_data, uint64_t base, int len)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < len; i++) {
>> +        if (lb_data[i] / base >= UINT16_MAX) {
>> +            return true;
>> +        }
>> +    }
>> +
>> +    return false;
>> +}
> I suggest to do this check at CLI parsing time
> 
>> +/*
>> + * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
>> + * Structure: Table 5-146
>> + */
>> +static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
>> +                          uint32_t num_initiator, uint32_t num_target,
>> +                          uint32_t *initiator_list, int type)
>> +{
>> +    uint8_t mask = 0x0f;
>> +    uint32_t s = num_initiator;
>> +    uint32_t t = num_target;
> drop this locals and use arguments directly
> 
>> +    uint64_t base = 1;
>> +    uint64_t *lb_data;
>> +    int i, unit;
>> +
>> +    /* Type */
>> +    build_append_int_noprefix(table_data, 1, 2);
>> +    /* Reserved */
>> +    build_append_int_noprefix(table_data, 0, 2);
>> +    /* Length */
>> +    build_append_int_noprefix(table_data, 32 + 4 * s + 4 * t + 2 * s * t, 4);
>                                               ^^^^
> to me above looks like /dev/random output, absolutely unreadable.
> Suggest to use local var (like: lb_length) for expression with comments
> beside magic numbers.
> 
>> +    /* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
>> +    build_append_int_noprefix(table_data, hmat_lb->hierarchy & mask, 1);
> 
> why do you need to use mask here?
> 
Because Bits[7:4] are reserved, I use the mask to keep them cleared.

>> +    /* Data Type */
>> +    build_append_int_noprefix(table_data, hmat_lb->data_type, 1);
> 
> Isn't hmat_lb->data_type and passed argument 'type' the same?
> 
Yes, I will drop 'type'.
> 
>> +    /* Reserved */
>> +    build_append_int_noprefix(table_data, 0, 2);
>> +    /* Number of Initiator Proximity Domains (s) */
>> +    build_append_int_noprefix(table_data, s, 4);
>> +    /* Number of Target Proximity Domains (t) */
>> +    build_append_int_noprefix(table_data, t, 4);
>> +    /* Reserved */
>> +    build_append_int_noprefix(table_data, 0, 4);
>> +
>> +    if (HMAT_IS_LATENCY(type)) {
>> +        unit = 1000;
>> +        lb_data = hmat_lb->latency;
>> +    } else {
>> +        unit = 1024;
>> +        lb_data = hmat_lb->bandwidth;
>> +    }
>> +
>> +    while (entry_overflow(lb_data, base, s * t)) {
>> +        for (i = 0; i < s * t; i++) {
>> +            if (!QEMU_IS_ALIGNED(lb_data[i], unit * base)) {
>> +                error_report("Invalid latency/bandwidth input, all "
>> +                "latencies/bandwidths should be specified in the same units.");
>> +                exit(1);
>> +            }
>> +        }
>> +        base *= unit;
>> +    }
> Can you clarify what you are trying to check here?
> 
Here I use entry_overflow() to check whether a uint16 can store every
entry. If it can't, and every entry in the matrix is divisible by
unit * base, then base becomes unit * base.

For example, if the lb_data[i] values are 1048576 (1TB/s) and 1024 (1GB/s)
and unit is 1024, then 1048576 is bigger than UINT16_MAX and both values
are divisible by 1024 * 1, so base is 1024 and the entries are 1024 and 1
(see entry = hmat_lb->bandwidth[i] / base;). The benefit is that even if
the user inputs different units (TB/s vs GB/s), we can still store the
data as far as possible.
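
To make the example concrete, here is a standalone rendition of that loop
(a sketch: find_base() is a made-up wrapper, and it returns 0 where the
patch calls error_report() and exit(1)):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if any entry, once divided by base, would not fit the 2-byte field. */
static bool entry_overflow(const uint64_t *lb_data, uint64_t base, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (lb_data[i] / base >= UINT16_MAX) {
            return true;
        }
    }
    return false;
}

/*
 * Grow base by 'unit' until every entry fits in uint16_t.
 * Returns 0 when some entry is not a multiple of the next base, i.e. the
 * values cannot share one base without losing precision.
 */
static uint64_t find_base(const uint64_t *lb_data, size_t len, uint64_t unit)
{
    uint64_t base = 1;
    size_t i;

    assert(unit > 1);

    while (entry_overflow(lb_data, base, len)) {
        for (i = 0; i < len; i++) {
            if (lb_data[i] % (unit * base) != 0) {
                return 0;
            }
        }
        base *= unit;
    }
    return base;
}
```

For { 1048576, 1024 } with unit 1024 this yields base 1024, so the entries
become 1024 and 1. For latencies of 1ps and 1000ms (1 and 1000000000000 in
ps) with unit 1000 it returns 0, which is the error case.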

>> +
>> +    /* Entry Base Unit */
>> +    build_append_int_noprefix(table_data, base, 8);
>> +
>> +    /* Initiator Proximity Domain List */
>> +    for (i = 0; i < s; i++) {
>> +        build_append_int_noprefix(table_data, initiator_list[i], 4);
>> +    }
>> +
>> +    /* Target Proximity Domain List */
>> +    for (i = 0; i < t; i++) {
>> +        build_append_int_noprefix(table_data, i, 4);
>> +    }
>> +
>> +    /* Latency or Bandwidth Entries */
>> +    for (i = 0; i < s * t; i++) {
>> +        uint16_t entry;
>> +
>> +        if (HMAT_IS_LATENCY(type)) {
> drop if condition and reuse lb_data, that you've just initialized above
> 
> 
>> +            entry = hmat_lb->latency[i] / base;
> ...
>> +            entry = hmat_lb->bandwidth[i] / base;
> I'm not sure that above is correct.
> Pls clarify math behind above 2 expressions
> 
>> +        }
>> +
>> +        build_append_int_noprefix(table_data, entry, 2);
>> +    }
>> +}
>> +
>>   /* Build HMAT sub table structures */
>>   static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
>>   {
>>       uint16_t flags;
>> -    int i;
>> +    uint32_t *initiator_list = NULL;
>> +    int i, j, hrchy, type;
> s/hrchy/hierarchy/
> 
>> +    HMAT_LB_Info *numa_hmat_lb;
>>   
>>       for (i = 0; i < nstat->num_nodes; i++) {
>>           flags = 0;
>> @@ -82,6 +177,35 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
>>   
>>           build_hmat_mpda(table_data, flags, nstat->nodes[i].initiator, i);
>>       }
>> +
>> +    if (nstat->num_initiator) {
>> +        initiator_list = g_malloc0(nstat->num_initiator * sizeof(uint32_t));
>> +        for (i = 0, j = 0; i < nstat->num_nodes; i++) {
>> +            if (nstat->nodes[i].has_cpu) {
>> +                initiator_list[j] = i;
>> +                j++;
>> +            }
>> +        }
>> +    }
>> +
>> +    /*
>> +     * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
>> +     * Structure: Table 5-146
>> +     */
>> +    for (hrchy = HMAT_LB_MEM_MEMORY;
>> +         hrchy <= HMAT_LB_MEM_CACHE_3RD_LEVEL; hrchy++) {
>> +        for (type = HMAT_LB_DATA_ACCESS_LATENCY;
>> +             type <= HMAT_LB_DATA_WRITE_BANDWIDTH; type++) {
>> +            numa_hmat_lb = nstat->hmat_lb[hrchy][type];
>> +
>> +            if (numa_hmat_lb) {
>> +                build_hmat_lb(table_data, numa_hmat_lb, nstat->num_initiator,
>> +                              nstat->num_nodes, initiator_list, type);
>> +            }
>> +        }
>> +    }
>> +
>> +    g_free(initiator_list);
>>   }
>>   
>>   void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat)
>> diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
>> index 0c1839cf6f..1154dfb48e 100644
>> --- a/hw/acpi/hmat.h
>> +++ b/hw/acpi/hmat.h
>> @@ -40,6 +40,8 @@
>>    */
>>   #define HMAT_PROX_INIT_VALID 0x1
>>   
>> +#define HMAT_IS_LATENCY(type) (type <= HMAT_LB_DATA_WRITE_LATENCY)
> 
> it's not worth to create macro for 1-off calculation, just drop it
> and s/if (HMAT_IS_LATENCY(type))/if(type <= HMAT_LB_DATA_WRITE_LATENCY)/
> 
>> +
>>   void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat);
>>   
>>   #endif
> 




* Re: [PATCH v12 06/11] numa: Extend CLI to provide memory latency and bandwidth information
  2019-10-09  6:39     ` Tao Xu
@ 2019-10-11 13:56       ` Igor Mammedov
  2019-10-12  2:54         ` Tao Xu
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2019-10-11 13:56 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron,
	Williams, Dan J

On Wed, 9 Oct 2019 14:39:46 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> On 10/2/2019 11:16 PM, Igor Mammedov wrote:
> > On Fri, 20 Sep 2019 15:43:44 +0800
> > Tao Xu <tao3.xu@intel.com> wrote:
> >   
> [...]
> >> +struct HMAT_LB_Info {
> >> +    /* Indicates it's memory or the specified level memory side cache. */
> >> +    uint8_t     hierarchy;
> >> +
> >> +    /* Present the type of data, access/read/write latency or bandwidth. */
> >> +    uint8_t     data_type;
> >> +
> >> +    /* Array to store the latencies */  
> > specify units it's stored in
> >   
> >> +    uint64_t    *latency;
> >> +
> >> +    /* Array to store the bandwidthes */  
> > ditto
> >   
> >> +    uint64_t    *bandwidth;  
> > btw:
> > 
> > what was the reason for picking uint64_t for storing above values?
> > 
> > it seems in this patch you dumb down bandwidth to MB/s above but
> > store latency as is.  
> 
> Because I want to store the bandwidth or latency value that the user
> input, expressed in the smallest unit. In HMAT the smallest unit of
> bandwidth is MB/s, but in QAPI the smallest unit of size is the byte, so I
> convert the size into MB/s. The time unit is already "ps", so latency
> needs no conversion.
Just be consistent and store the raw (user input) values for both fields
(i.e. B/s and ps), and post-process them to uint16_t later.

> > and then in 9/11 build_hmat_lb you divide that on 'base' units,
> > where are guaranties that value stored here will fit into 2 bytes
> > used in HMAT to store it in the table?
> >   
> In the HMAT spec, a matrix of bandwidth or latency entries has only one
> base (in order to save space in the ACPI tables). We need to extract the
> base for the whole matrix, but the user inputs bandwidth or latency line
> by line. So we can only extract the base after all the data has been
> input (as in 9/11).
> 
> There is another benefit. If the user inputs different but similar units,
> such as "10ns" and "100ps", we can still store them. Only if the user
> inputs units that are too far apart, such as "1ps" and "1000ms", can we
> not store them, and we raise an error.
No disagreement here,

but I suggest moving the verification and base calculation from 09/11
into a separate patch (right after this one) and doing it at
numa_complete_configuration() time.
To store the calculated base you can add a common_base field to the
sub-table structure (HMAT_LB_Info) and use it when building the ACPI
table, without extra calculations.
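
Roughly what I have in mind (a sketch only: apart from common_base,
hierarchy and data_type, the field and function names are invented for
illustration, and a single data array stands in for the latency and
bandwidth arrays):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the sub-table state, not the real HMAT_LB_Info layout. */
typedef struct HMAT_LB_Info_Sketch {
    uint8_t   hierarchy;
    uint8_t   data_type;
    uint64_t *data;         /* raw user values: ps for latency, B/s for bandwidth */
    size_t    len;
    uint64_t  common_base;  /* filled in once, after all -numa options are parsed */
} HMAT_LB_Info_Sketch;

/* Table-building side: no base search, just the precomputed division. */
static uint16_t hmat_lb_entry(const HMAT_LB_Info_Sketch *lb, size_t idx)
{
    uint64_t entry;

    assert(lb->common_base != 0);
    assert(idx < lb->len);

    entry = lb->data[idx] / lb->common_base;
    assert(entry < UINT16_MAX); /* already validated at configuration time */
    return (uint16_t)entry;
}
```

This way build_hmat_lb() only emits data, and every user-visible error is
reported before any ACPI table is built.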

> > if this structure should store values in terms on HMAT table it should
> > probably use uint16_t and check that user provided value won't overflow
> > at the time of CLI parsing.
> >   
> 




* Re: [PATCH v12 09/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)
  2019-10-10  6:53     ` Tao Xu
@ 2019-10-11 14:08       ` Igor Mammedov
  2019-10-12  3:04         ` Tao Xu
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2019-10-11 14:08 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron,
	Williams, Dan J

On Thu, 10 Oct 2019 14:53:56 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> On 10/3/2019 10:41 PM, Igor Mammedov wrote:
> > On Fri, 20 Sep 2019 15:43:47 +0800
> > Tao Xu <tao3.xu@intel.com> wrote:
> >   
> >> From: Liu Jingqi <jingqi.liu@intel.com>
> >>
> >> This structure describes the memory access latency and bandwidth
> >> information from various memory access initiator proximity domains.
> >> The latency and bandwidth numbers represented in this structure
> >> correspond to rated latency and bandwidth for the platform.
> >> The software could use this information as hint for optimization.
> >>
> >> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> >> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> >> ---
> >>
> >> Changes in v12:
> >>      - Fix a bug that if HMAT is enabled and without hmat-lb setting,
> >>        QEMU will crash. (reported by Danmei Wei)
> >>
> >> Changes in v11:
> >>      - Calculate base in build_hmat_lb().
> >> ---
> >>   hw/acpi/hmat.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >>   hw/acpi/hmat.h |   2 +
> >>   2 files changed, 127 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
> >> index 1368fce7ee..e7be849581 100644
> >> --- a/hw/acpi/hmat.c
> >> +++ b/hw/acpi/hmat.c
> >> @@ -27,6 +27,7 @@
> >>   #include "qemu/osdep.h"
> >>   #include "sysemu/numa.h"
> >>   #include "hw/acpi/hmat.h"
> >> +#include "qemu/error-report.h"
> >>   
> >>   /*
> >>    * ACPI 6.3:
> >> @@ -67,11 +68,105 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags, int initiator,
> >>       build_append_int_noprefix(table_data, 0, 8);
> >>   }
> >>   
> >> +static bool entry_overflow(uint64_t *lb_data, uint64_t base, int len)
> >> +{
> >> +    int i;
> >> +
> >> +    for (i = 0; i < len; i++) {
> >> +        if (lb_data[i] / base >= UINT16_MAX) {
> >> +            return true;
> >> +        }
> >> +    }
> >> +
> >> +    return false;
> >> +}  
> > I suggest to do this check at CLI parsing time
> >   
> >> +/*
> >> + * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
> >> + * Structure: Table 5-146
> >> + */
> >> +static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
> >> +                          uint32_t num_initiator, uint32_t num_target,
> >> +                          uint32_t *initiator_list, int type)
> >> +{
> >> +    uint8_t mask = 0x0f;
> >> +    uint32_t s = num_initiator;
> >> +    uint32_t t = num_target;  
> > drop this locals and use arguments directly
> >   
> >> +    uint64_t base = 1;
> >> +    uint64_t *lb_data;
> >> +    int i, unit;
> >> +
> >> +    /* Type */
> >> +    build_append_int_noprefix(table_data, 1, 2);
> >> +    /* Reserved */
> >> +    build_append_int_noprefix(table_data, 0, 2);
> >> +    /* Length */
> >> +    build_append_int_noprefix(table_data, 32 + 4 * s + 4 * t + 2 * s * t, 4);  
> >                                               ^^^^
> > to me above looks like /dev/random output, absolutely unreadable.
> > Suggest to use local var (like: lb_length) for expression with comments
> > beside magic numbers.
> >   
> >> +    /* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
> >> +    build_append_int_noprefix(table_data, hmat_lb->hierarchy & mask, 1);  
> > 
> > why do you need to use mask here?
> >   
> Because Bits[7:4] are reserved, I use the mask to keep them cleared.

These bits are not user provided and are set to 0. If they get set it's a
programming error, and instead of masking the problem out QEMU should abort.
I suggest replacing the masking with assert(!(foo >> x)).
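
For instance (a sketch; hmat_lb_flags() is a made-up helper used only to
show the check in isolation):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the Flags byte for the structure.
 * Bits [3:0] carry the memory hierarchy, bits [7:4] are reserved.
 */
static uint8_t hmat_lb_flags(uint8_t hierarchy)
{
    /* A set reserved bit is a programming error, so abort rather than mask. */
    assert(!(hierarchy >> 4));
    return hierarchy;
}
```

Note the parentheses around the shift: without them the ! would apply to
the value before the shift, because ! binds tighter than >>.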

> 
> >> +    /* Data Type */
> >> +    build_append_int_noprefix(table_data, hmat_lb->data_type, 1);  
> > 
> > Isn't hmat_lb->data_type and passed argument 'type' the same?
> >   
> Yes, I will drop 'type'.
> >   
> >> +    /* Reserved */
> >> +    build_append_int_noprefix(table_data, 0, 2);
> >> +    /* Number of Initiator Proximity Domains (s) */
> >> +    build_append_int_noprefix(table_data, s, 4);
> >> +    /* Number of Target Proximity Domains (t) */
> >> +    build_append_int_noprefix(table_data, t, 4);
> >> +    /* Reserved */
> >> +    build_append_int_noprefix(table_data, 0, 4);
> >> +
> >> +    if (HMAT_IS_LATENCY(type)) {
> >> +        unit = 1000;
> >> +        lb_data = hmat_lb->latency;
> >> +    } else {
> >> +        unit = 1024;
> >> +        lb_data = hmat_lb->bandwidth;
> >> +    }
> >> +
> >> +    while (entry_overflow(lb_data, base, s * t)) {
> >> +        for (i = 0; i < s * t; i++) {
> >> +            if (!QEMU_IS_ALIGNED(lb_data[i], unit * base)) {
> >> +                error_report("Invalid latency/bandwidth input, all "
> >> +                "latencies/bandwidths should be specified in the same units.");
> >> +                exit(1);
> >> +            }
> >> +        }
> >> +        base *= unit;
> >> +    }  
> > Can you clarify what you are trying to check here?
> >   
> Here I use entry_overflow() to check whether every entry fits into
> uint16. If one does not, and all entries in the matrix are divisible by
> unit * base, then base becomes unit * base.
> 
> For example, if the lb_data[] values are 1048576 (1TB/s) and 1024 (1GB/s)
> and unit is 1024, then 1048576 is bigger than UINT16_MAX and both values
> are divisible by 1024 * 1, so base is 1024 and the entries are 1024 and 1
> (see entry = hmat_lb->bandwidth[i] / base;). The benefit is that even if
> the user inputs different units (TB/s vs GB/s), we can still store the
> data as far as possible.

Instead of iterating over lb_data multiple times until a valid base is
found, is it possible to go over lb_data just once to find MIN/MAX and
then calculate the base from them? Error out with the offending max/min
values if it's not possible to compress the range into uint16_t.
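
A rough sketch of that single-pass idea. The helper name
hmat_lb_find_base() and its signature are made up for illustration, and
treating 0 as "not provided" follows the hmat-lb documentation in this
series:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: pick one base for the whole
 * matrix from its smallest and largest nonzero value.
 * Returns false if the range cannot be stored in 16-bit entries; *min and
 * *max then hold the offending values for the error message.
 */
static bool hmat_lb_find_base(const uint64_t *lb_data, size_t len,
                              uint64_t unit, uint64_t *base,
                              uint64_t *min, uint64_t *max)
{
    size_t i;

    assert(lb_data && base && min && max && unit >= 2);

    *min = UINT64_MAX;
    *max = 0;
    for (i = 0; i < len; i++) {          /* the only pass over lb_data */
        if (lb_data[i] == 0) {
            continue;                    /* 0: value not provided */
        }
        if (lb_data[i] < *min) {
            *min = lb_data[i];
        }
        if (lb_data[i] > *max) {
            *max = lb_data[i];
        }
    }

    *base = 1;
    if (*max == 0) {                     /* nothing was provided at all */
        *min = 0;
        return true;
    }

    /* Grow base until the largest entry fits below UINT16_MAX. */
    while (*max / *base >= UINT16_MAX) {
        if (*base > UINT64_MAX / unit) {
            return false;                /* base itself would overflow */
        }
        *base *= unit;
    }

    /* The smallest entry must not be truncated to 0. */
    return *min >= *base;
}
```

For 1048576 and 1024 with unit 1024 this yields base 1024. It does not check
divisibility, so whether rounding down a value such as 1048575 is acceptable
would still have to be decided separately.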


> >> +
> >> +    /* Entry Base Unit */
> >> +    build_append_int_noprefix(table_data, base, 8);
> >> +
> >> +    /* Initiator Proximity Domain List */
> >> +    for (i = 0; i < s; i++) {
> >> +        build_append_int_noprefix(table_data, initiator_list[i], 4);
> >> +    }
> >> +
> >> +    /* Target Proximity Domain List */
> >> +    for (i = 0; i < t; i++) {
> >> +        build_append_int_noprefix(table_data, i, 4);
> >> +    }
> >> +
> >> +    /* Latency or Bandwidth Entries */
> >> +    for (i = 0; i < s * t; i++) {
> >> +        uint16_t entry;
> >> +
> >> +        if (HMAT_IS_LATENCY(type)) {  
> > drop if condition and reuse lb_data, that you've just initialized above
> > 
> >   
> >> +            entry = hmat_lb->latency[i] / base;  
> > ...  
> >> +            entry = hmat_lb->bandwidth[i] / base;  
> > I'm not sure that above is correct.
> > Pls clarify math behind above 2 expressions
> >   
> >> +        }
> >> +
> >> +        build_append_int_noprefix(table_data, entry, 2);
> >> +    }
> >> +}
> >> +
> >>   /* Build HMAT sub table structures */
> >>   static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
> >>   {
> >>       uint16_t flags;
> >> -    int i;
> >> +    uint32_t *initiator_list = NULL;
> >> +    int i, j, hrchy, type;  
> > s/hrchy/hierarchy/
> >   
> >> +    HMAT_LB_Info *numa_hmat_lb;
> >>   
> >>       for (i = 0; i < nstat->num_nodes; i++) {
> >>           flags = 0;
> >> @@ -82,6 +177,35 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
> >>   
> >>           build_hmat_mpda(table_data, flags, nstat->nodes[i].initiator, i);
> >>       }
> >> +
> >> +    if (nstat->num_initiator) {
> >> +        initiator_list = g_malloc0(nstat->num_initiator * sizeof(uint32_t));
> >> +        for (i = 0, j = 0; i < nstat->num_nodes; i++) {
> >> +            if (nstat->nodes[i].has_cpu) {
> >> +                initiator_list[j] = i;
> >> +                j++;
> >> +            }
> >> +        }
> >> +    }
> >> +
> >> +    /*
> >> +     * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
> >> +     * Structure: Table 5-146
> >> +     */
> >> +    for (hrchy = HMAT_LB_MEM_MEMORY;
> >> +         hrchy <= HMAT_LB_MEM_CACHE_3RD_LEVEL; hrchy++) {
> >> +        for (type = HMAT_LB_DATA_ACCESS_LATENCY;
> >> +             type <= HMAT_LB_DATA_WRITE_BANDWIDTH; type++) {
> >> +            numa_hmat_lb = nstat->hmat_lb[hrchy][type];
> >> +
> >> +            if (numa_hmat_lb) {
> >> +                build_hmat_lb(table_data, numa_hmat_lb, nstat->num_initiator,
> >> +                              nstat->num_nodes, initiator_list, type);
> >> +            }
> >> +        }
> >> +    }
> >> +
> >> +    g_free(initiator_list);
> >>   }
> >>   
> >>   void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat)
> >> diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
> >> index 0c1839cf6f..1154dfb48e 100644
> >> --- a/hw/acpi/hmat.h
> >> +++ b/hw/acpi/hmat.h
> >> @@ -40,6 +40,8 @@
> >>    */
> >>   #define HMAT_PROX_INIT_VALID 0x1
> >>   
> >> +#define HMAT_IS_LATENCY(type) (type <= HMAT_LB_DATA_WRITE_LATENCY)  
> > 
> > it's not worth to create macro for 1-off calculation, just drop it
> > and s/if (HMAT_IS_LATENCY(type))/if(type <= HMAT_LB_DATA_WRITE_LATENCY)/
> >   
> >> +
> >>   void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat);
> >>   
> >>   #endif  
> >   
> 
> 




* Re: [PATCH v12 07/11] numa: Extend CLI to provide memory side cache information
  2019-10-09  7:54     ` Tao Xu
@ 2019-10-11 14:10       ` Igor Mammedov
  0 siblings, 0 replies; 34+ messages in thread
From: Igor Mammedov @ 2019-10-11 14:10 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, Daniel Black,
	jonathan.cameron, Williams, Dan J

On Wed, 9 Oct 2019 15:54:00 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> On 10/3/2019 7:19 PM, Igor Mammedov wrote:
> > On Fri, 20 Sep 2019 15:43:45 +0800
> > Tao Xu <tao3.xu@intel.com> wrote:
> >   
> >> From: Liu Jingqi <jingqi.liu@intel.com>
> >>
> >> Add -numa hmat-cache option to provide Memory Side Cache Information.
> >> These memory attributes help to build Memory Side Cache Information
> >> Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT).
> >>
> >> Reviewed-by: Daniel Black <daniel@linux.ibm.com>
> >> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> >> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> >> ---
> >>
> >> No changes in v12.
> >>
> >> Changes in v11:
> >>      - Move numa option patches forward.
> >> ---
> >>   hw/core/numa.c        | 74 +++++++++++++++++++++++++++++++++++++++
> >>   include/sysemu/numa.h | 31 +++++++++++++++++
> >>   qapi/machine.json     | 81 +++++++++++++++++++++++++++++++++++++++++--
> >>   qemu-options.hx       | 16 +++++++--
> >>   4 files changed, 198 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/hw/core/numa.c b/hw/core/numa.c
> >> index f5a1c9e909..182e4d9d62 100644
> >> --- a/hw/core/numa.c
> >> +++ b/hw/core/numa.c
> >> @@ -293,6 +293,67 @@ void parse_numa_hmat_lb(NumaState *nstat, NumaHmatLBOptions *node,
> >>       }
> >>   }
> >>   
> >> +void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
> >> +                           Error **errp)
> >> +{
> >> +    int nb_numa_nodes = ms->numa_state->num_nodes;
> >> +    HMAT_Cache_Info *hmat_cache = NULL;
> >> +
> >> +    if (node->node_id >= nb_numa_nodes) {
> >> +        error_setg(errp, "Invalid node-id=%" PRIu32
> >> +                   ", it should be less than %d.",
> >> +                   node->node_id, nb_numa_nodes);
> >> +        return;
> >> +    }
> >> +
> >> +    if (node->total > MAX_HMAT_CACHE_LEVEL) {
> >> +        error_setg(errp, "Invalid total=%" PRIu8
> >> +                   ", it should be less than or equal to %d.",
> >> +                   node->total, MAX_HMAT_CACHE_LEVEL);
> >> +        return;
> >> +    }
> >> +    if (node->level > node->total) {
> >> +        error_setg(errp, "Invalid level=%" PRIu8
> >> +                   ", it should be less than or equal to"
> >> +                   " total=%" PRIu8 ".",
> >> +                   node->level, node->total);
> >> +        return;
> >> +    }
> >> +    if (ms->numa_state->hmat_cache[node->node_id][node->level]) {
> >> +        error_setg(errp, "Duplicate configuration of the side cache for "
> >> +                   "node-id=%" PRIu32 " and level=%" PRIu8 ".",
> >> +                   node->node_id, node->level);
> >> +        return;
> >> +    }
> >> +
> >> +    if ((node->level > 1) &&
> >> +        ms->numa_state->hmat_cache[node->node_id][node->level - 1] &&
> >> +        (node->size >=
> >> +            ms->numa_state->hmat_cache[node->node_id][node->level - 1]->size)) {
> >> +        error_setg(errp, "Invalid size=0x%" PRIx64
> >> +                   ", the size of level=%" PRIu8
> >> +                   " should be less than the size(0x%" PRIx64
> >> +                   ") of level=%" PRIu8 ".",
> >> +                   node->size, node->level,
> >> +                   ms->numa_state->hmat_cache[node->node_id]
> >> +                                             [node->level - 1]->size,
> >> +                   node->level - 1);
> >> +        return;
> >> +    }
> >> +
> >> +    hmat_cache = g_malloc0(sizeof(*hmat_cache));
> >> +
> >> +    hmat_cache->mem_proximity = node->node_id;
> >> +    hmat_cache->size = node->size;
> >> +    hmat_cache->total_levels = node->total;
> >> +    hmat_cache->level = node->level;
> >> +    hmat_cache->associativity = node->assoc;
> >> +    hmat_cache->write_policy = node->policy;
> >> +    hmat_cache->line_size = node->line;
> >> +
> >> +    ms->numa_state->hmat_cache[node->node_id][node->level] = hmat_cache;
> >> +}
> >> +
> >>   void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
> >>   {
> >>       Error *err = NULL;
> >> @@ -344,6 +405,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
> >>               goto end;
> >>           }
> >>           break;
> >> +    case NUMA_OPTIONS_TYPE_HMAT_CACHE:
> >> +        if (!ms->numa_state->hmat_enabled) {
> >> +            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
> >> +                       "(HMAT) is disabled, use -machine hmat=on before "
> >> +                       "set initiator of NUMA");
> >> +            return;  
> > the same as in 6/11 at similar place
> >   
> >> +        }
> >> +
> >> +        parse_numa_hmat_cache(ms, &object->u.hmat_cache, &err);
> >> +        if (err) {
> >> +            goto end;
> >> +        }
> >> +        break;
> >>       default:
> >>           abort();
> >>       }
> >> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> >> index 876beaee22..39312eefd4 100644
> >> --- a/include/sysemu/numa.h
> >> +++ b/include/sysemu/numa.h
> >> @@ -35,6 +35,8 @@ enum {
> >>   #define HMAT_LB_LEVELS    (HMAT_LB_MEM_CACHE_3RD_LEVEL + 1)
> >>   #define HMAT_LB_TYPES     (HMAT_LB_DATA_WRITE_BANDWIDTH + 1)
> >>   
> >> +#define MAX_HMAT_CACHE_LEVEL        3  
> > 
> > s/3/HMAT_LB_MEM_CACHE_3RD_LEVEL/
> > 
> >   
> >>   struct NodeInfo {
> >>       uint64_t node_mem;
> >>       struct HostMemoryBackend *node_memdev;
> >> @@ -65,6 +67,30 @@ struct HMAT_LB_Info {
> >>   };
> >>   typedef struct HMAT_LB_Info HMAT_LB_Info;
> >>   
> >> +struct HMAT_Cache_Info {
> >> +    /* The memory proximity domain to which the memory belongs. */
> >> +    uint32_t    mem_proximity;  
> > mem prefix here is redundant
> >   
> >> +    /* Size of memory side cache in bytes. */
> >> +    uint64_t    size;
> >> +
> >> +    /* Total cache levels for this memory proximity domain. */
> >> +    uint8_t     total_levels;
> >> +
> >> +    /* Cache level described in this structure. */
> >> +    uint8_t     level;
> >> +
> >> +    /* Cache Associativity: None/Direct Mapped/Comple Cache Indexing */
> >> +    uint8_t     associativity;
> >> +
> >> +    /* Write Policy: None/Write Back(WB)/Write Through(WT) */
> >> +    uint8_t     write_policy;
> >> +
> >> +    /* Cache Line size in bytes. */
> >> +    uint16_t    line_size;
> >> +};
> >> +typedef struct HMAT_Cache_Info HMAT_Cache_Info;
> >> +
> >>   struct NumaState {
> >>       /* Number of NUMA nodes */
> >>       int num_nodes;
> >> @@ -83,6 +109,9 @@ struct NumaState {
> >>   
> >>       /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
> >>       HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
> >> +
> >> +    /* Memory Side Cache Information Structure */
> >> +    HMAT_Cache_Info *hmat_cache[MAX_NODES][MAX_HMAT_CACHE_LEVEL + 1];
> >>   };
> >>   typedef struct NumaState NumaState;
> >>   
> >> @@ -90,6 +119,8 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
> >>   void parse_numa_opts(MachineState *ms);
> >>   void parse_numa_hmat_lb(NumaState *nstat, NumaHmatLBOptions *node,
> >>                           Error **errp);
> >> +void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
> >> +                           Error **errp);
> >>   void numa_complete_configuration(MachineState *ms);
> >>   void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
> >>   extern QemuOptsList qemu_numa_opts;
> >> diff --git a/qapi/machine.json b/qapi/machine.json
> >> index b6019335e8..088be81920 100644
> >> --- a/qapi/machine.json
> >> +++ b/qapi/machine.json
> >> @@ -428,10 +428,12 @@
> >>   #
> >>   # @hmat-lb: memory latency and bandwidth information (Since: 4.2)
> >>   #
> >> +# @hmat-cache: memory side cache information (Since: 4.2)
> >> +#
> >>   # Since: 2.1
> >>   ##
> >>   { 'enum': 'NumaOptionsType',
> >> -  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
> >> +  'data': [ 'node', 'dist', 'cpu', 'hmat-lb', 'hmat-cache' ] }
> >>   
> >>   ##
> >>   # @NumaOptions:
> >> @@ -447,7 +449,8 @@
> >>       'node': 'NumaNodeOptions',
> >>       'dist': 'NumaDistOptions',
> >>       'cpu': 'NumaCpuOptions',
> >> -    'hmat-lb': 'NumaHmatLBOptions' }}
> >> +    'hmat-lb': 'NumaHmatLBOptions',
> >> +    'hmat-cache': 'NumaHmatCacheOptions' }}
> >>   
> >>   ##
> >>   # @NumaNodeOptions:
> >> @@ -648,6 +651,80 @@
> >>       '*latency': 'time',
> >>       '*bandwidth': 'size' }}
> >>   
> >> +##
> >> +# @HmatCacheAssociativity:
> >> +#
> >> +# Cache associativity in the Memory Side Cache
> >> +# Information Structure of HMAT
> >> +#
> >> +# For more information of @HmatCacheAssociativity see
> >> +# the chapter 5.2.27.5: Table 5-143 of ACPI 6.3 spec.
> >> +#
> >> +# @none: None
> >> +#
> >> +# @direct: Direct Mapped
> >> +#
> >> +# @complex: Complex Cache Indexing (implementation specific)
> >> +#
> >> +# Since: 4.2
> >> +##
> >> +{ 'enum': 'HmatCacheAssociativity',
> >> +  'data': [ 'none', 'direct', 'complex' ] }
> >> +
> >> +##
> >> +# @HmatCacheWritePolicy:
> >> +#
> >> +# Cache write policy in the Memory Side Cache
> >> +# Information Structure of HMAT
> >> +#
> >> +# For more information of @HmatCacheWritePolicy see
> >> +# the chapter 5.2.27.5: Table 5-143: Field "Cache Attributes" of ACPI 6.3 spec.
> >> +#
> >> +# @none: None
> >> +#
> >> +# @write-back: Write Back (WB)
> >> +#
> >> +# @write-through: Write Through (WT)
> >> +#
> >> +# Since: 4.2
> >> +##
> >> +{ 'enum': 'HmatCacheWritePolicy',
> >> +  'data': [ 'none', 'write-back', 'write-through' ] }
> >> +
> >> +##
> >> +# @NumaHmatCacheOptions:
> >> +#
> >> +# Set the memory side cache information for a given memory domain.
> >> +#
> >> +# For more information of @NumaHmatCacheOptions see
> >> +# the chapter 5.2.27.5: Table 5-143: Field "Cache Attributes" of ACPI 6.3 spec.
> >> +#
> >> +# @node-id: the memory proximity domain to which the memory belongs.
> >> +#
> >> +# @size: the size of memory side cache in bytes.
> >> +#
> >> +# @total: the total cache levels for this memory proximity domain.  
> > 
> > Can we calculate this without making user to do it?
> >   
> 
> Yes we can. For example, if user input level 1 2 3, total is 3.

Please do so
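
Something along these lines might do, assuming "total" is simply the
highest level configured for a node. struct cache_info is a reduced
stand-in for HMAT_Cache_Info and hmat_cache_total_levels() is a
hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HMAT_CACHE_LEVEL 3

/* Reduced stand-in for HMAT_Cache_Info, for illustration only. */
struct cache_info {
    uint8_t level;
};

/*
 * levels[] is one row of hmat_cache[node][level]; index 0 is unused.
 * Returns the highest configured level, or 0 if none is configured.
 */
static uint8_t hmat_cache_total_levels(struct cache_info *levels[])
{
    uint8_t level, total = 0;

    assert(levels != NULL);

    for (level = 1; level <= MAX_HMAT_CACHE_LEVEL; level++) {
        if (levels[level]) {
            total = level;
        }
    }
    return total;
}
```

Since the options arrive one at a time, this would have to run after all
hmat-cache options are parsed, e.g. when the table is built.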

> 
> >> +# @level: the cache level described in this structure.
> >> +#
> >> +# @assoc: the cache associativity, none/direct-mapped/complex(complex cache indexing).
> >> +#
> >> +# @policy: the write policy, none/write-back/write-through.
> >> +#
> >> +# @line: the cache Line size in bytes.
> >> +#
> >> +# Since: 4.2
> >> +##
> >> +{ 'struct': 'NumaHmatCacheOptions',
> >> +  'data': {
> >> +   'node-id': 'uint32',
> >> +   'size': 'size',
> >> +   'total': 'uint8',
> >> +   'level': 'uint8',
> >> +   'assoc': 'HmatCacheAssociativity',
> >> +   'policy': 'HmatCacheWritePolicy',
> >> +   'line': 'uint16' }}
> >> +
> >>   ##
> >>   # @HostMemPolicy:
> >>   #
> >> diff --git a/qemu-options.hx b/qemu-options.hx
> >> index 129da0cdc3..7cf214a653 100644
> >> --- a/qemu-options.hx
> >> +++ b/qemu-options.hx
> >> @@ -169,7 +169,8 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> >>       "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
> >>       "-numa dist,src=source,dst=destination,val=distance\n"
> >>       "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
> >> -    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
> >> +    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
> >> +    "-numa hmat-cache,node-id=node,size=size,total=total,level=level[,assoc=none|direct|complex][,policy=none|write-back|write-through][,line=size]\n",
> >>       QEMU_ARCH_ALL)
> >>   STEXI
> >>   @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
> >> @@ -177,6 +178,7 @@ STEXI
> >>   @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
> >>   @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
> >>   @itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,latency=@var{lat}][,bandwidth=@var{bw}]
> >> +@itemx -numa hmat-cache,node-id=@var{node},size=@var{size},total=@var{total},level=@var{level}[,assoc=@var{str}][,policy=@var{str}][,line=@var{size}]
> >>   @findex -numa
> >>   Define a NUMA node and assign RAM and VCPUs to it.
> >>   Set the NUMA distance from a source node to a destination node.
> >> @@ -282,11 +284,19 @@ if NUM is 0, means the corresponding latency or bandwidth information is not pro
> >>   And if input numbers without any unit, the latency unit will be 'ps' and the bandwidth
> >>   will be MB/s.
> >>   
> >> +In @samp{hmat-cache} option, @var{node-id} is the NUMA-id of the memory belongs.
> >> +@var{size} is the size of memory side cache in bytes. @var{total} is the total cache levels.
> >> +@var{level} is the cache level described in this structure. @var{assoc} is the cache associativity,
> >> +the possible value is 'none/direct(direct-mapped)/complex(complex cache indexing)'.
> >> +@var{policy} is the write policy. @var{line} is the cache Line size in bytes.
> >> +
> >>   For example, the following option assigns NUMA node 0 and 1. Node 0 has 2 cpus and
> >>   a ram, node 1 has only a ram. The processors in node 0 access memory in node
> >>   0 with access-latency 5 nanoseconds, access-bandwidth is 200 MB/s;
> >>   The processors in NUMA node 0 access memory in NUMA node 1 with access-latency 10
> >>   nanoseconds, access-bandwidth is 100 MB/s.
> >> +And for memory side cache information, NUMA node 0 and 1 both have 1 level memory
> >> +cache, size is 0x20000 bytes, policy is write-back, the cache Line size is 8 bytes:  
> > hex is not particularly user readable format, use decimal here and size suffixes
> > here and in the example below.
> >   
> >>   @example
> >>   -machine hmat=on \
> >>   -m 2G \
> >> @@ -300,7 +310,9 @@ nanoseconds, access-bandwidth is 100 MB/s.
> >>   -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
> >>   -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
> >>   -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
> >> --numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M
> >> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
> >> +-numa hmat-cache,node-id=0,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8 \
> >> +-numa hmat-cache,node-id=1,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8
> >>   @end example
> >>   
> >>   ETEXI  
> >   
> 




* Re: [PATCH v12 06/11] numa: Extend CLI to provide memory latency and bandwidth information
  2019-10-11 13:56       ` Igor Mammedov
@ 2019-10-12  2:54         ` Tao Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Tao Xu @ 2019-10-12  2:54 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron,
	Williams, Dan J

On 10/11/2019 9:56 PM, Igor Mammedov wrote:
> On Wed, 9 Oct 2019 14:39:46 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> On 10/2/2019 11:16 PM, Igor Mammedov wrote:
>>> On Fri, 20 Sep 2019 15:43:44 +0800
>>> Tao Xu <tao3.xu@intel.com> wrote:
>>>    
>> [...]
>>>> +struct HMAT_LB_Info {
>>>> +    /* Indicates it's memory or the specified level memory side cache. */
>>>> +    uint8_t     hierarchy;
>>>> +
>>>> +    /* Present the type of data, access/read/write latency or bandwidth. */
>>>> +    uint8_t     data_type;
>>>> +
>>>> +    /* Array to store the latencies */
>>> specify units it's stored in
>>>    
>>>> +    uint64_t    *latency;
>>>> +
>>>> +    /* Array to store the bandwidthes */
>>> ditto
>>>    
>>>> +    uint64_t    *bandwidth;
>>> btw:
>>>
>>> what was the reason for picking uint64_t for storing above values?
>>>
>>> it seems in this patch you dumb down bandwidth to MB/s above but
>>> store latency as is.
>>
>> Because I want to store the bandwidth or latency value (minimum unit)
>> that user input. In HMAT, the minimum unit of bandwidth is MB/s, but in
>> QAPI, the minimum unit of size is Byte. So I convert size into MB/s and
>> time unit is "ps", need not convert.
> Just be consistent and store the raw (user input) values for both fields
> (i.e. B/s and ps) and post-process them later to uint16_t.
> 
>>> and then in 9/11 build_hmat_lb you divide that on 'base' units,
>>> where are guaranties that value stored here will fit into 2 bytes
>>> used in HMAT to store it in the table?
>>>    
>> For HMAT spec, for a matrix of bandwidth or latency, there is only one
>> base (in order to save ACPI tables space). We need to extract base for a
>> matrix, but user input bandwidth or latency line by line. So after all
>> data input, we can extract the base (as in 9/11).
>>
>> There is another benefit. If user input different but similar units,
>> such as "10ns" and "100ps", we can also store them. Only If user input
>> big gap units, such as "1ps" and "1000ms". we can't store them and raise
>> error.
> No disagreement here,
> 
> but I suggest to move verification and base calculation from 09/11
> into a separate patch (right after this one) and doing it at
> numa_complete_configuration() time.
> To store calculated base you can add a common_base field to
> sub-table structure (HMAT_LB_Info) and use it when building ACPI
> table without extra calculations.
> 

OK, thank you for your suggestion.
>>> if this structure should store values in terms on HMAT table it should
>>> probably use uint16_t and check that user provided value won't overflow
>>> at the time of CLI parsing.
>>>    
>>
> 




* Re: [PATCH v12 09/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)
  2019-10-11 14:08       ` Igor Mammedov
@ 2019-10-12  3:04         ` Tao Xu
  2019-10-14  9:00           ` Igor Mammedov
  0 siblings, 1 reply; 34+ messages in thread
From: Tao Xu @ 2019-10-12  3:04 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron,
	Williams, Dan J

On 10/11/2019 10:08 PM, Igor Mammedov wrote:
> On Thu, 10 Oct 2019 14:53:56 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> On 10/3/2019 10:41 PM, Igor Mammedov wrote:
>>> On Fri, 20 Sep 2019 15:43:47 +0800
>>> Tao Xu <tao3.xu@intel.com> wrote:
>>>    
>>>> From: Liu Jingqi <jingqi.liu@intel.com>
>>>>
>>>> This structure describes the memory access latency and bandwidth
>>>> information from various memory access initiator proximity domains.
>>>> The latency and bandwidth numbers represented in this structure
>>>> correspond to rated latency and bandwidth for the platform.
>>>> The software could use this information as hint for optimization.
>>>>
>>>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>>>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>>>> ---
>>>>
>>>> Changes in v12:
>>>>       - Fix a bug that if HMAT is enabled and without hmat-lb setting,
>>>>         QEMU will crash. (reported by Danmei Wei)
>>>>
>>>> Changes in v11:
>>>>       - Calculate base in build_hmat_lb().
>>>> ---
>>>>    hw/acpi/hmat.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++++-
>>>>    hw/acpi/hmat.h |   2 +
>>>>    2 files changed, 127 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
>>>> index 1368fce7ee..e7be849581 100644
>>>> --- a/hw/acpi/hmat.c
>>>> +++ b/hw/acpi/hmat.c
>>>> @@ -27,6 +27,7 @@
>>>>    #include "qemu/osdep.h"
>>>>    #include "sysemu/numa.h"
>>>>    #include "hw/acpi/hmat.h"
>>>> +#include "qemu/error-report.h"
>>>>    
>>>>    /*
>>>>     * ACPI 6.3:
>>>> @@ -67,11 +68,105 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags, int initiator,
>>>>        build_append_int_noprefix(table_data, 0, 8);
>>>>    }
>>>>    
>>>> +static bool entry_overflow(uint64_t *lb_data, uint64_t base, int len)
>>>> +{
>>>> +    int i;
>>>> +
>>>> +    for (i = 0; i < len; i++) {
>>>> +        if (lb_data[i] / base >= UINT16_MAX) {
>>>> +            return true;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    return false;
>>>> +}
>>> I suggest to do this check at CLI parsing time
>>>    
>>>> +/*
>>>> + * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
>>>> + * Structure: Table 5-146
>>>> + */
>>>> +static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
>>>> +                          uint32_t num_initiator, uint32_t num_target,
>>>> +                          uint32_t *initiator_list, int type)
>>>> +{
>>>> +    uint8_t mask = 0x0f;
>>>> +    uint32_t s = num_initiator;
>>>> +    uint32_t t = num_target;
>>> drop this locals and use arguments directly
>>>    
>>>> +    uint64_t base = 1;
>>>> +    uint64_t *lb_data;
>>>> +    int i, unit;
>>>> +
>>>> +    /* Type */
>>>> +    build_append_int_noprefix(table_data, 1, 2);
>>>> +    /* Reserved */
>>>> +    build_append_int_noprefix(table_data, 0, 2);
>>>> +    /* Length */
>>>> +    build_append_int_noprefix(table_data, 32 + 4 * s + 4 * t + 2 * s * t, 4);
>>>                                                ^^^^
>>> to me above looks like /dev/random output, absolutely unreadable.
>>> Suggest to use local var (like: lb_length) for expression with comments
>>> beside magic numbers.
>>>    
>>>> +    /* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
>>>> +    build_append_int_noprefix(table_data, hmat_lb->hierarchy & mask, 1);
>>>
>>> why do you need to use mask here?
>>>    
>> Because Bits[7:4] Reserved, so I use mask to keep it reserved.
> 
> These bits are not user provided and are set to 0. If they ever get set,
> it's a programming error, and instead of masking the problem out QEMU
> should abort. I suggest replacing the masking with assert(!(foo >> x)).
> 
>>
>>>> +    /* Data Type */
>>>> +    build_append_int_noprefix(table_data, hmat_lb->data_type, 1);
>>>
>>> Isn't hmat_lb->data_type and passed argument 'type' the same?
>>>    
>> Yes, I will drop 'type'.
>>>    
>>>> +    /* Reserved */
>>>> +    build_append_int_noprefix(table_data, 0, 2);
>>>> +    /* Number of Initiator Proximity Domains (s) */
>>>> +    build_append_int_noprefix(table_data, s, 4);
>>>> +    /* Number of Target Proximity Domains (t) */
>>>> +    build_append_int_noprefix(table_data, t, 4);
>>>> +    /* Reserved */
>>>> +    build_append_int_noprefix(table_data, 0, 4);
>>>> +
>>>> +    if (HMAT_IS_LATENCY(type)) {
>>>> +        unit = 1000;
>>>> +        lb_data = hmat_lb->latency;
>>>> +    } else {
>>>> +        unit = 1024;
>>>> +        lb_data = hmat_lb->bandwidth;
>>>> +    }
>>>> +
>>>> +    while (entry_overflow(lb_data, base, s * t)) {
>>>> +        for (i = 0; i < s * t; i++) {
>>>> +            if (!QEMU_IS_ALIGNED(lb_data[i], unit * base)) {
>>>> +                error_report("Invalid latency/bandwidth input, all "
>>>> +                "latencies/bandwidths should be specified in the same units.");
>>>> +                exit(1);
>>>> +            }
>>>> +        }
>>>> +        base *= unit;
>>>> +    }
>>> Can you clarify what you are trying to check here?
>>>    
>> Here I use entry_overflow() to check whether every entry fits into
>> uint16. If one does not, and all entries in the matrix are divisible by
>> unit * base, then base becomes unit * base.
>>
>> For example, if the lb_data[] values are 1048576 (1TB/s) and 1024 (1GB/s)
>> and unit is 1024, then 1048576 is bigger than UINT16_MAX and both values
>> are divisible by 1024 * 1, so base is 1024 and the entries are 1024 and 1
>> (see entry = hmat_lb->bandwidth[i] / base;). The benefit is that even if
>> the user inputs different units (TB/s vs GB/s), we can still store the
>> data as far as possible.
> 
> Instead of iterating over lb_data multiple times until a valid base is
> found, is it possible to go over lb_data just once to find MIN/MAX and
> then calculate the base from them? Error out with the offending max/min
> values if it's not possible to compress the range into uint16_t.
> 

We tell users to give all values in the same unit, e.g. 1GB/s and 3GB/s.
But if a user inputs values such as 1048575, 1048576 (1TB/s) and 1024
(1GB/s), then with a base derived from MIN/MAX we would get
1024 * (1023, 1024, 1). I am wondering whether that is appropriate, because
we lose the fractional part (about 0.999) of 1048575 / 1024. The current
code raises an error in this case.
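
To make the arithmetic concrete, here is a simplified model of the loop in
build_hmat_lb(). The function name find_aligned_base() is made up, it
returns false where the patch calls exit(1), and the modulo stands in for
QEMU_IS_ALIGNED():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of the patch's base search: multiply base by unit
 * while some entry overflows uint16, but only if every value is an exact
 * multiple of the next base.
 */
static bool find_aligned_base(const uint64_t *lb_data, size_t len,
                              uint64_t unit, uint64_t *base)
{
    size_t i;
    bool overflow;

    assert(lb_data && base && unit >= 2);

    *base = 1;
    for (;;) {
        overflow = false;
        for (i = 0; i < len; i++) {
            if (lb_data[i] / *base >= UINT16_MAX) {
                overflow = true;
            }
        }
        if (!overflow) {
            return true;                 /* all entries fit */
        }
        for (i = 0; i < len; i++) {
            if (lb_data[i] % (unit * *base) != 0) {
                return false;            /* precision would be lost */
            }
        }
        *base *= unit;
    }
}
```

With unit 1024, the values 1048576 and 1024 give base 1024 and entries 1024
and 1. Adding 1048575 makes the function fail, because 1048575 is not a
multiple of 1024; plain integer division would have stored it as 1023.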




* Re: [PATCH v12 09/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)
  2019-10-12  3:04         ` Tao Xu
@ 2019-10-14  9:00           ` Igor Mammedov
  2019-10-15  0:59             ` Tao Xu
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2019-10-14  9:00 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron,
	Williams, Dan J

On Sat, 12 Oct 2019 11:04:03 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> On 10/11/2019 10:08 PM, Igor Mammedov wrote:
> > On Thu, 10 Oct 2019 14:53:56 +0800
> > Tao Xu <tao3.xu@intel.com> wrote:
> >   
> >> On 10/3/2019 10:41 PM, Igor Mammedov wrote:  
> >>> On Fri, 20 Sep 2019 15:43:47 +0800
> >>> Tao Xu <tao3.xu@intel.com> wrote:
> >>>      
> >>>> From: Liu Jingqi <jingqi.liu@intel.com>
> >>>>
> >>>> This structure describes the memory access latency and bandwidth
> >>>> information from various memory access initiator proximity domains.
> >>>> The latency and bandwidth numbers represented in this structure
> >>>> correspond to rated latency and bandwidth for the platform.
> >>>> The software could use this information as hint for optimization.
> >>>>
> >>>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> >>>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> >>>> ---
> >>>>
> >>>> Changes in v12:
> >>>>       - Fix a bug that if HMAT is enabled and without hmat-lb setting,
> >>>>         QEMU will crash. (reported by Danmei Wei)
> >>>>
> >>>> Changes in v11:
> >>>>       - Calculate base in build_hmat_lb().
> >>>> ---
> >>>>    hw/acpi/hmat.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >>>>    hw/acpi/hmat.h |   2 +
> >>>>    2 files changed, 127 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
> >>>> index 1368fce7ee..e7be849581 100644
> >>>> --- a/hw/acpi/hmat.c
> >>>> +++ b/hw/acpi/hmat.c
> >>>> @@ -27,6 +27,7 @@
> >>>>    #include "qemu/osdep.h"
> >>>>    #include "sysemu/numa.h"
> >>>>    #include "hw/acpi/hmat.h"
> >>>> +#include "qemu/error-report.h"
> >>>>    
> >>>>    /*
> >>>>     * ACPI 6.3:
> >>>> @@ -67,11 +68,105 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags, int initiator,
> >>>>        build_append_int_noprefix(table_data, 0, 8);
> >>>>    }
> >>>>    
> >>>> +static bool entry_overflow(uint64_t *lb_data, uint64_t base, int len)
> >>>> +{
> >>>> +    int i;
> >>>> +
> >>>> +    for (i = 0; i < len; i++) {
> >>>> +        if (lb_data[i] / base >= UINT16_MAX) {
> >>>> +            return true;
> >>>> +        }
> >>>> +    }
> >>>> +
> >>>> +    return false;
> >>>> +}  
> >>> I suggest to do this check at CLI parsing time
> >>>      
> >>>> +/*
> >>>> + * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
> >>>> + * Structure: Table 5-146
> >>>> + */
> >>>> +static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
> >>>> +                          uint32_t num_initiator, uint32_t num_target,
> >>>> +                          uint32_t *initiator_list, int type)
> >>>> +{
> >>>> +    uint8_t mask = 0x0f;
> >>>> +    uint32_t s = num_initiator;
> >>>> +    uint32_t t = num_target;  
> >>> drop this locals and use arguments directly
> >>>      
> >>>> +    uint64_t base = 1;
> >>>> +    uint64_t *lb_data;
> >>>> +    int i, unit;
> >>>> +
> >>>> +    /* Type */
> >>>> +    build_append_int_noprefix(table_data, 1, 2);
> >>>> +    /* Reserved */
> >>>> +    build_append_int_noprefix(table_data, 0, 2);
> >>>> +    /* Length */
> >>>> +    build_append_int_noprefix(table_data, 32 + 4 * s + 4 * t + 2 * s * t, 4);  
> >>>                                                ^^^^
> >>> to me above looks like /dev/random output, absolutely unreadable.
> >>> Suggest to use local var (like: lb_length) for expression with comments
> >>> beside magic numbers.
> >>>      
> >>>> +    /* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
> >>>> +    build_append_int_noprefix(table_data, hmat_lb->hierarchy & mask, 1);  
> >>>
> >>> why do you need to use mask here?
> >>>      
> >> Because Bits[7:4] Reserved, so I use mask to keep it reserved.  
> > 
> > these bits are not user provided and set to 0, if they get set it's
> > programming error and instead of masking problem out QEMU should abort,
> > I suggest replace masking with assert(!foo>>x).
> >   
> >>  
> >>>> +    /* Data Type */
> >>>> +    build_append_int_noprefix(table_data, hmat_lb->data_type, 1);  
> >>>
> >>> Isn't hmat_lb->data_type and passed argument 'type' the same?
> >>>      
> >> Yes, I will drop 'type'.  
> >>>      
> >>>> +    /* Reserved */
> >>>> +    build_append_int_noprefix(table_data, 0, 2);
> >>>> +    /* Number of Initiator Proximity Domains (s) */
> >>>> +    build_append_int_noprefix(table_data, s, 4);
> >>>> +    /* Number of Target Proximity Domains (t) */
> >>>> +    build_append_int_noprefix(table_data, t, 4);
> >>>> +    /* Reserved */
> >>>> +    build_append_int_noprefix(table_data, 0, 4);
> >>>> +
> >>>> +    if (HMAT_IS_LATENCY(type)) {
> >>>> +        unit = 1000;
> >>>> +        lb_data = hmat_lb->latency;
> >>>> +    } else {
> >>>> +        unit = 1024;
> >>>> +        lb_data = hmat_lb->bandwidth;
> >>>> +    }
> >>>> +
> >>>> +    while (entry_overflow(lb_data, base, s * t)) {
> >>>> +        for (i = 0; i < s * t; i++) {
> >>>> +            if (!QEMU_IS_ALIGNED(lb_data[i], unit * base)) {
> >>>> +                error_report("Invalid latency/bandwidth input, all "
> >>>> +                "latencies/bandwidths should be specified in the same units.");
> >>>> +                exit(1);
> >>>> +            }
> >>>> +        }
> >>>> +        base *= unit;
> >>>> +    }  
> >>> Can you clarify what you are trying to check here?
> >>>      
> >> This part I use entry_overflow() to check if uint16 can store entry. If
> >> can't store and the entries matrix can be divisible by unit * base, then
> >> base will be unit * base.
> >>
> >> For example, if lb_data[i] are 1048576(1TB/s) and 1024(1GB/s), unit is
> >> 1024, so 1048576 is bigger than UINT16_MAX, and can be divisible by 1024
> >> * 1, so base is 1024 and entries are 1024 and 1 (see entry =
> >> hmat_lb->latency[i] / base;). The benefit is even user input different
> >> unit(TB/s vs GB/s), we can still store the data as far as possible.  
> > 
> > Is it possible instead of doing multiple iterations over lb_data
> > until it finds valid base, just go over lb_data once to find MIN/MAX
> > and then calculate base using it. Error out with max/min offending
> > values if it's not possible to compress the range into uint16_t?
> >   
> 
> Although we tell user input same unit data, such as use 1GB/s 3GB/s. If 
> user input data such as 1048575, 1048576(1TB/s) and 1024(1GB/s), then we 
> will get 1024 * (1023 1024 1). I am wondering if it is appropriate 
> because we lose a float number(0.999020). But in our codes, it will 
> raise error. 
I do not understand what you are trying to say here. Could you rephrase
it so that the problem is clearer, please?




* Re: [PATCH v12 09/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)
  2019-10-14  9:00           ` Igor Mammedov
@ 2019-10-15  0:59             ` Tao Xu
  2019-10-15  5:40               ` Tao Xu
  0 siblings, 1 reply; 34+ messages in thread
From: Tao Xu @ 2019-10-15  0:59 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron,
	Williams, Dan J

On 10/14/2019 5:00 PM, Igor Mammedov wrote:
> On Sat, 12 Oct 2019 11:04:03 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> On 10/11/2019 10:08 PM, Igor Mammedov wrote:
>>> On Thu, 10 Oct 2019 14:53:56 +0800
>>> Tao Xu <tao3.xu@intel.com> wrote:
>>>    
>>>> On 10/3/2019 10:41 PM, Igor Mammedov wrote:
>>>>> On Fri, 20 Sep 2019 15:43:47 +0800
>>>>> Tao Xu <tao3.xu@intel.com> wrote:
>>>>>       
>>>>>> From: Liu Jingqi <jingqi.liu@intel.com>
>>>>>>
>>>>>> This structure describes the memory access latency and bandwidth
>>>>>> information from various memory access initiator proximity domains.
>>>>>> The latency and bandwidth numbers represented in this structure
>>>>>> correspond to rated latency and bandwidth for the platform.
>>>>>> The software could use this information as hint for optimization.
>>>>>>
>>>>>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>>>>>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>>>>>> ---
>>>>>>
>>>>>> Changes in v12:
>>>>>>        - Fix a bug that if HMAT is enabled and without hmat-lb setting,
>>>>>>          QEMU will crash. (reported by Danmei Wei)
>>>>>>
>>>>>> Changes in v11:
>>>>>>        - Calculate base in build_hmat_lb().
>>>>>> ---
>>>>>>     hw/acpi/hmat.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++++-
>>>>>>     hw/acpi/hmat.h |   2 +
>>>>>>     2 files changed, 127 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
>>>>>> index 1368fce7ee..e7be849581 100644
>>>>>> --- a/hw/acpi/hmat.c
>>>>>> +++ b/hw/acpi/hmat.c
>>>>>> @@ -27,6 +27,7 @@
>>>>>>     #include "qemu/osdep.h"
>>>>>>     #include "sysemu/numa.h"
>>>>>>     #include "hw/acpi/hmat.h"
>>>>>> +#include "qemu/error-report.h"
>>>>>>     
>>>>>>     /*
>>>>>>      * ACPI 6.3:
>>>>>> @@ -67,11 +68,105 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags, int initiator,
>>>>>>         build_append_int_noprefix(table_data, 0, 8);
>>>>>>     }
>>>>>>     
>>>>>> +static bool entry_overflow(uint64_t *lb_data, uint64_t base, int len)
>>>>>> +{
>>>>>> +    int i;
>>>>>> +
>>>>>> +    for (i = 0; i < len; i++) {
>>>>>> +        if (lb_data[i] / base >= UINT16_MAX) {
>>>>>> +            return true;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    return false;
>>>>>> +}
>>>>> I suggest to do this check at CLI parsing time
>>>>>       
>>>>>> +/*
>>>>>> + * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
>>>>>> + * Structure: Table 5-146
>>>>>> + */
>>>>>> +static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
>>>>>> +                          uint32_t num_initiator, uint32_t num_target,
>>>>>> +                          uint32_t *initiator_list, int type)
>>>>>> +{
>>>>>> +    uint8_t mask = 0x0f;
>>>>>> +    uint32_t s = num_initiator;
>>>>>> +    uint32_t t = num_target;
>>>>> drop this locals and use arguments directly
>>>>>       
>>>>>> +    uint64_t base = 1;
>>>>>> +    uint64_t *lb_data;
>>>>>> +    int i, unit;
>>>>>> +
>>>>>> +    /* Type */
>>>>>> +    build_append_int_noprefix(table_data, 1, 2);
>>>>>> +    /* Reserved */
>>>>>> +    build_append_int_noprefix(table_data, 0, 2);
>>>>>> +    /* Length */
>>>>>> +    build_append_int_noprefix(table_data, 32 + 4 * s + 4 * t + 2 * s * t, 4);
>>>>>                                                 ^^^^
>>>>> to me above looks like /dev/random output, absolutely unreadable.
>>>>> Suggest to use local var (like: lb_length) for expression with comments
>>>>> beside magic numbers.
>>>>>       
>>>>>> +    /* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
>>>>>> +    build_append_int_noprefix(table_data, hmat_lb->hierarchy & mask, 1);
>>>>>
>>>>> why do you need to use mask here?
>>>>>       
>>>> Because Bits[7:4] Reserved, so I use mask to keep it reserved.
>>>
>>> these bits are not user provided and set to 0, if they get set it's
>>> programming error and instead of masking problem out QEMU should abort,
>>> I suggest replace masking with assert(!foo>>x).
>>>    
>>>>   
>>>>>> +    /* Data Type */
>>>>>> +    build_append_int_noprefix(table_data, hmat_lb->data_type, 1);
>>>>>
>>>>> Isn't hmat_lb->data_type and passed argument 'type' the same?
>>>>>       
>>>> Yes, I will drop 'type'.
>>>>>       
>>>>>> +    /* Reserved */
>>>>>> +    build_append_int_noprefix(table_data, 0, 2);
>>>>>> +    /* Number of Initiator Proximity Domains (s) */
>>>>>> +    build_append_int_noprefix(table_data, s, 4);
>>>>>> +    /* Number of Target Proximity Domains (t) */
>>>>>> +    build_append_int_noprefix(table_data, t, 4);
>>>>>> +    /* Reserved */
>>>>>> +    build_append_int_noprefix(table_data, 0, 4);
>>>>>> +
>>>>>> +    if (HMAT_IS_LATENCY(type)) {
>>>>>> +        unit = 1000;
>>>>>> +        lb_data = hmat_lb->latency;
>>>>>> +    } else {
>>>>>> +        unit = 1024;
>>>>>> +        lb_data = hmat_lb->bandwidth;
>>>>>> +    }
>>>>>> +
>>>>>> +    while (entry_overflow(lb_data, base, s * t)) {
>>>>>> +        for (i = 0; i < s * t; i++) {
>>>>>> +            if (!QEMU_IS_ALIGNED(lb_data[i], unit * base)) {
>>>>>> +                error_report("Invalid latency/bandwidth input, all "
>>>>>> +                "latencies/bandwidths should be specified in the same units.");
>>>>>> +                exit(1);
>>>>>> +            }
>>>>>> +        }
>>>>>> +        base *= unit;
>>>>>> +    }
>>>>> Can you clarify what you are trying to check here?
>>>>>       
>>>> This part I use entry_overflow() to check if uint16 can store entry. If
>>>> can't store and the entries matrix can be divisible by unit * base, then
>>>> base will be unit * base.
>>>>
>>>> For example, if lb_data[i] are 1048576(1TB/s) and 1024(1GB/s), unit is
>>>> 1024, so 1048576 is bigger than UINT16_MAX, and can be divisible by 1024
>>>> * 1, so base is 1024 and entries are 1024 and 1 (see entry =
>>>> hmat_lb->latency[i] / base;). The benefit is even user input different
>>>> unit(TB/s vs GB/s), we can still store the data as far as possible.
>>>
>>> Is it possible instead of doing multiple iterations over lb_data
>>> until it finds valid base, just go over lb_data once to find MIN/MAX
>>> and then calculate base using it. Error out with max/min offending
>>> values if it's not possible to compress the range into uint16_t?
>>>    
>>
>> Although we tell user input same unit data, such as use 1GB/s 3GB/s. If
>> user input data such as 1048575, 1048576(1TB/s) and 1024(1GB/s), then we
>> will get 1024 * (1023 1024 1). I am wondering if it is appropriate
>> because we lose a float number(0.999020). But in our codes, it will
>> raise error.
> I do not understand what you are trying to say here, could you rephrase
> it, so the problem would be more clear, please?
> 
Sorry, I mean: how should we treat data that is not evenly divisible if
we use max/min as the base? As another example, suppose the user inputs
three bandwidths: 9GB/s, 5GB/s and 3GB/s. Then max/min is 3. But entries
must be uint16, so from (5GB/s)/3 we can only get 1GB/s, and we should
raise an error (overflow).
With this patch, however, the base we get is 1GB/s.
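
For reference, the base search in the patch can be sketched as a
standalone function. This is only an illustration outside QEMU: the
wrapper name find_common_base() is hypothetical, QEMU_IS_ALIGNED() is
written out as a plain modulo test, and a return value of 0 stands in
for the error_report()/exit(1) path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if any value, once divided by base, does not stay below UINT16_MAX. */
static bool entry_overflow(const uint64_t *data, uint64_t base, size_t len)
{
    assert(base != 0);
    for (size_t i = 0; i < len; i++) {
        if (data[i] / base >= UINT16_MAX) {
            return true;
        }
    }
    return false;
}

/*
 * Hypothetical wrapper around the loop in build_hmat_lb(): returns the
 * common base, or 0 where the patch would report an error and exit.
 */
static uint64_t find_common_base(const uint64_t *data, size_t len,
                                 uint64_t unit)
{
    uint64_t base = 1;

    assert(unit > 1);
    while (entry_overflow(data, base, len)) {
        for (size_t i = 0; i < len; i++) {
            if (data[i] % (unit * base) != 0) {
                return 0; /* values cannot share one base without loss */
            }
        }
        base *= unit;
    }
    return base;
}
```

For the earlier example, 1048576 and 1024 with unit 1024 give a base of
1024 (entries 1024 and 1), while adding 1048575 to the list makes the
search fail, because that value is not a multiple of 1024.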



* Re: [PATCH v12 09/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)
  2019-10-15  0:59             ` Tao Xu
@ 2019-10-15  5:40               ` Tao Xu
  2019-10-17 14:17                 ` Igor Mammedov
  0 siblings, 1 reply; 34+ messages in thread
From: Tao Xu @ 2019-10-15  5:40 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron,
	Williams, Dan J

On 10/15/2019 8:59 AM, Tao Xu wrote:
> On 10/14/2019 5:00 PM, Igor Mammedov wrote:
>> On Sat, 12 Oct 2019 11:04:03 +0800
>> Tao Xu <tao3.xu@intel.com> wrote:
>>
>>> On 10/11/2019 10:08 PM, Igor Mammedov wrote:
>>>> On Thu, 10 Oct 2019 14:53:56 +0800
>>>> Tao Xu <tao3.xu@intel.com> wrote:
>>>>> On 10/3/2019 10:41 PM, Igor Mammedov wrote:
>>>>>> On Fri, 20 Sep 2019 15:43:47 +0800
>>>>>> Tao Xu <tao3.xu@intel.com> wrote:
>>>>>>> From: Liu Jingqi <jingqi.liu@intel.com>
>>>>>>>
>>>>>>> This structure describes the memory access latency and bandwidth
>>>>>>> information from various memory access initiator proximity domains.
>>>>>>> The latency and bandwidth numbers represented in this structure
>>>>>>> correspond to rated latency and bandwidth for the platform.
>>>>>>> The software could use this information as hint for optimization.
>>>>>>>
>>>>>>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>>>>>>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>>>>>>> ---
>>>>>>>
>>>>>>> Changes in v12:
>>>>>>>        - Fix a bug that if HMAT is enabled and without hmat-lb 
>>>>>>> setting,
>>>>>>>          QEMU will crash. (reported by Danmei Wei)
>>>>>>>
>>>>>>> Changes in v11:
>>>>>>>        - Calculate base in build_hmat_lb().
>>>>>>> ---
>>>>>>>     hw/acpi/hmat.c | 126 
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++-
>>>>>>>     hw/acpi/hmat.h |   2 +
>>>>>>>     2 files changed, 127 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
>>>>>>> index 1368fce7ee..e7be849581 100644
>>>>>>> --- a/hw/acpi/hmat.c
>>>>>>> +++ b/hw/acpi/hmat.c
>>>>>>> @@ -27,6 +27,7 @@
>>>>>>>     #include "qemu/osdep.h"
>>>>>>>     #include "sysemu/numa.h"
>>>>>>>     #include "hw/acpi/hmat.h"
>>>>>>> +#include "qemu/error-report.h"
>>>>>>>     /*
>>>>>>>      * ACPI 6.3:
>>>>>>> @@ -67,11 +68,105 @@ static void build_hmat_mpda(GArray 
>>>>>>> *table_data, uint16_t flags, int initiator,
>>>>>>>         build_append_int_noprefix(table_data, 0, 8);
>>>>>>>     }
>>>>>>> +static bool entry_overflow(uint64_t *lb_data, uint64_t base, int 
>>>>>>> len)
>>>>>>> +{
>>>>>>> +    int i;
>>>>>>> +
>>>>>>> +    for (i = 0; i < len; i++) {
>>>>>>> +        if (lb_data[i] / base >= UINT16_MAX) {
>>>>>>> +            return true;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    return false;
>>>>>>> +}
>>>>>> I suggest to do this check at CLI parsing time
>>>>>>> +/*
>>>>>>> + * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth 
>>>>>>> Information
>>>>>>> + * Structure: Table 5-146
>>>>>>> + */
>>>>>>> +static void build_hmat_lb(GArray *table_data, HMAT_LB_Info 
>>>>>>> *hmat_lb,
>>>>>>> +                          uint32_t num_initiator, uint32_t 
>>>>>>> num_target,
>>>>>>> +                          uint32_t *initiator_list, int type)
>>>>>>> +{
>>>>>>> +    uint8_t mask = 0x0f;
>>>>>>> +    uint32_t s = num_initiator;
>>>>>>> +    uint32_t t = num_target;
>>>>>> drop this locals and use arguments directly
>>>>>>> +    uint64_t base = 1;
>>>>>>> +    uint64_t *lb_data;
>>>>>>> +    int i, unit;
>>>>>>> +
>>>>>>> +    /* Type */
>>>>>>> +    build_append_int_noprefix(table_data, 1, 2);
>>>>>>> +    /* Reserved */
>>>>>>> +    build_append_int_noprefix(table_data, 0, 2);
>>>>>>> +    /* Length */
>>>>>>> +    build_append_int_noprefix(table_data, 32 + 4 * s + 4 * t + 2 
>>>>>>> * s * t, 4);
>>>>>>                                                 ^^^^
>>>>>> to me above looks like /dev/random output, absolutely unreadable.
>>>>>> Suggest to use local var (like: lb_length) for expression with 
>>>>>> comments
>>>>>> beside magic numbers.
>>>>>>> +    /* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
>>>>>>> +    build_append_int_noprefix(table_data, hmat_lb->hierarchy & 
>>>>>>> mask, 1);
>>>>>>
>>>>>> why do you need to use mask here?
>>>>> Because Bits[7:4] Reserved, so I use mask to keep it reserved.
>>>>
>>>> these bits are not user provided and set to 0, if they get set it's
>>>> programming error and instead of masking problem out QEMU should abort,
>>>> I suggest replace masking with assert(!foo>>x).
>>>>>>> +    /* Data Type */
>>>>>>> +    build_append_int_noprefix(table_data, hmat_lb->data_type, 1);
>>>>>>
>>>>>> Isn't hmat_lb->data_type and passed argument 'type' the same?
>>>>> Yes, I will drop 'type'.
>>>>>>> +    /* Reserved */
>>>>>>> +    build_append_int_noprefix(table_data, 0, 2);
>>>>>>> +    /* Number of Initiator Proximity Domains (s) */
>>>>>>> +    build_append_int_noprefix(table_data, s, 4);
>>>>>>> +    /* Number of Target Proximity Domains (t) */
>>>>>>> +    build_append_int_noprefix(table_data, t, 4);
>>>>>>> +    /* Reserved */
>>>>>>> +    build_append_int_noprefix(table_data, 0, 4);
>>>>>>> +
>>>>>>> +    if (HMAT_IS_LATENCY(type)) {
>>>>>>> +        unit = 1000;
>>>>>>> +        lb_data = hmat_lb->latency;
>>>>>>> +    } else {
>>>>>>> +        unit = 1024;
>>>>>>> +        lb_data = hmat_lb->bandwidth;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    while (entry_overflow(lb_data, base, s * t)) {
>>>>>>> +        for (i = 0; i < s * t; i++) {
>>>>>>> +            if (!QEMU_IS_ALIGNED(lb_data[i], unit * base)) {
>>>>>>> +                error_report("Invalid latency/bandwidth input, 
>>>>>>> all "
>>>>>>> +                "latencies/bandwidths should be specified in the 
>>>>>>> same units.");
>>>>>>> +                exit(1);
>>>>>>> +            }
>>>>>>> +        }
>>>>>>> +        base *= unit;
>>>>>>> +    }
>>>>>> Can you clarify what you are trying to check here?
>>>>> This part I use entry_overflow() to check if uint16 can store 
>>>>> entry. If
>>>>> can't store and the entries matrix can be divisible by unit * base, 
>>>>> then
>>>>> base will be unit * base.
>>>>>
>>>>> For example, if lb_data[i] are 1048576(1TB/s) and 1024(1GB/s), unit is
>>>>> 1024, so 1048576 is bigger than UINT16_MAX, and can be divisible by 
>>>>> 1024
>>>>> * 1, so base is 1024 and entries are 1024 and 1 (see entry =
>>>>> hmat_lb->latency[i] / base;). The benefit is even user input different
>>>>> unit(TB/s vs GB/s), we can still store the data as far as possible.
>>>>
>>>> Is it possible instead of doing multiple iterations over lb_data
>>>> until it finds valid base, just go over lb_data once to find MIN/MAX
>>>> and then calculate base using it. Error out with max/min offending
>>>> values if it's not possible to compress the range into uint16_t?
>>>
>>> Although we tell user input same unit data, such as use 1GB/s 3GB/s. If
>>> user input data such as 1048575, 1048576(1TB/s) and 1024(1GB/s), then we
>>> will get 1024 * (1023 1024 1). I am wondering if it is appropriate
>>> because we lose a float number(0.999020). But in our codes, it will
>>> raise error.
>> I do not understand what you are trying to say here, could you rephrase
>> it, so the problem would be more clear, please?
>>
> Sorry, I mean how we treat the data cannot be divisible if we use 
> max/min as base. For another example, If user input the data(including 3 
> bandwidths) : 9GB/s 5GB/s 3GB/s. Then max/min result is 3. But entries 
> should be uint16, (5GB/s)/3 we can only get 1GB/s, then we should raise 
> error(overflow).
> But if this patch, we will get the base is 1GB/s.
Now I understand what MIN/MAX means. In the case above, MAX is 9GB/s and
MIN is 3GB/s, and I would use the code below to calculate the base:

     /* Grow base until the scaled maximum fits in a uint16_t entry. */
     while (max_data / base >= UINT16_MAX) {
         if (!QEMU_IS_ALIGNED(max_data, unit * base) ||
             !QEMU_IS_ALIGNED(min_data, unit * base)) {
             error_report("Invalid latency/bandwidth input.");
             exit(1);
         }
         base *= unit;
     }
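
A self-contained version of this idea could look like the sketch below.
It is only an illustration, not the final patch: the function name
base_from_min_max() is hypothetical, it assumes the loop is meant to
compare the scaled maximum (max_data / base) as entry_overflow() does,
QEMU_IS_ALIGNED() is written as a modulo test, and returning 0 stands in
for error_report() plus exit(1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical single-pass variant: find min and max once, then derive
 * the base from those two values only. Returns 0 on error.
 */
static uint64_t base_from_min_max(const uint64_t *data, size_t len,
                                  uint64_t unit)
{
    uint64_t min_data, max_data, base = 1;

    assert(len > 0 && unit > 1);
    min_data = max_data = data[0];
    for (size_t i = 1; i < len; i++) {
        if (data[i] < min_data) {
            min_data = data[i];
        }
        if (data[i] > max_data) {
            max_data = data[i];
        }
    }

    /* Grow base until the scaled maximum stays below UINT16_MAX. */
    while (max_data / base >= UINT16_MAX) {
        if (max_data % (unit * base) != 0 ||
            min_data % (unit * base) != 0) {
            return 0;
        }
        base *= unit;
    }
    return base;
}
```

Only the minimum and maximum are tested for alignment here, so a value
between them that is not a multiple of the base would be truncated
without an error.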



* Re: [PATCH v12 03/11] qapi: Add builtin type time
  2019-09-20  7:43 ` [PATCH v12 03/11] qapi: Add builtin type time Tao Xu
@ 2019-10-15  6:22   ` Tao Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Tao Xu @ 2019-10-15  6:22 UTC (permalink / raw)
  To: eblake; +Cc: Liu, Jingqi, qemu-devel

Hi Eric,

Could you help review this patch, as well as 1/11, 2/11 and 4/11?
Thanks for your help.

Tao
On 9/20/2019 3:43 PM, Xu, Tao3 wrote:
> Add optional builtin type time, fallback is uint64. This type use
> qemu_strtotime_ps() for pre-converting time suffix to numbers.
> 
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> No changes in v11 and v12.
> 
> New patch in v10.
> ---
>   include/qapi/visitor-impl.h  |  4 ++++
>   include/qapi/visitor.h       |  9 +++++++++
>   qapi/opts-visitor.c          | 22 ++++++++++++++++++++++
>   qapi/qapi-visit-core.c       | 12 ++++++++++++
>   qapi/qobject-input-visitor.c | 18 ++++++++++++++++++
>   qapi/trace-events            |  1 +
>   scripts/qapi/common.py       |  2 ++
>   7 files changed, 68 insertions(+)
> 
> diff --git a/include/qapi/visitor-impl.h b/include/qapi/visitor-impl.h
> index 8ccb3b6c20..e0979563c7 100644
> --- a/include/qapi/visitor-impl.h
> +++ b/include/qapi/visitor-impl.h
> @@ -88,6 +88,10 @@ struct Visitor
>       void (*type_size)(Visitor *v, const char *name, uint64_t *obj,
>                         Error **errp);
>   
> +    /* Optional; fallback is type_uint64() */
> +    void (*type_time)(Visitor *v, const char *name, uint64_t *obj,
> +                      Error **errp);
> +
>       /* Must be set */
>       void (*type_bool)(Visitor *v, const char *name, bool *obj, Error **errp);
>   
> diff --git a/include/qapi/visitor.h b/include/qapi/visitor.h
> index 5b2ed3f202..4c3198b1c5 100644
> --- a/include/qapi/visitor.h
> +++ b/include/qapi/visitor.h
> @@ -554,6 +554,15 @@ void visit_type_int64(Visitor *v, const char *name, int64_t *obj,
>   void visit_type_size(Visitor *v, const char *name, uint64_t *obj,
>                        Error **errp);
>   
> +/*
> + * Visit a uint64_t value.
> + * Like visit_type_uint64(), except that some visitors may choose to
> + * recognize numbers with timeunit suffix, such as "ps", "ns", "us"
> + * "ms" and "s".
> + */
> +void visit_type_time(Visitor *v, const char *name, uint64_t *obj,
> +                     Error **errp);
> +
>   /*
>    * Visit a boolean value.
>    *
> diff --git a/qapi/opts-visitor.c b/qapi/opts-visitor.c
> index 324b197495..d73b2e51a0 100644
> --- a/qapi/opts-visitor.c
> +++ b/qapi/opts-visitor.c
> @@ -508,6 +508,27 @@ opts_type_size(Visitor *v, const char *name, uint64_t *obj, Error **errp)
>       processed(ov, name);
>   }
>   
> +static void
> +opts_type_time(Visitor *v, const char *name, uint64_t *obj, Error **errp)
> +{
> +    OptsVisitor *ov = to_ov(v);
> +    const QemuOpt *opt;
> +    int err;
> +
> +    opt = lookup_scalar(ov, name, errp);
> +    if (!opt) {
> +        return;
> +    }
> +
> +    err = qemu_strtotime_ps(opt->str ? opt->str : "", NULL, obj);
> +    if (err < 0) {
> +        error_setg(errp, QERR_INVALID_PARAMETER_VALUE, opt->name,
> +                   "a time value");
> +        return;
> +    }
> +
> +    processed(ov, name);
> +}
>   
>   static void
>   opts_optional(Visitor *v, const char *name, bool *present)
> @@ -555,6 +576,7 @@ opts_visitor_new(const QemuOpts *opts)
>       ov->visitor.type_int64  = &opts_type_int64;
>       ov->visitor.type_uint64 = &opts_type_uint64;
>       ov->visitor.type_size   = &opts_type_size;
> +    ov->visitor.type_time   = &opts_type_time;
>       ov->visitor.type_bool   = &opts_type_bool;
>       ov->visitor.type_str    = &opts_type_str;
>   
> diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c
> index 5365561b07..ac8896455c 100644
> --- a/qapi/qapi-visit-core.c
> +++ b/qapi/qapi-visit-core.c
> @@ -277,6 +277,18 @@ void visit_type_size(Visitor *v, const char *name, uint64_t *obj,
>       }
>   }
>   
> +void visit_type_time(Visitor *v, const char *name, uint64_t *obj,
> +                     Error **errp)
> +{
> +    assert(obj);
> +    trace_visit_type_time(v, name, obj);
> +    if (v->type_time) {
> +        v->type_time(v, name, obj, errp);
> +    } else {
> +        v->type_uint64(v, name, obj, errp);
> +    }
> +}
> +
>   void visit_type_bool(Visitor *v, const char *name, bool *obj, Error **errp)
>   {
>       assert(obj);
> diff --git a/qapi/qobject-input-visitor.c b/qapi/qobject-input-visitor.c
> index 32236cbcb1..9b66941d8a 100644
> --- a/qapi/qobject-input-visitor.c
> +++ b/qapi/qobject-input-visitor.c
> @@ -627,6 +627,23 @@ static void qobject_input_type_size_keyval(Visitor *v, const char *name,
>       }
>   }
>   
> +static void qobject_input_type_time_keyval(Visitor *v, const char *name,
> +                                           uint64_t *obj, Error **errp)
> +{
> +    QObjectInputVisitor *qiv = to_qiv(v);
> +    const char *str = qobject_input_get_keyval(qiv, name, errp);
> +
> +    if (!str) {
> +        return;
> +    }
> +
> +    if (qemu_strtotime_ps(str, NULL, obj) < 0) {
> +        /* TODO report -ERANGE more nicely */
> +        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> +                   full_name(qiv, name), "time");
> +    }
> +}
> +
>   static void qobject_input_optional(Visitor *v, const char *name, bool *present)
>   {
>       QObjectInputVisitor *qiv = to_qiv(v);
> @@ -708,6 +725,7 @@ Visitor *qobject_input_visitor_new_keyval(QObject *obj)
>       v->visitor.type_any = qobject_input_type_any;
>       v->visitor.type_null = qobject_input_type_null;
>       v->visitor.type_size = qobject_input_type_size_keyval;
> +    v->visitor.type_time = qobject_input_type_time_keyval;
>       v->keyval = true;
>   
>       return &v->visitor;
> diff --git a/qapi/trace-events b/qapi/trace-events
> index 5eb4afa110..c4605a7ccc 100644
> --- a/qapi/trace-events
> +++ b/qapi/trace-events
> @@ -29,6 +29,7 @@ visit_type_int16(void *v, const char *name, int16_t *obj) "v=%p name=%s obj=%p"
>   visit_type_int32(void *v, const char *name, int32_t *obj) "v=%p name=%s obj=%p"
>   visit_type_int64(void *v, const char *name, int64_t *obj) "v=%p name=%s obj=%p"
>   visit_type_size(void *v, const char *name, uint64_t *obj) "v=%p name=%s obj=%p"
> +visit_type_time(void *v, const char *name, uint64_t *obj) "v=%p name=%s obj=%p"
>   visit_type_bool(void *v, const char *name, bool *obj) "v=%p name=%s obj=%p"
>   visit_type_str(void *v, const char *name, char **obj) "v=%p name=%s obj=%p"
>   visit_type_number(void *v, const char *name, void *obj) "v=%p name=%s obj=%p"
> diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
> index d61bfdc526..3a6f108794 100644
> --- a/scripts/qapi/common.py
> +++ b/scripts/qapi/common.py
> @@ -35,6 +35,7 @@ builtin_types = {
>       'uint32':   'QTYPE_QNUM',
>       'uint64':   'QTYPE_QNUM',
>       'size':     'QTYPE_QNUM',
> +    'time':     'QTYPE_QNUM',
>       'any':      None,           # any QType possible, actually
>       'QType':    'QTYPE_QSTRING',
>   }
> @@ -1834,6 +1835,7 @@ class QAPISchema(object):
>                     ('uint32', 'int',     'uint32_t'),
>                     ('uint64', 'int',     'uint64_t'),
>                     ('size',   'int',     'uint64_t'),
> +                  ('time',   'int',     'uint64_t'),
>                     ('bool',   'boolean', 'bool'),
>                     ('any',    'value',   'QObject' + pointer_suffix),
>                     ('null',   'null',    'QNull' + pointer_suffix)]:
> 




* Re: [PATCH v12 09/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)
  2019-10-15  5:40               ` Tao Xu
@ 2019-10-17 14:17                 ` Igor Mammedov
  0 siblings, 0 replies; 34+ messages in thread
From: Igor Mammedov @ 2019-10-17 14:17 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron,
	Williams, Dan J

On Tue, 15 Oct 2019 13:40:54 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> On 10/15/2019 8:59 AM, Tao Xu wrote:
> > On 10/14/2019 5:00 PM, Igor Mammedov wrote:  
> >> On Sat, 12 Oct 2019 11:04:03 +0800
> >> Tao Xu <tao3.xu@intel.com> wrote:
> >>  
> >>> On 10/11/2019 10:08 PM, Igor Mammedov wrote:  
> >>>> On Thu, 10 Oct 2019 14:53:56 +0800
> >>>> Tao Xu <tao3.xu@intel.com> wrote:  
> >>>>> On 10/3/2019 10:41 PM, Igor Mammedov wrote:  
> >>>>>> On Fri, 20 Sep 2019 15:43:47 +0800
> >>>>>> Tao Xu <tao3.xu@intel.com> wrote:  
> >>>>>>> From: Liu Jingqi <jingqi.liu@intel.com>
> >>>>>>>
> >>>>>>> This structure describes the memory access latency and bandwidth
> >>>>>>> information from various memory access initiator proximity domains.
> >>>>>>> The latency and bandwidth numbers represented in this structure
> >>>>>>> correspond to rated latency and bandwidth for the platform.
> >>>>>>> The software could use this information as hint for optimization.
> >>>>>>>
> >>>>>>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> >>>>>>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> >>>>>>> ---
> >>>>>>>
> >>>>>>> Changes in v12:
> >>>>>>>        - Fix a bug that if HMAT is enabled and without hmat-lb 
> >>>>>>> setting,
> >>>>>>>          QEMU will crash. (reported by Danmei Wei)
> >>>>>>>
> >>>>>>> Changes in v11:
> >>>>>>>        - Calculate base in build_hmat_lb().
> >>>>>>> ---
> >>>>>>>     hw/acpi/hmat.c | 126 
> >>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++-
> >>>>>>>     hw/acpi/hmat.h |   2 +
> >>>>>>>     2 files changed, 127 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
> >>>>>>> index 1368fce7ee..e7be849581 100644
> >>>>>>> --- a/hw/acpi/hmat.c
> >>>>>>> +++ b/hw/acpi/hmat.c
> >>>>>>> @@ -27,6 +27,7 @@
> >>>>>>>     #include "qemu/osdep.h"
> >>>>>>>     #include "sysemu/numa.h"
> >>>>>>>     #include "hw/acpi/hmat.h"
> >>>>>>> +#include "qemu/error-report.h"
> >>>>>>>     /*
> >>>>>>>      * ACPI 6.3:
> >>>>>>> @@ -67,11 +68,105 @@ static void build_hmat_mpda(GArray 
> >>>>>>> *table_data, uint16_t flags, int initiator,
> >>>>>>>         build_append_int_noprefix(table_data, 0, 8);
> >>>>>>>     }
> >>>>>>> +static bool entry_overflow(uint64_t *lb_data, uint64_t base, int 
> >>>>>>> len)
> >>>>>>> +{
> >>>>>>> +    int i;
> >>>>>>> +
> >>>>>>> +    for (i = 0; i < len; i++) {
> >>>>>>> +        if (lb_data[i] / base >= UINT16_MAX) {
> >>>>>>> +            return true;
> >>>>>>> +        }
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    return false;
> >>>>>>> +}  
> >>>>>> I suggest to do this check at CLI parsing time  
> >>>>>>> +/*
> >>>>>>> + * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth 
> >>>>>>> Information
> >>>>>>> + * Structure: Table 5-146
> >>>>>>> + */
> >>>>>>> +static void build_hmat_lb(GArray *table_data, HMAT_LB_Info 
> >>>>>>> *hmat_lb,
> >>>>>>> +                          uint32_t num_initiator, uint32_t 
> >>>>>>> num_target,
> >>>>>>> +                          uint32_t *initiator_list, int type)
> >>>>>>> +{
> >>>>>>> +    uint8_t mask = 0x0f;
> >>>>>>> +    uint32_t s = num_initiator;
> >>>>>>> +    uint32_t t = num_target;  
> >>>>>> drop this locals and use arguments directly  
> >>>>>>> +    uint64_t base = 1;
> >>>>>>> +    uint64_t *lb_data;
> >>>>>>> +    int i, unit;
> >>>>>>> +
> >>>>>>> +    /* Type */
> >>>>>>> +    build_append_int_noprefix(table_data, 1, 2);
> >>>>>>> +    /* Reserved */
> >>>>>>> +    build_append_int_noprefix(table_data, 0, 2);
> >>>>>>> +    /* Length */
> >>>>>>> +    build_append_int_noprefix(table_data, 32 + 4 * s + 4 * t + 2 
> >>>>>>> * s * t, 4);  
> >>>>>>                                                 ^^^^
> >>>>>> to me above looks like /dev/random output, absolutely unreadable.
> >>>>>> Suggest to use local var (like: lb_length) for expression with 
> >>>>>> comments
> >>>>>> beside magic numbers.  
> >>>>>>> +    /* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
> >>>>>>> +    build_append_int_noprefix(table_data, hmat_lb->hierarchy & 
> >>>>>>> mask, 1);  
> >>>>>>
> >>>>>> why do you need to use mask here?  
> >>>>> Because Bits[7:4] Reserved, so I use mask to keep it reserved.  
> >>>>
> >>>> these bits are not user provided and set to 0, if they get set it's
> >>>> programming error and instead of masking problem out QEMU should abort,
> >>>> I suggest replace masking with assert(!foo>>x).  
> >>>>>>> +    /* Data Type */
> >>>>>>> +    build_append_int_noprefix(table_data, hmat_lb->data_type, 1);  
> >>>>>>
> >>>>>> Isn't hmat_lb->data_type and passed argument 'type' the same?  
> >>>>> Yes, I will drop 'type'.  
> >>>>>>> +    /* Reserved */
> >>>>>>> +    build_append_int_noprefix(table_data, 0, 2);
> >>>>>>> +    /* Number of Initiator Proximity Domains (s) */
> >>>>>>> +    build_append_int_noprefix(table_data, s, 4);
> >>>>>>> +    /* Number of Target Proximity Domains (t) */
> >>>>>>> +    build_append_int_noprefix(table_data, t, 4);
> >>>>>>> +    /* Reserved */
> >>>>>>> +    build_append_int_noprefix(table_data, 0, 4);
> >>>>>>> +
> >>>>>>> +    if (HMAT_IS_LATENCY(type)) {
> >>>>>>> +        unit = 1000;
> >>>>>>> +        lb_data = hmat_lb->latency;
> >>>>>>> +    } else {
> >>>>>>> +        unit = 1024;
> >>>>>>> +        lb_data = hmat_lb->bandwidth;
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    while (entry_overflow(lb_data, base, s * t)) {
> >>>>>>> +        for (i = 0; i < s * t; i++) {
> >>>>>>> +            if (!QEMU_IS_ALIGNED(lb_data[i], unit * base)) {
> >>>>>>> +                error_report("Invalid latency/bandwidth input, 
> >>>>>>> all "
> >>>>>>> +                "latencies/bandwidths should be specified in the 
> >>>>>>> same units.");
> >>>>>>> +                exit(1);
> >>>>>>> +            }
> >>>>>>> +        }
> >>>>>>> +        base *= unit;
> >>>>>>> +    }  
> >>>>>> Can you clarify what you are trying to check here?  
> >>>>> This part I use entry_overflow() to check if uint16 can store 
> >>>>> entry. If
> >>>>> can't store and the entries matrix can be divisible by unit * base, 
> >>>>> then
> >>>>> base will be unit * base.
> >>>>>
> >>>>> For example, if lb_data[i] are 1048576(1TB/s) and 1024(1GB/s), unit is
> >>>>> 1024, so 1048576 is bigger than UINT16_MAX, and can be divisible by 
> >>>>> 1024
> >>>>> * 1, so base is 1024 and entries are 1024 and 1 (see entry =
> >>>>> hmat_lb->latency[i] / base;). The benefit is even user input different
> >>>>> unit(TB/s vs GB/s), we can still store the data as far as possible.  
> >>>>
> >>>> Is it possible instead of doing multiple iterations over lb_data
> >>>> until it finds valid base, just go over lb_data once to find MIN/MAX
> >>>> and then calculate base using it. Error out with max/min offending
> >>>> values if it's not possible to compress the range into uint16_t?  
> >>>
> >>> Although we tell user input same unit data, such as use 1GB/s 3GB/s. If
> >>> user input data such as 1048575, 1048576(1TB/s) and 1024(1GB/s), then we
> >>> will get 1024 * (1023 1024 1). I am wondering if it is appropriate
> >>> because we lose a float number(0.999020). But in our codes, it will
> >>> raise error.  
> >> I do not understand what you are trying to say here, could you rephrase
> >> it, so the problem would be more clear, please?
> >>  
> > Sorry, I mean: how do we treat data that is not divisible if we use
> > max/min as the base? For another example, suppose the user inputs three
> > bandwidths: 9GB/s, 5GB/s and 3GB/s. Then the max/min result is 3. But entries
> > should be uint16, and (5GB/s)/3 only gives 1GB/s, so we should raise an
> > error (overflow).
> > With this patch, however, we get a base of 1GB/s.
> As I understand the MIN/MAX approach, in the case above MAX is 9GB/s and
> MIN is 3GB/s, so I would use the code below to calculate the base:
> 
>      while (max_data >= UINT16_MAX) {
>          if (!QEMU_IS_ALIGNED(max_data, unit * base) ||
>              !QEMU_IS_ALIGNED(min_data, unit * base)) {
>                  error_report("Invalid latency/bandwidth input.");
>                  exit(1);
>          }
>          base *= unit;
>      }
This check won't cover entries in between min and max.
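
To make the gap concrete, here is a small standalone sketch (not QEMU code). The names `UNIT`, `IS_ALIGNED` and `minmax_base()` are made up for illustration, and the loop condition divides by `base` so that the loop terminates, which is an assumption about what the quoted snippet intended. With entries 1024, 1500 and 1048576, only 1024 and 1048576 are looked at, the check passes with base 1024, and the in-between entry 1500 is not a multiple of that base:

```c
#include <stdbool.h>
#include <stdint.h>

#define UNIT 1024u                        /* illustrative: bandwidth step */
#define IS_ALIGNED(v, a) ((v) % (a) == 0) /* stand-in for QEMU_IS_ALIGNED */

/*
 * Hypothetical helper: derive a base from only the smallest and largest
 * entries. Returns false if either of them is not a multiple of the next
 * candidate base; entries between min and max are never inspected.
 */
static bool minmax_base(uint64_t min_data, uint64_t max_data,
                        uint64_t *base_out)
{
    uint64_t base = 1;

    while (max_data / base >= UINT16_MAX) {
        if (!IS_ALIGNED(max_data, UNIT * base) ||
            !IS_ALIGNED(min_data, UNIT * base)) {
            return false;
        }
        base *= UNIT;
    }
    *base_out = base;
    return true;
}
```

Calling `minmax_base(1024, 1048576, &base)` succeeds, so nothing reports that 1500 / base would silently lose its remainder.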
Maybe using a range bitmap at the time of parsing the bandwidth/latency CLI
option would work:

   parse_numa_hmat_lb(...) {
      ...
      if (bw && !ALIGNED(value, 1MB))
          error_fatal("should be 1MB aligned")

      sub_table->range_bitmap |= value;

      last_bit = find_last_bit(sub_table->range_bitmap)
      first_bit = find_first_bit(sub_table->range_bitmap)
      if ((last_bit - first_bit) > UINT16_BITS)
          error_fatal("value (%d) should not differ from
                      previously entered values by more than UINT16_MAX")

      sub_table->base = bit_2_base(first_bit)
      sub_table[x] = value
      ...
   }

It should:
  1: error out at the first option whose value deviates too
     much from previously parsed options for the sub-table
  2: recalculate the 'base' value for the sub-table
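
A minimal standalone C sketch of the two steps above, under stated assumptions: `LbRange`, `lb_range_add()` and the two bit helpers are hypothetical names, not existing QEMU APIs, and the base is taken to be the power of two given by the lowest set bit. The sketch rejects a span of 16 or more bit positions, because a value with bits set at both `first_bit` and `first_bit + 16` needs 17 bits after dividing by the base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-sub-table state; field names are illustrative only. */
typedef struct {
    uint64_t range_bitmap; /* OR of every value accepted so far */
    uint64_t base;         /* common base, a power of two in this sketch */
} LbRange;

/* Index of the lowest set bit; v must be non-zero. */
static int lowest_set_bit(uint64_t v)
{
    int i = 0;

    assert(v != 0);
    while (!(v & 1)) {
        v >>= 1;
        i++;
    }
    return i;
}

/* Index of the highest set bit; v must be non-zero. */
static int highest_set_bit(uint64_t v)
{
    int i = 0;

    assert(v != 0);
    while ((v >>= 1) != 0) {
        i++;
    }
    return i;
}

/*
 * Fold one parsed value into the range. Returns false and leaves *r
 * untouched if the value is zero or if value / base could no longer fit
 * in 16 bits for some accepted value. On success the base is recalculated.
 */
static bool lb_range_add(LbRange *r, uint64_t value)
{
    uint64_t bitmap = r->range_bitmap | value;
    int first_bit, last_bit;

    if (value == 0) {
        return false;
    }
    first_bit = lowest_set_bit(bitmap);
    last_bit = highest_set_bit(bitmap);
    if (last_bit - first_bit >= 16) {
        return false;
    }
    r->range_bitmap = bitmap;
    r->base = UINT64_C(1) << first_bit;
    return true;
}
```

With this scheme every accepted value is a multiple of `base` by construction, so no separate divisibility pass over the sub-table is needed. The caller would report the offending value when `lb_range_add()` returns false.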

