* [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
@ 2019-11-15  7:53 Tao Xu
  2019-11-15  7:53 ` [PATCH v16 01/14] util/cutils: Add qemu_strtold and qemu_strtold_finite Tao Xu
                   ` (14 more replies)
  0 siblings, 15 replies; 29+ messages in thread
From: Tao Xu @ 2019-11-15  7:53 UTC (permalink / raw)
  To: mst, imammedo, eblake, ehabkost, marcel.apfelbaum, armbru,
	mdroth, thuth, lvivier
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

This patch series builds the ACPI Heterogeneous Memory Attribute Table (HMAT)
from command-line options. The HMAT describes memory attributes, such as
memory side cache attributes and bandwidth and latency details, related to
Memory Proximity Domains. Software is expected to use the HMAT information
as a hint for optimization.

In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses and reports
the platform's HMAT tables.

Link to the v15 patches:
https://patchwork.kernel.org/cover/11231971/

Changelog:
v16:
    - Add and use qemu_strtold_finite to parse size, support full
      64bit precision, modify related test cases (Eduardo and Markus)
    - Simplify struct HMAT_LB_Info and related code, unify latency
      and bandwidth (Igor)
    - Add cross check with hmat_lb data (Igor)
    - Fields in Cache Attributes are promoted to uint32_t before
      shifting (Igor)
    - Add case for QMP build HMAT (Igor)
v15:
    - Add a new patch to refactor do_strtosz() (Eduardo)
    - Make tests without breaking CI (Michael)
v14:
    - Reuse the code of do_strtosz() to build qemu_strtotime_ns()
      (Eduardo)
    - Squash patch v13 01/12 and 02/12 together (Daniel and Eduardo)
    - Drop time unit picosecond (Eric)
    - Use QEMU's ctz64() and clz64() instead of the builtin functions
v13:
    - Modify some text description
    - Drop "initiator_valid" field in struct NodeInfo
    - Reuse GArray to store the raw latency and bandwidth data
    - Calculate common base unit using range bitmap
    - Add a patch to calculate hmat latency and bandwidth entry list
    - Drop the total_levels option and use readable cache size
    - Remove the unnecessary header file
    - Use decimal notation with appropriate suffix for cache size
v12:
    - Fix a bug where a memory-only node without an initiator setting
      does not report an error. (reported by Danmei Wei)
    - Fix a bug where QEMU crashes if HMAT is enabled without any
      hmat-lb setting. (reported by Danmei Wei)
v11:
    - Move numa option patches forward.
    - Add num_initiator in Numa_state to record the number of
    initiators.
    - Simplify struct HMAT_LB_Info, use uint64_t array to store data.
    - Drop hmat_get_base().
    - Calculate base in build_hmat_lb().
v10:
    - Add qemu_strtotime_ps() to convert strings with time suffixes
      to numbers, and add some tests for it.
    - Add QAPI builtin type time, and add some tests for it.
    - Add machine option property "-machine hmat=on|off" for enabling
      or disabling HMAT in QEMU.

Liu Jingqi (5):
  numa: Extend CLI to provide memory latency and bandwidth information
  numa: Extend CLI to provide memory side cache information
  hmat acpi: Build Memory Proximity Domain Attributes Structure(s)
  hmat acpi: Build System Locality Latency and Bandwidth Information
    Structure(s)
  hmat acpi: Build Memory Side Cache Information Structure(s)

Tao Xu (9):
  util/cutils: Add qemu_strtold and qemu_strtold_finite
  util/cutils: Use qemu_strtold_finite to parse size
  util/cutils: refactor do_strtosz() to support suffixes list
  util/cutils: Add qemu_strtotime_ns()
  qapi: Add builtin type time
  tests: Add test for QAPI builtin type time
  numa: Extend CLI to provide initiator information for numa nodes
  tests/numa: Add case for QMP build HMAT
  tests/bios-tables-test: add test cases for ACPI HMAT

 hw/acpi/Kconfig                       |   7 +-
 hw/acpi/Makefile.objs                 |   1 +
 hw/acpi/hmat.c                        | 269 +++++++++++++++++++++++++
 hw/acpi/hmat.h                        |  42 ++++
 hw/core/machine.c                     |  64 ++++++
 hw/core/numa.c                        | 270 ++++++++++++++++++++++++++
 hw/i386/acpi-build.c                  |   5 +
 include/qapi/visitor-impl.h           |   4 +
 include/qapi/visitor.h                |   8 +
 include/qemu/cutils.h                 |   4 +
 include/sysemu/numa.h                 |  84 ++++++++
 qapi/machine.json                     | 178 ++++++++++++++++-
 qapi/opts-visitor.c                   |  22 +++
 qapi/qapi-visit-core.c                |  12 ++
 qapi/qobject-input-visitor.c          |  18 ++
 qapi/trace-events                     |   1 +
 qemu-options.hx                       |  96 ++++++++-
 scripts/qapi/schema.py                |   1 +
 tests/bios-tables-test-allowed-diff.h |   8 +
 tests/bios-tables-test.c              |  44 +++++
 tests/data/acpi/pc/APIC.acpihmat      |   0
 tests/data/acpi/pc/DSDT.acpihmat      |   0
 tests/data/acpi/pc/HMAT.acpihmat      |   0
 tests/data/acpi/pc/SRAT.acpihmat      |   0
 tests/data/acpi/q35/APIC.acpihmat     |   0
 tests/data/acpi/q35/DSDT.acpihmat     |   0
 tests/data/acpi/q35/HMAT.acpihmat     |   0
 tests/data/acpi/q35/SRAT.acpihmat     |   0
 tests/numa-test.c                     |  51 +++++
 tests/test-cutils.c                   | 214 ++++++++++++++++----
 tests/test-keyval.c                   | 136 +++++++++----
 tests/test-qemu-opts.c                |  39 +---
 tests/test-qobject-input-visitor.c    |  29 +++
 util/cutils.c                         | 147 ++++++++++----
 34 files changed, 1596 insertions(+), 158 deletions(-)
 create mode 100644 hw/acpi/hmat.c
 create mode 100644 hw/acpi/hmat.h
 create mode 100644 tests/data/acpi/pc/APIC.acpihmat
 create mode 100644 tests/data/acpi/pc/DSDT.acpihmat
 create mode 100644 tests/data/acpi/pc/HMAT.acpihmat
 create mode 100644 tests/data/acpi/pc/SRAT.acpihmat
 create mode 100644 tests/data/acpi/q35/APIC.acpihmat
 create mode 100644 tests/data/acpi/q35/DSDT.acpihmat
 create mode 100644 tests/data/acpi/q35/HMAT.acpihmat
 create mode 100644 tests/data/acpi/q35/SRAT.acpihmat

-- 
2.20.1




* [PATCH v16 01/14] util/cutils: Add qemu_strtold and qemu_strtold_finite
  2019-11-15  7:53 [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
@ 2019-11-15  7:53 ` Tao Xu
  2019-11-15  7:53 ` [PATCH v16 02/14] util/cutils: Use qemu_strtold_finite to parse size Tao Xu
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Tao Xu @ 2019-11-15  7:53 UTC (permalink / raw)
  To: mst, imammedo, eblake, ehabkost, marcel.apfelbaum, armbru,
	mdroth, thuth, lvivier
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

Add qemu_strtold() and qemu_strtold_finite(). They work like qemu_strtod()
and qemu_strtod_finite(), except that they store a long double.

Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

New patch in v16.
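
For readers without the QEMU tree at hand, here is a minimal standalone
sketch of the same idea: wrap strtold(), turn "no conversion" and non-finite
results into -EINVAL, and pass libc's ERANGE through. The name
parse_finite_long_double() is hypothetical and the error handling only
approximates check_strtox_error(); it is not the code added by this patch.

```c
#include <assert.h>
#include <errno.h>
#include <math.h>
#include <stdlib.h>

/*
 * Hypothetical standalone helper (not the QEMU function).
 * Returns 0 on success, -EINVAL if nothing was parsed, if the value is
 * infinite or NaN, or if @end is NULL and input is left over, and
 * -ERANGE when strtold() reports overflow or underflow.
 */
static int parse_finite_long_double(const char *s, const char **end,
                                    long double *result)
{
    char *ep;
    long double tmp;
    int libc_errno;

    assert(result != NULL);
    if (!s) {
        if (end) {
            *end = s;
        }
        return -EINVAL;
    }

    errno = 0;
    tmp = strtold(s, &ep);
    libc_errno = errno;

    if (end) {
        *end = ep;
    }
    if (ep == s || (!end && *ep)) {
        return -EINVAL;            /* no conversion, or unconsumed input */
    }
    if (libc_errno) {
        *result = tmp;
        return -libc_errno;        /* typically -ERANGE */
    }
    if (!isfinite(tmp)) {
        if (end) {
            *end = s;
        }
        return -EINVAL;            /* "inf" and "nan" are rejected */
    }
    *result = tmp;
    return 0;
}
```

As in the patch, a caller that passes a NULL end pointer asks for the whole
string to be consumed, so "12x" is accepted only when an end pointer is
supplied.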
---
 include/qemu/cutils.h |  3 +++
 util/cutils.c         | 48 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
index b54c847e0f..48cf9bf776 100644
--- a/include/qemu/cutils.h
+++ b/include/qemu/cutils.h
@@ -146,6 +146,9 @@ int qemu_strtou64(const char *nptr, const char **endptr, int base,
                   uint64_t *result);
 int qemu_strtod(const char *nptr, const char **endptr, double *result);
 int qemu_strtod_finite(const char *nptr, const char **endptr, double *result);
+int qemu_strtold(const char *nptr, const char **endptr, long double *result);
+int qemu_strtold_finite(const char *nptr, const char **endptr,
+                        long double *result);
 
 int parse_uint(const char *s, unsigned long long *value, char **endptr,
                int base);
diff --git a/util/cutils.c b/util/cutils.c
index fd591cadf0..5db3b2add5 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -553,7 +553,7 @@ int qemu_strtou64(const char *nptr, const char **endptr, int base,
 
 /**
  * Convert string @nptr to a double.
-  *
+ *
  * This is a wrapper around strtod() that is harder to misuse.
  * Semantics of @nptr and @endptr match strtod() with differences
  * noted below.
@@ -616,6 +616,52 @@ int qemu_strtod_finite(const char *nptr, const char **endptr, double *result)
     return ret;
 }
 
+/*
+ * Convert string @nptr to a long double.
+ *
+ * Works like qemu_strtod(), except it stores long double.
+ */
+int qemu_strtold(const char *nptr, const char **endptr, long double *result)
+{
+    char *ep;
+
+    if (!nptr) {
+        if (endptr) {
+            *endptr = nptr;
+        }
+        return -EINVAL;
+    }
+
+    errno = 0;
+    *result = strtold(nptr, &ep);
+    return check_strtox_error(nptr, ep, endptr, errno);
+}
+
+/*
+ * Convert string @nptr to a finite long double.
+ *
+ * Works like qemu_strtod_finite(), except it stores long double.
+ */
+int qemu_strtold_finite(const char *nptr, const char **endptr,
+                        long double *result)
+{
+    long double tmp;
+    int ret;
+
+    ret = qemu_strtold(nptr, endptr, &tmp);
+    if (!ret && !isfinite(tmp)) {
+        if (endptr) {
+            *endptr = nptr;
+        }
+        ret = -EINVAL;
+    }
+
+    if (ret != -EINVAL) {
+        *result = tmp;
+    }
+    return ret;
+}
+
 /**
  * Searches for the first occurrence of 'c' in 's', and returns a pointer
  * to the trailing null byte if none was found.
-- 
2.20.1




* [PATCH v16 02/14] util/cutils: Use qemu_strtold_finite to parse size
  2019-11-15  7:53 [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
  2019-11-15  7:53 ` [PATCH v16 01/14] util/cutils: Add qemu_strtold and qemu_strtold_finite Tao Xu
@ 2019-11-15  7:53 ` Tao Xu
  2019-11-15  7:53 ` [PATCH v16 03/14] util/cutils: refactor do_strtosz() to support suffixes list Tao Xu
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Tao Xu @ 2019-11-15  7:53 UTC (permalink / raw)
  To: mst, imammedo, eblake, ehabkost, marcel.apfelbaum, armbru,
	mdroth, thuth, lvivier
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

Use qemu_strtold_finite() instead of qemu_strtod_finite() in do_strtosz()
to support full 64-bit precision, and update the related test cases.

Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

New patch in v16
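
The precision difference that the updated tests rely on can be seen with a
short standalone sketch. The helper names below are hypothetical, and the
long double result assumes a platform where long double has at least a
64-bit mantissa (for example x86 extended precision); where long double is
the same as double, both paths round identically.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helpers for illustration only. The caller must pass a
 * non-negative decimal string whose parsed value is below 2^64,
 * otherwise the conversion to uint64_t is undefined.
 */
static uint64_t u64_via_double(const char *s)
{
    assert(s != NULL);
    return (uint64_t)strtod(s, NULL);      /* 53-bit mantissa */
}

static uint64_t u64_via_long_double(const char *s)
{
    assert(s != NULL);
    return (uint64_t)strtold(s, NULL);     /* mantissa is platform-defined */
}

/* Non-zero when long double can hold every uint64_t value exactly. */
static int long_double_has_64bit_mantissa(void)
{
    return LDBL_MANT_DIG >= 64;
}
```

With a double, "9007199254740993" (2^53 + 1) rounds to 2^53, which is what
the removed test cases asserted. With a 64-bit mantissa the same string is
parsed exactly, and so is UINT64_MAX.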
---
 tests/test-cutils.c    | 41 +++++-------------------------------
 tests/test-keyval.c    | 47 +++++++-----------------------------------
 tests/test-qemu-opts.c | 39 +++++++----------------------------
 util/cutils.c          | 13 +++++-------
 4 files changed, 24 insertions(+), 116 deletions(-)

diff --git a/tests/test-cutils.c b/tests/test-cutils.c
index 1aa8351520..465514b85f 100644
--- a/tests/test-cutils.c
+++ b/tests/test-cutils.c
@@ -1970,40 +1970,19 @@ static void test_qemu_strtosz_simple(void)
     g_assert_cmpint(err, ==, 0);
     g_assert_cmpint(res, ==, 12345);
 
-    /* Note: precision is 53 bits since we're parsing with strtod() */
+    /* Note: precision is 64 bits (UINT64_MAX) */
 
-    str = "9007199254740991"; /* 2^53-1 */
+    str = "18446744073709551614"; /* UINT64_MAX - 1 */
     err = qemu_strtosz(str, &endptr, &res);
     g_assert_cmpint(err, ==, 0);
-    g_assert_cmpint(res, ==, 0x1fffffffffffff);
-    g_assert(endptr == str + 16);
-
-    str = "9007199254740992"; /* 2^53 */
-    err = qemu_strtosz(str, &endptr, &res);
-    g_assert_cmpint(err, ==, 0);
-    g_assert_cmpint(res, ==, 0x20000000000000);
-    g_assert(endptr == str + 16);
-
-    str = "9007199254740993"; /* 2^53+1 */
-    err = qemu_strtosz(str, &endptr, &res);
-    g_assert_cmpint(err, ==, 0);
-    g_assert_cmpint(res, ==, 0x20000000000000); /* rounded to 53 bits */
-    g_assert(endptr == str + 16);
-
-    str = "18446744073709549568"; /* 0xfffffffffffff800 (53 msbs set) */
-    err = qemu_strtosz(str, &endptr, &res);
-    g_assert_cmpint(err, ==, 0);
-    g_assert_cmpint(res, ==, 0xfffffffffffff800);
+    g_assert_cmpint(res, ==, 0xfffffffffffffffe);
     g_assert(endptr == str + 20);
 
-    str = "18446744073709550591"; /* 0xfffffffffffffbff */
+    str = "18446744073709551615"; /* UINT64_MAX */
     err = qemu_strtosz(str, &endptr, &res);
     g_assert_cmpint(err, ==, 0);
-    g_assert_cmpint(res, ==, 0xfffffffffffff800); /* rounded to 53 bits */
+    g_assert_cmpint(res, ==, 0xffffffffffffffff);
     g_assert(endptr == str + 20);
-
-    /* 0x7ffffffffffffe00..0x7fffffffffffffff get rounded to
-     * 0x8000000000000000, thus -ERANGE; see test_qemu_strtosz_erange() */
 }
 
 static void test_qemu_strtosz_units(void)
@@ -2145,16 +2124,6 @@ static void test_qemu_strtosz_erange(void)
     g_assert_cmpint(err, ==, -ERANGE);
     g_assert(endptr == str + 2);
 
-    str = "18446744073709550592"; /* 0xfffffffffffffc00 */
-    err = qemu_strtosz(str, &endptr, &res);
-    g_assert_cmpint(err, ==, -ERANGE);
-    g_assert(endptr == str + 20);
-
-    str = "18446744073709551615"; /* 2^64-1 */
-    err = qemu_strtosz(str, &endptr, &res);
-    g_assert_cmpint(err, ==, -ERANGE);
-    g_assert(endptr == str + 20);
-
     str = "18446744073709551616"; /* 2^64 */
     err = qemu_strtosz(str, &endptr, &res);
     g_assert_cmpint(err, ==, -ERANGE);
diff --git a/tests/test-keyval.c b/tests/test-keyval.c
index 09b0ae3c68..fad941fcb8 100644
--- a/tests/test-keyval.c
+++ b/tests/test-keyval.c
@@ -383,59 +383,26 @@ static void test_keyval_visit_size(void)
     visit_end_struct(v, NULL);
     visit_free(v);
 
-    /* Note: precision is 53 bits since we're parsing with strtod() */
+    /* Note: precision is 64 bits (UINT64_MAX) */
 
-    /* Around limit of precision: 2^53-1, 2^53, 2^53+1 */
-    qdict = keyval_parse("sz1=9007199254740991,"
-                         "sz2=9007199254740992,"
-                         "sz3=9007199254740993",
+    /* Around limit of precision: UINT64_MAX - 1, UINT64_MAX */
+    qdict = keyval_parse("sz1=18446744073709551614,"
+                         "sz2=18446744073709551615",
                          NULL, &error_abort);
     v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
     qobject_unref(qdict);
     visit_start_struct(v, NULL, NULL, 0, &error_abort);
     visit_type_size(v, "sz1", &sz, &error_abort);
-    g_assert_cmphex(sz, ==, 0x1fffffffffffff);
+    g_assert_cmphex(sz, ==, 0xfffffffffffffffe);
     visit_type_size(v, "sz2", &sz, &error_abort);
-    g_assert_cmphex(sz, ==, 0x20000000000000);
-    visit_type_size(v, "sz3", &sz, &error_abort);
-    g_assert_cmphex(sz, ==, 0x20000000000000);
-    visit_check_struct(v, &error_abort);
-    visit_end_struct(v, NULL);
-    visit_free(v);
-
-    /* Close to signed upper limit 0x7ffffffffffffc00 (53 msbs set) */
-    qdict = keyval_parse("sz1=9223372036854774784," /* 7ffffffffffffc00 */
-                         "sz2=9223372036854775295", /* 7ffffffffffffdff */
-                         NULL, &error_abort);
-    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
-    qobject_unref(qdict);
-    visit_start_struct(v, NULL, NULL, 0, &error_abort);
-    visit_type_size(v, "sz1", &sz, &error_abort);
-    g_assert_cmphex(sz, ==, 0x7ffffffffffffc00);
-    visit_type_size(v, "sz2", &sz, &error_abort);
-    g_assert_cmphex(sz, ==, 0x7ffffffffffffc00);
-    visit_check_struct(v, &error_abort);
-    visit_end_struct(v, NULL);
-    visit_free(v);
-
-    /* Close to actual upper limit 0xfffffffffffff800 (53 msbs set) */
-    qdict = keyval_parse("sz1=18446744073709549568," /* fffffffffffff800 */
-                         "sz2=18446744073709550591", /* fffffffffffffbff */
-                         NULL, &error_abort);
-    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
-    qobject_unref(qdict);
-    visit_start_struct(v, NULL, NULL, 0, &error_abort);
-    visit_type_size(v, "sz1", &sz, &error_abort);
-    g_assert_cmphex(sz, ==, 0xfffffffffffff800);
-    visit_type_size(v, "sz2", &sz, &error_abort);
-    g_assert_cmphex(sz, ==, 0xfffffffffffff800);
+    g_assert_cmphex(sz, ==, 0xffffffffffffffff);
     visit_check_struct(v, &error_abort);
     visit_end_struct(v, NULL);
     visit_free(v);
 
     /* Beyond limits */
     qdict = keyval_parse("sz1=-1,"
-                         "sz2=18446744073709550592", /* fffffffffffffc00 */
+                         "sz2=18446744073709551616", /* 2^64 */
                          NULL, &error_abort);
     v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
     qobject_unref(qdict);
diff --git a/tests/test-qemu-opts.c b/tests/test-qemu-opts.c
index ef96e84aed..1236bf6b7d 100644
--- a/tests/test-qemu-opts.c
+++ b/tests/test-qemu-opts.c
@@ -650,50 +650,25 @@ static void test_opts_parse_size(void)
     g_assert_cmpuint(opts_count(opts), ==, 1);
     g_assert_cmpuint(qemu_opt_get_size(opts, "size1", 1), ==, 0);
 
-    /* Note: precision is 53 bits since we're parsing with strtod() */
+    /* Note: precision is 64 bits (UINT64_MAX) */
 
-    /* Around limit of precision: 2^53-1, 2^53, 2^54 */
+    /* Around limit of precision: UINT64_MAX - 1, UINT64_MAX */
     opts = qemu_opts_parse(&opts_list_02,
-                           "size1=9007199254740991,"
-                           "size2=9007199254740992,"
-                           "size3=9007199254740993",
-                           false, &error_abort);
-    g_assert_cmpuint(opts_count(opts), ==, 3);
-    g_assert_cmphex(qemu_opt_get_size(opts, "size1", 1),
-                     ==, 0x1fffffffffffff);
-    g_assert_cmphex(qemu_opt_get_size(opts, "size2", 1),
-                     ==, 0x20000000000000);
-    g_assert_cmphex(qemu_opt_get_size(opts, "size3", 1),
-                     ==, 0x20000000000000);
-
-    /* Close to signed upper limit 0x7ffffffffffffc00 (53 msbs set) */
-    opts = qemu_opts_parse(&opts_list_02,
-                           "size1=9223372036854774784," /* 7ffffffffffffc00 */
-                           "size2=9223372036854775295", /* 7ffffffffffffdff */
-                           false, &error_abort);
-    g_assert_cmpuint(opts_count(opts), ==, 2);
-    g_assert_cmphex(qemu_opt_get_size(opts, "size1", 1),
-                     ==, 0x7ffffffffffffc00);
-    g_assert_cmphex(qemu_opt_get_size(opts, "size2", 1),
-                     ==, 0x7ffffffffffffc00);
-
-    /* Close to actual upper limit 0xfffffffffffff800 (53 msbs set) */
-    opts = qemu_opts_parse(&opts_list_02,
-                           "size1=18446744073709549568," /* fffffffffffff800 */
-                           "size2=18446744073709550591", /* fffffffffffffbff */
+                           "size1=18446744073709551614,"
+                           "size2=18446744073709551615",
                            false, &error_abort);
     g_assert_cmpuint(opts_count(opts), ==, 2);
     g_assert_cmphex(qemu_opt_get_size(opts, "size1", 1),
-                     ==, 0xfffffffffffff800);
+                     ==, 0xfffffffffffffffe);
     g_assert_cmphex(qemu_opt_get_size(opts, "size2", 1),
-                     ==, 0xfffffffffffff800);
+                     ==, 0xffffffffffffffff);
 
     /* Beyond limits */
     opts = qemu_opts_parse(&opts_list_02, "size1=-1", false, &err);
     error_free_or_abort(&err);
     g_assert(!opts);
     opts = qemu_opts_parse(&opts_list_02,
-                           "size1=18446744073709550592", /* fffffffffffffc00 */
+                           "size1=18446744073709551616", /* 2^64 */
                            false, &err);
     error_free_or_abort(&err);
     g_assert(!opts);
diff --git a/util/cutils.c b/util/cutils.c
index 5db3b2add5..d94a468954 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -216,13 +216,13 @@ static int do_strtosz(const char *nptr, const char **end,
     const char *endptr;
     unsigned char c;
     int mul_required = 0;
-    double val, mul, integral, fraction;
+    long double val, mul, integral, fraction;
 
-    retval = qemu_strtod_finite(nptr, &endptr, &val);
+    retval = qemu_strtold_finite(nptr, &endptr, &val);
     if (retval) {
         goto out;
     }
-    fraction = modf(val, &integral);
+    fraction = modfl(val, &integral);
     if (fraction != 0) {
         mul_required = 1;
     }
@@ -238,11 +238,8 @@ static int do_strtosz(const char *nptr, const char **end,
         retval = -EINVAL;
         goto out;
     }
-    /*
-     * Values >= 0xfffffffffffffc00 overflow uint64_t after their trip
-     * through double (53 bits of precision).
-     */
-    if ((val * mul >= 0xfffffffffffffc00) || val < 0) {
+    /* Values > UINT64_MAX overflow uint64_t */
+    if ((val * mul > UINT64_MAX) || val < 0) {
         retval = -ERANGE;
         goto out;
     }
-- 
2.20.1




* [PATCH v16 03/14] util/cutils: refactor do_strtosz() to support suffixes list
  2019-11-15  7:53 [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
  2019-11-15  7:53 ` [PATCH v16 01/14] util/cutils: Add qemu_strtold and qemu_strtold_finite Tao Xu
  2019-11-15  7:53 ` [PATCH v16 02/14] util/cutils: Use qemu_strtold_finite to parse size Tao Xu
@ 2019-11-15  7:53 ` Tao Xu
  2019-11-15 12:11   ` Philippe Mathieu-Daudé
  2019-11-15  7:53 ` [PATCH v16 04/14] util/cutils: Add qemu_strtotime_ns() Tao Xu
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Tao Xu @ 2019-11-15  7:53 UTC (permalink / raw)
  To: mst, imammedo, eblake, ehabkost, marcel.apfelbaum, armbru,
	mdroth, thuth, lvivier
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

Add do_strtomul() to convert a string to a number according to a list of
suffixes, and make do_strtosz() a wrapper around it.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v16.

Changes in v15:
    - Add a new patch to refactor do_strtosz() (Eduardo)
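
The suffix-list lookup can be sketched outside of QEMU as follows. This is
an illustration under assumptions, not the patch code: suffix_multiplier()
and has_prefix_nocase() are hypothetical names, a hand-written
case-insensitive prefix check stands in for g_ascii_strncasecmp(), and the
multiplier is advanced only before the next suffix is tested so that it
never grows past unit^(n-1).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if @s starts with @prefix, ignoring ASCII case. */
static int has_prefix_nocase(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix)) {
            return 0;
        }
        s++;
        prefix++;
    }
    return 1;
}

/*
 * Return unit^i for the first suffixes[i] that prefixes @s and store the
 * suffix length in *len. Return -1 if no suffix matches.
 */
static int64_t suffix_multiplier(const char *const suffixes[], size_t n,
                                 const char *s, size_t *len, int64_t unit)
{
    int64_t mul = 1;
    size_t i;

    assert(suffixes != NULL && s != NULL && len != NULL);
    for (i = 0; i < n; i++) {
        if (i > 0) {
            mul *= unit;           /* suffix i scales by unit^i */
        }
        if (has_prefix_nocase(s, suffixes[i])) {
            *len = strlen(suffixes[i]);
            return mul;
        }
    }
    return -1;
}
```

Because the first matching entry wins, a list should be ordered so that no
earlier suffix is a prefix of a later one with a different multiplier.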
---
 util/cutils.c | 72 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 42 insertions(+), 30 deletions(-)

diff --git a/util/cutils.c b/util/cutils.c
index d94a468954..ffef92338a 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -181,41 +181,37 @@ int fcntl_setfl(int fd, int flag)
 }
 #endif
 
-static int64_t suffix_mul(char suffix, int64_t unit)
+static int64_t suffix_mul(const char *suffixes[], int num_suffix,
+                          const char *endptr, int *offset, int64_t unit)
 {
-    switch (qemu_toupper(suffix)) {
-    case 'B':
-        return 1;
-    case 'K':
-        return unit;
-    case 'M':
-        return unit * unit;
-    case 'G':
-        return unit * unit * unit;
-    case 'T':
-        return unit * unit * unit * unit;
-    case 'P':
-        return unit * unit * unit * unit * unit;
-    case 'E':
-        return unit * unit * unit * unit * unit * unit;
+    int i, suffix_len;
+    int64_t mul = 1;
+
+    for (i = 0; i < num_suffix; i++) {
+        suffix_len = strlen(suffixes[i]);
+        if (g_ascii_strncasecmp(suffixes[i], endptr, suffix_len) == 0) {
+            *offset = suffix_len;
+            return mul;
+        }
+        mul *= unit;
     }
+
     return -1;
 }
 
 /*
- * Convert string to bytes, allowing either B/b for bytes, K/k for KB,
- * M/m for MB, G/g for GB or T/t for TB. End pointer will be returned
- * in *end, if not NULL. Return -ERANGE on overflow, and -EINVAL on
- * other error.
+ * Convert string according to different suffixes. End pointer will be returned
+ * in *end, if not NULL. Return -ERANGE on overflow, and -EINVAL on other error.
  */
-static int do_strtosz(const char *nptr, const char **end,
-                      const char default_suffix, int64_t unit,
+static int do_strtomul(const char *nptr, const char **end,
+                       const char *suffixes[], int num_suffix,
+                       const char *default_suffix, int64_t unit,
                       uint64_t *result)
 {
     int retval;
     const char *endptr;
-    unsigned char c;
     int mul_required = 0;
+    int offset = 0;
     long double val, mul, integral, fraction;
 
     retval = qemu_strtold_finite(nptr, &endptr, &val);
@@ -226,12 +222,12 @@ static int do_strtosz(const char *nptr, const char **end,
     if (fraction != 0) {
         mul_required = 1;
     }
-    c = *endptr;
-    mul = suffix_mul(c, unit);
+
+    mul = suffix_mul(suffixes, num_suffix, endptr, &offset, unit);
     if (mul >= 0) {
-        endptr++;
+        endptr += offset;
     } else {
-        mul = suffix_mul(default_suffix, unit);
+        mul = suffix_mul(suffixes, num_suffix, default_suffix, &offset, unit);
         assert(mul >= 0);
     }
     if (mul == 1 && mul_required) {
@@ -256,19 +252,35 @@ out:
     return retval;
 }
 
+/*
+ * Convert string to bytes, allowing either B/b for bytes, K/k for KB,
+ * M/m for MB, G/g for GB or T/t for TB. End pointer will be returned
+ * in *end, if not NULL. Return -ERANGE on overflow, and -EINVAL on
+ * other error.
+ */
+static int do_strtosz(const char *nptr, const char **end,
+                      const char *default_suffix, int64_t unit,
+                      uint64_t *result)
+{
+    static const char *suffixes[] = { "B", "K", "M", "G", "T", "P", "E" };
+
+    return do_strtomul(nptr, end, suffixes, ARRAY_SIZE(suffixes),
+                       default_suffix, unit, result);
+}
+
 int qemu_strtosz(const char *nptr, const char **end, uint64_t *result)
 {
-    return do_strtosz(nptr, end, 'B', 1024, result);
+    return do_strtosz(nptr, end, "B", 1024, result);
 }
 
 int qemu_strtosz_MiB(const char *nptr, const char **end, uint64_t *result)
 {
-    return do_strtosz(nptr, end, 'M', 1024, result);
+    return do_strtosz(nptr, end, "M", 1024, result);
 }
 
 int qemu_strtosz_metric(const char *nptr, const char **end, uint64_t *result)
 {
-    return do_strtosz(nptr, end, 'B', 1000, result);
+    return do_strtosz(nptr, end, "B", 1000, result);
 }
 
 /**
-- 
2.20.1




* [PATCH v16 04/14] util/cutils: Add qemu_strtotime_ns()
  2019-11-15  7:53 [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (2 preceding siblings ...)
  2019-11-15  7:53 ` [PATCH v16 03/14] util/cutils: refactor do_strtosz() to support suffixes list Tao Xu
@ 2019-11-15  7:53 ` Tao Xu
  2019-11-15  7:53 ` [PATCH v16 05/14] qapi: Add builtin type time Tao Xu
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Tao Xu @ 2019-11-15  7:53 UTC (permalink / raw)
  To: mst, imammedo, eblake, ehabkost, marcel.apfelbaum, armbru,
	mdroth, thuth, lvivier
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

Add qemu_strtotime_ns() to convert strings with time suffixes to numbers.
The supported time units are "ns" for nanosecond, "us" for microsecond,
"ms" for millisecond and "s" for second. Add tests for qemu_strtotime_ns()
that cover basic input, time suffixes, floating-point values, invalid
input, trailing characters and overflow.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v16:
    - Update the test because precision is 64 bits

Changes in v15:
    - Add a new patch to refactor do_strtosz() (Eduardo)
    - use ARRAY_SIZE(suffixes) instead of hardcoding the
      suffixes number (Eduardo)
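
A rough standalone sketch of the conversion described above, for
illustration only. It is not the implementation in this patch:
parse_time_ns() is a hypothetical name, it uses an explicit table of
multipliers rather than do_strtomul(), it matches suffixes
case-sensitively, and it requires the whole string to be consumed.

```c
#include <assert.h>
#include <errno.h>
#include <math.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table: explicit multipliers in nanoseconds. */
static const struct {
    const char *suffix;
    uint64_t ns;
} time_units[] = {
    { "ns", 1ULL },
    { "us", 1000ULL },
    { "ms", 1000000ULL },
    { "s",  1000000000ULL },
};

/* Returns 0 on success, -EINVAL on malformed input, -ERANGE on overflow. */
static int parse_time_ns(const char *s, uint64_t *result)
{
    char *ep;
    long double val, scaled;
    uint64_t mul = 1;              /* default unit is nanoseconds */
    size_t i;

    assert(s != NULL && result != NULL);
    errno = 0;
    val = strtold(s, &ep);
    if (ep == s) {
        return -EINVAL;            /* nothing parsed */
    }
    if (errno == ERANGE) {
        return -ERANGE;
    }
    if (!isfinite(val)) {
        return -EINVAL;            /* "inf" and "nan" */
    }
    for (i = 0; i < sizeof(time_units) / sizeof(time_units[0]); i++) {
        if (strcmp(ep, time_units[i].suffix) == 0) {
            mul = time_units[i].ns;
            ep += strlen(time_units[i].suffix);
            break;
        }
    }
    if (*ep != '\0') {
        return -EINVAL;            /* unknown suffix or trailing text */
    }
    scaled = val * mul;
    /* 2^64 is a power of two, so the constant is exact in any binary format */
    if (val < 0 || scaled >= 18446744073709551616.0L) {
        return -ERANGE;
    }
    *result = (uint64_t)scaled;    /* fractional nanoseconds are truncated */
    return 0;
}
```

For example, "1.5us" yields 1500 and "200000000000000ms" is rejected with
-ERANGE because 2 * 10^20 ns does not fit in a uint64_t.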
---
 include/qemu/cutils.h |   1 +
 tests/test-cutils.c   | 173 ++++++++++++++++++++++++++++++++++++++++++
 util/cutils.c         |  14 ++++
 3 files changed, 188 insertions(+)

diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
index 48cf9bf776..befa94f2d4 100644
--- a/include/qemu/cutils.h
+++ b/include/qemu/cutils.h
@@ -185,5 +185,6 @@ int uleb128_decode_small(const uint8_t *in, uint32_t *n);
  * *str1 is <, == or > than *str2.
  */
 int qemu_pstrcmp0(const char **str1, const char **str2);
+int qemu_strtotime_ns(const char *nptr, const char **end, uint64_t *result);
 
 #endif
diff --git a/tests/test-cutils.c b/tests/test-cutils.c
index 465514b85f..0ff1d816f1 100644
--- a/tests/test-cutils.c
+++ b/tests/test-cutils.c
@@ -2148,6 +2148,167 @@ static void test_qemu_strtosz_metric(void)
     g_assert(endptr == str + 6);
 }
 
+static void test_qemu_strtotime_ns_simple(void)
+{
+    const char *str;
+    const char *endptr;
+    int err;
+    uint64_t res = 0xbaadf00d;
+
+    str = "0";
+    err = qemu_strtotime_ns(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 0);
+    g_assert(endptr == str + 1);
+
+    str = "56789";
+    err = qemu_strtotime_ns(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 56789);
+    g_assert(endptr == str + 5);
+
+    err = qemu_strtotime_ns(str, NULL, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 56789);
+
+    /* Note: precision is 64 bits (UINT64_MAX) */
+
+    str = "18446744073709551614"; /* UINT64_MAX - 1 */
+    err = qemu_strtotime_ns(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 0xfffffffffffffffe);
+    g_assert(endptr == str + 20);
+
+    str = "18446744073709551615"; /* UINT64_MAX */
+    err = qemu_strtotime_ns(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 0xffffffffffffffff);
+    g_assert(endptr == str + 20);
+}
+
+static void test_qemu_strtotime_ns_units(void)
+{
+    const char *ns = "1ns";
+    const char *us = "1us";
+    const char *ms = "1ms";
+    const char *s = "1s";
+    int err;
+    const char *endptr;
+    uint64_t res = 0xbaadf00d;
+
+    /* default time unit is ns */
+    err = qemu_strtotime_ns(ns, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 1);
+    g_assert(endptr == ns + 3);
+
+    err = qemu_strtotime_ns(us, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 1000);
+    g_assert(endptr == us + 3);
+
+    err = qemu_strtotime_ns(ms, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 1000000);
+    g_assert(endptr == ms + 3);
+
+    err = qemu_strtotime_ns(s, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 1000000000LL);
+    g_assert(endptr == s + 2);
+}
+
+static void test_qemu_strtotime_ns_float(void)
+{
+    const char *str = "56.789us";
+    int err;
+    const char *endptr;
+    uint64_t res = 0xbaadf00d;
+
+    err = qemu_strtotime_ns(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 56.789 * 1000);
+    g_assert(endptr == str + 8);
+}
+
+static void test_qemu_strtotime_ns_invalid(void)
+{
+    const char *str;
+    const char *endptr;
+    int err;
+    uint64_t res = 0xbaadf00d;
+
+    str = "";
+    err = qemu_strtotime_ns(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+    g_assert(endptr == str);
+
+    str = " \t ";
+    err = qemu_strtotime_ns(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+    g_assert(endptr == str);
+
+    str = "crap";
+    err = qemu_strtotime_ns(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+    g_assert(endptr == str);
+
+    str = "inf";
+    err = qemu_strtotime_ns(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+    g_assert(endptr == str);
+
+    str = "NaN";
+    err = qemu_strtotime_ns(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+    g_assert(endptr == str);
+}
+
+static void test_qemu_strtotime_ns_trailing(void)
+{
+    const char *str;
+    const char *endptr;
+    int err;
+    uint64_t res = 0xbaadf00d;
+
+    str = "123xxx";
+
+    err = qemu_strtotime_ns(str, NULL, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+
+    str = "1msxxx";
+    err = qemu_strtotime_ns(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 1000000);
+    g_assert(endptr == str + 3);
+
+    err = qemu_strtotime_ns(str, NULL, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+}
+
+static void test_qemu_strtotime_ns_erange(void)
+{
+    const char *str;
+    const char *endptr;
+    int err;
+    uint64_t res = 0xbaadf00d;
+
+    str = "-1";
+    err = qemu_strtotime_ns(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -ERANGE);
+    g_assert(endptr == str + 2);
+
+    str = "18446744073709551616"; /* 2^64 */
+    err = qemu_strtotime_ns(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -ERANGE);
+    g_assert(endptr == str + 20);
+
+    str = "200000000000000ms";
+    err = qemu_strtotime_ns(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -ERANGE);
+    g_assert(endptr == str + 17);
+}
+
 int main(int argc, char **argv)
 {
     g_test_init(&argc, &argv, NULL);
@@ -2425,5 +2586,17 @@ int main(int argc, char **argv)
     g_test_add_func("/cutils/strtosz/metric",
                     test_qemu_strtosz_metric);
 
+    g_test_add_func("/cutils/strtotime/simple",
+                    test_qemu_strtotime_ns_simple);
+    g_test_add_func("/cutils/strtotime/units",
+                    test_qemu_strtotime_ns_units);
+    g_test_add_func("/cutils/strtotime/float",
+                    test_qemu_strtotime_ns_float);
+    g_test_add_func("/cutils/strtotime/invalid",
+                    test_qemu_strtotime_ns_invalid);
+    g_test_add_func("/cutils/strtotime/trailing",
+                    test_qemu_strtotime_ns_trailing);
+    g_test_add_func("/cutils/strtotime/erange",
+                    test_qemu_strtotime_ns_erange);
     return g_test_run();
 }
diff --git a/util/cutils.c b/util/cutils.c
index ffef92338a..0a885a0a90 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -283,6 +283,20 @@ int qemu_strtosz_metric(const char *nptr, const char **end, uint64_t *result)
     return do_strtosz(nptr, end, "B", 1000, result);
 }
 
+/*
+ * Convert a string to a time value in nanoseconds. Supported time units are
+ * ns for nanosecond, us for microsecond, ms for millisecond and s for second.
+ * End pointer will be returned in *end, if not NULL. Return -ERANGE on
+ * overflow, and -EINVAL on other errors.
+ */
+int qemu_strtotime_ns(const char *nptr, const char **end, uint64_t *result)
+{
+    static const char *suffixes[] = { "ns", "us", "ms", "s" };
+
+    return do_strtomul(nptr, end, suffixes, ARRAY_SIZE(suffixes), "ns", 1000,
+                       result);
+}
+
 /**
  * Helper function for error checking after strtol() and the like
  */
-- 
2.20.1




* [PATCH v16 05/14] qapi: Add builtin type time
  2019-11-15  7:53 [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (3 preceding siblings ...)
  2019-11-15  7:53 ` [PATCH v16 04/14] util/cutils: Add qemu_strtotime_ns() Tao Xu
@ 2019-11-15  7:53 ` Tao Xu
  2019-11-15  7:53 ` [PATCH v16 06/14] tests: Add test for QAPI " Tao Xu
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Tao Xu @ 2019-11-15  7:53 UTC (permalink / raw)
  To: mst, imammedo, eblake, ehabkost, marcel.apfelbaum, armbru,
	mdroth, thuth, lvivier
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

Add optional builtin type time; the fallback is uint64. This type uses
qemu_strtotime_ns() to pre-convert values with a time suffix to numbers.
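
As a rough illustration of the conversion this type relies on, here is a
minimal standalone sketch. It is not the QEMU implementation:
sketch_strtotime_ns() is a hypothetical name, it parses with a plain double
(so it cannot represent every value up to UINT64_MAX exactly), and the
rounding choice is an assumption of the sketch. It only mirrors the behaviour
the tests in this series describe: a bare number is taken as nanoseconds,
"ns", "us", "ms" and "s" scale the value, malformed input gives -EINVAL, and
negative or too-large values give -ERANGE.

```c
#include <assert.h>
#include <errno.h>
#include <float.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for qemu_strtotime_ns(), not QEMU code.
 * Parses "<number>[ns|us|ms|s]" and stores the value in nanoseconds.
 * Returns 0 on success, -EINVAL on malformed input, -ERANGE when the
 * value is negative or does not fit in uint64_t.
 */
int sketch_strtotime_ns(const char *s, uint64_t *ns)
{
    static const struct {
        const char *suffix;
        double mul;
    } units[] = {
        { "ns", 1.0 }, { "us", 1e3 }, { "ms", 1e6 }, { "s", 1e9 },
    };
    const size_t n_units = sizeof(units) / sizeof(units[0]);
    double val, mul = 1.0;   /* a bare number is taken as nanoseconds */
    char *end;
    size_t i;

    assert(s && ns);

    val = strtod(s, &end);
    /* Reject empty input, NaN and infinities. */
    if (end == s || val != val || val > DBL_MAX || val < -DBL_MAX) {
        return -EINVAL;
    }

    if (*end != '\0') {
        /* Whatever follows the number must be exactly one known suffix. */
        for (i = 0; i < n_units; i++) {
            if (strcmp(end, units[i].suffix) == 0) {
                mul = units[i].mul;
                break;
            }
        }
        if (i == n_units) {
            return -EINVAL;
        }
    }

    val *= mul;
    if (val < 0 || val >= 18446744073709551616.0 /* 2^64 */) {
        return -ERANGE;
    }
    *ns = (uint64_t)(val + 0.5);   /* sketch choice: round to nearest ns */
    return 0;
}
```

With this sketch, "265ms" would be stored as 265000000 and "89ks" would be
rejected, which is the shape of result a visitor implementing type_time is
expected to hand back.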

Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v16.

Changes in v14:
    - Drop time unit picosecond (Eric)
---
 include/qapi/visitor-impl.h  |  4 ++++
 include/qapi/visitor.h       |  8 ++++++++
 qapi/opts-visitor.c          | 22 ++++++++++++++++++++++
 qapi/qapi-visit-core.c       | 12 ++++++++++++
 qapi/qobject-input-visitor.c | 18 ++++++++++++++++++
 qapi/trace-events            |  1 +
 scripts/qapi/schema.py       |  1 +
 7 files changed, 66 insertions(+)

diff --git a/include/qapi/visitor-impl.h b/include/qapi/visitor-impl.h
index 8ccb3b6c20..e0979563c7 100644
--- a/include/qapi/visitor-impl.h
+++ b/include/qapi/visitor-impl.h
@@ -88,6 +88,10 @@ struct Visitor
     void (*type_size)(Visitor *v, const char *name, uint64_t *obj,
                       Error **errp);
 
+    /* Optional; fallback is type_uint64() */
+    void (*type_time)(Visitor *v, const char *name, uint64_t *obj,
+                      Error **errp);
+
     /* Must be set */
     void (*type_bool)(Visitor *v, const char *name, bool *obj, Error **errp);
 
diff --git a/include/qapi/visitor.h b/include/qapi/visitor.h
index c5b23851a1..22242e706f 100644
--- a/include/qapi/visitor.h
+++ b/include/qapi/visitor.h
@@ -554,6 +554,14 @@ void visit_type_int64(Visitor *v, const char *name, int64_t *obj,
 void visit_type_size(Visitor *v, const char *name, uint64_t *obj,
                      Error **errp);
 
+/*
+ * Visit a uint64_t value.
+ * Like visit_type_uint64(), except that some visitors may choose to
+ * recognize numbers with time unit suffix, such as "ns", "us", "ms" and "s".
+ */
+void visit_type_time(Visitor *v, const char *name, uint64_t *obj,
+                     Error **errp);
+
 /*
  * Visit a boolean value.
  *
diff --git a/qapi/opts-visitor.c b/qapi/opts-visitor.c
index 5fe0276c1c..59b575f0fc 100644
--- a/qapi/opts-visitor.c
+++ b/qapi/opts-visitor.c
@@ -526,6 +526,27 @@ opts_type_size(Visitor *v, const char *name, uint64_t *obj, Error **errp)
     processed(ov, name);
 }
 
+static void
+opts_type_time(Visitor *v, const char *name, uint64_t *obj, Error **errp)
+{
+    OptsVisitor *ov = to_ov(v);
+    const QemuOpt *opt;
+    int err;
+
+    opt = lookup_scalar(ov, name, errp);
+    if (!opt) {
+        return;
+    }
+
+    err = qemu_strtotime_ns(opt->str ? opt->str : "", NULL, obj);
+    if (err < 0) {
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE, opt->name,
+                   "a time value");
+        return;
+    }
+
+    processed(ov, name);
+}
 
 static void
 opts_optional(Visitor *v, const char *name, bool *present)
@@ -573,6 +594,7 @@ opts_visitor_new(const QemuOpts *opts)
     ov->visitor.type_int64  = &opts_type_int64;
     ov->visitor.type_uint64 = &opts_type_uint64;
     ov->visitor.type_size   = &opts_type_size;
+    ov->visitor.type_time   = &opts_type_time;
     ov->visitor.type_bool   = &opts_type_bool;
     ov->visitor.type_str    = &opts_type_str;
 
diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c
index 5365561b07..ac8896455c 100644
--- a/qapi/qapi-visit-core.c
+++ b/qapi/qapi-visit-core.c
@@ -277,6 +277,18 @@ void visit_type_size(Visitor *v, const char *name, uint64_t *obj,
     }
 }
 
+void visit_type_time(Visitor *v, const char *name, uint64_t *obj,
+                     Error **errp)
+{
+    assert(obj);
+    trace_visit_type_time(v, name, obj);
+    if (v->type_time) {
+        v->type_time(v, name, obj, errp);
+    } else {
+        v->type_uint64(v, name, obj, errp);
+    }
+}
+
 void visit_type_bool(Visitor *v, const char *name, bool *obj, Error **errp)
 {
     assert(obj);
diff --git a/qapi/qobject-input-visitor.c b/qapi/qobject-input-visitor.c
index 32236cbcb1..e476fe0d16 100644
--- a/qapi/qobject-input-visitor.c
+++ b/qapi/qobject-input-visitor.c
@@ -627,6 +627,23 @@ static void qobject_input_type_size_keyval(Visitor *v, const char *name,
     }
 }
 
+static void qobject_input_type_time_keyval(Visitor *v, const char *name,
+                                           uint64_t *obj, Error **errp)
+{
+    QObjectInputVisitor *qiv = to_qiv(v);
+    const char *str = qobject_input_get_keyval(qiv, name, errp);
+
+    if (!str) {
+        return;
+    }
+
+    if (qemu_strtotime_ns(str, NULL, obj) < 0) {
+        /* TODO report -ERANGE more nicely */
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+                   full_name(qiv, name), "time");
+    }
+}
+
 static void qobject_input_optional(Visitor *v, const char *name, bool *present)
 {
     QObjectInputVisitor *qiv = to_qiv(v);
@@ -708,6 +725,7 @@ Visitor *qobject_input_visitor_new_keyval(QObject *obj)
     v->visitor.type_any = qobject_input_type_any;
     v->visitor.type_null = qobject_input_type_null;
     v->visitor.type_size = qobject_input_type_size_keyval;
+    v->visitor.type_time = qobject_input_type_time_keyval;
     v->keyval = true;
 
     return &v->visitor;
diff --git a/qapi/trace-events b/qapi/trace-events
index 5eb4afa110..c4605a7ccc 100644
--- a/qapi/trace-events
+++ b/qapi/trace-events
@@ -29,6 +29,7 @@ visit_type_int16(void *v, const char *name, int16_t *obj) "v=%p name=%s obj=%p"
 visit_type_int32(void *v, const char *name, int32_t *obj) "v=%p name=%s obj=%p"
 visit_type_int64(void *v, const char *name, int64_t *obj) "v=%p name=%s obj=%p"
 visit_type_size(void *v, const char *name, uint64_t *obj) "v=%p name=%s obj=%p"
+visit_type_time(void *v, const char *name, uint64_t *obj) "v=%p name=%s obj=%p"
 visit_type_bool(void *v, const char *name, bool *obj) "v=%p name=%s obj=%p"
 visit_type_str(void *v, const char *name, char **obj) "v=%p name=%s obj=%p"
 visit_type_number(void *v, const char *name, void *obj) "v=%p name=%s obj=%p"
diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
index cf0045f34e..210be64909 100644
--- a/scripts/qapi/schema.py
+++ b/scripts/qapi/schema.py
@@ -855,6 +855,7 @@ class QAPISchema(object):
                   ('uint32', 'int',     'uint32_t'),
                   ('uint64', 'int',     'uint64_t'),
                   ('size',   'int',     'uint64_t'),
+                  ('time',   'int',     'uint64_t'),
                   ('bool',   'boolean', 'bool'),
                   ('any',    'value',   'QObject' + pointer_suffix),
                   ('null',   'null',    'QNull' + pointer_suffix)]:
-- 
2.20.1




* [PATCH v16 06/14] tests: Add test for QAPI builtin type time
  2019-11-15  7:53 [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (4 preceding siblings ...)
  2019-11-15  7:53 ` [PATCH v16 05/14] qapi: Add builtin type time Tao Xu
@ 2019-11-15  7:53 ` Tao Xu
  2019-11-15  7:53 ` [PATCH v16 07/14] numa: Extend CLI to provide initiator information for numa nodes Tao Xu
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Tao Xu @ 2019-11-15  7:53 UTC (permalink / raw)
  To: mst, imammedo, eblake, ehabkost, marcel.apfelbaum, armbru,
	mdroth, thuth, lvivier
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

Add tests for time input such as zero, around the limit of precision,
signed upper limit, actual upper limit, beyond limits, time suffixes,
etc.

Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v16:
    - Update the test cases

Changes in v14:
    - Drop time unit picosecond (Eric)
---
 tests/test-keyval.c                | 89 ++++++++++++++++++++++++++++++
 tests/test-qobject-input-visitor.c | 29 ++++++++++
 2 files changed, 118 insertions(+)

diff --git a/tests/test-keyval.c b/tests/test-keyval.c
index fad941fcb8..39443f7e0c 100644
--- a/tests/test-keyval.c
+++ b/tests/test-keyval.c
@@ -457,6 +457,94 @@ static void test_keyval_visit_size(void)
     visit_free(v);
 }
 
+static void test_keyval_visit_time(void)
+{
+    Error *err = NULL;
+    Visitor *v;
+    QDict *qdict;
+    uint64_t time;
+
+    /* Lower limit zero */
+    qdict = keyval_parse("time1=0", NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &error_abort);
+    g_assert_cmpuint(time, ==, 0);
+    visit_check_struct(v, &error_abort);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Around limit of precision: UINT64_MAX - 1, UINT64_MAX */
+    qdict = keyval_parse("time1=18446744073709551614,"
+                         "time2=18446744073709551615",
+                         NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &error_abort);
+    g_assert_cmphex(time, ==, 0xfffffffffffffffe);
+    visit_type_time(v, "time2", &time, &error_abort);
+    g_assert_cmphex(time, ==, 0xffffffffffffffff);
+    visit_check_struct(v, &error_abort);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Beyond limits */
+    qdict = keyval_parse("time1=-1,"
+                         "time2=18446744073709551616", /* 2^64 */
+                         NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &err);
+    error_free_or_abort(&err);
+    visit_type_time(v, "time2", &time, &err);
+    error_free_or_abort(&err);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Suffixes */
+    qdict = keyval_parse("time1=2ns,time2=3.4us,time3=5ms,time4=600s",
+                         NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &error_abort);
+    g_assert_cmpuint(time, ==, 2);
+    visit_type_time(v, "time2", &time, &error_abort);
+    g_assert_cmpuint(time, ==, 3400);
+    visit_type_time(v, "time3", &time, &error_abort);
+    g_assert_cmphex(time, ==, 5 * 1000 * 1000);
+    visit_type_time(v, "time4", &time, &error_abort);
+    g_assert_cmphex(time, ==, 600 * 1000000000LL);
+    visit_check_struct(v, &error_abort);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Beyond limit with suffix */
+    qdict = keyval_parse("time1=1844674407370955s", NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &err);
+    error_free_or_abort(&err);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Trailing crap */
+    qdict = keyval_parse("time1=89ks,time2=ns", NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &err);
+    error_free_or_abort(&err);
+    visit_type_time(v, "time2", &time, &err);
+    error_free_or_abort(&err);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+}
+
 static void test_keyval_visit_dict(void)
 {
     Error *err = NULL;
@@ -645,6 +733,7 @@ int main(int argc, char *argv[])
     g_test_add_func("/keyval/visit/bool", test_keyval_visit_bool);
     g_test_add_func("/keyval/visit/number", test_keyval_visit_number);
     g_test_add_func("/keyval/visit/size", test_keyval_visit_size);
+    g_test_add_func("/keyval/visit/time", test_keyval_visit_time);
     g_test_add_func("/keyval/visit/dict", test_keyval_visit_dict);
     g_test_add_func("/keyval/visit/list", test_keyval_visit_list);
     g_test_add_func("/keyval/visit/optional", test_keyval_visit_optional);
diff --git a/tests/test-qobject-input-visitor.c b/tests/test-qobject-input-visitor.c
index 6bacabf063..55138042b8 100644
--- a/tests/test-qobject-input-visitor.c
+++ b/tests/test-qobject-input-visitor.c
@@ -366,6 +366,31 @@ static void test_visitor_in_size_str_fail(TestInputVisitorData *data,
     error_free_or_abort(&err);
 }
 
+static void test_visitor_in_time_str_keyval(TestInputVisitorData *data,
+                                            const void *unused)
+{
+    uint64_t res, value = 265 * 1000 * 1000;
+    Visitor *v;
+
+    v = visitor_input_test_init_full(data, true, "\"265ms\"");
+
+    visit_type_time(v, NULL, &res, &error_abort);
+    g_assert_cmpuint(res, ==, value);
+}
+
+static void test_visitor_in_time_str_fail(TestInputVisitorData *data,
+                                          const void *unused)
+{
+    uint64_t res = 0;
+    Visitor *v;
+    Error *err = NULL;
+
+    v = visitor_input_test_init(data, "\"265ms\"");
+
+    visit_type_time(v, NULL, &res, &err);
+    error_free_or_abort(&err);
+}
+
 static void test_visitor_in_string(TestInputVisitorData *data,
                                    const void *unused)
 {
@@ -1311,6 +1336,10 @@ int main(int argc, char **argv)
                            NULL, test_visitor_in_size_str_keyval);
     input_visitor_test_add("/visitor/input/size_str_fail",
                            NULL, test_visitor_in_size_str_fail);
+    input_visitor_test_add("/visitor/input/time_str_keyval",
+                           NULL, test_visitor_in_time_str_keyval);
+    input_visitor_test_add("/visitor/input/time_str_fail",
+                           NULL, test_visitor_in_time_str_fail);
     input_visitor_test_add("/visitor/input/string",
                            NULL, test_visitor_in_string);
     input_visitor_test_add("/visitor/input/enum",
-- 
2.20.1




* [PATCH v16 07/14] numa: Extend CLI to provide initiator information for numa nodes
  2019-11-15  7:53 [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (5 preceding siblings ...)
  2019-11-15  7:53 ` [PATCH v16 06/14] tests: Add test for QAPI " Tao Xu
@ 2019-11-15  7:53 ` Tao Xu
  2019-11-15  7:53 ` [PATCH v16 08/14] numa: Extend CLI to provide memory latency and bandwidth information Tao Xu
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Tao Xu @ 2019-11-15  7:53 UTC (permalink / raw)
  To: mst, imammedo, eblake, ehabkost, marcel.apfelbaum, armbru,
	mdroth, thuth, lvivier
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron, Dan Williams

In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
the initiator represents a processor that accesses memory. In 5.2.27.3
Memory Proximity Domain Attributes Structure, the attached initiator is
defined as the place where the memory controller responsible for a memory
proximity domain resides. With attached initiator information, the topology
of heterogeneous memory can be described.

Extend the CLI of the "-numa node" option to indicate the initiator NUMA
node-id. In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses
and reports the platform's HMAT tables.
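
The initiator rules this patch enforces once all CPUs are assigned can be
summarised by the standalone sketch below. It is only an illustration: the
SketchNode type, the SKETCH_* names and the limit of 128 are assumptions of
the sketch, and it returns a code where the patch reports an error and exits.
Every declared node needs an initiator, that initiator must itself be a
declared node, and it must have at least one CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit; the same value doubles as "initiator not set". */
#define SKETCH_MAX_NODES 128

typedef struct {
    bool present;        /* node was declared with -numa node */
    bool has_cpu;        /* at least one CPU is assigned to the node */
    uint16_t initiator;  /* node id, or SKETCH_MAX_NODES when unset */
} SketchNode;

enum {
    SKETCH_OK = 0,
    SKETCH_INITIATOR_MISSING,
    SKETCH_INITIATOR_NOT_DECLARED,
    SKETCH_INITIATOR_HAS_NO_CPU,
};

/*
 * Check nodes[0..num_nodes-1]. The array must hold SKETCH_MAX_NODES
 * entries because an initiator id may point past num_nodes.
 */
int sketch_validate_initiators(const SketchNode *nodes, int num_nodes)
{
    int i;

    assert(nodes && num_nodes >= 0 && num_nodes <= SKETCH_MAX_NODES);

    for (i = 0; i < num_nodes; i++) {
        uint16_t init = nodes[i].initiator;

        if (init >= SKETCH_MAX_NODES) {
            return SKETCH_INITIATOR_MISSING;
        }
        if (!nodes[init].present) {
            return SKETCH_INITIATOR_NOT_DECLARED;
        }
        if (!nodes[init].has_cpu) {
            return SKETCH_INITIATOR_HAS_NO_CPU;
        }
    }
    return SKETCH_OK;
}
```

A memory-only node 1 whose initiator is CPU node 0 passes this check; the
same node with no initiator, or with itself as initiator, does not.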

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v16.

Changes in v15:
    - Change the QAPI version tag to 5.0 (Eric)
---
 hw/core/machine.c     | 64 +++++++++++++++++++++++++++++++++++++++++++
 hw/core/numa.c        | 23 ++++++++++++++++
 include/sysemu/numa.h |  5 ++++
 qapi/machine.json     | 10 ++++++-
 qemu-options.hx       | 35 +++++++++++++++++++----
 5 files changed, 131 insertions(+), 6 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 1689ad3bf8..d7d2cfa66d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -518,6 +518,20 @@ static void machine_set_nvdimm(Object *obj, bool value, Error **errp)
     ms->nvdimms_state->is_enabled = value;
 }
 
+static bool machine_get_hmat(Object *obj, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    return ms->numa_state->hmat_enabled;
+}
+
+static void machine_set_hmat(Object *obj, bool value, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    ms->numa_state->hmat_enabled = value;
+}
+
 static char *machine_get_nvdimm_persistence(Object *obj, Error **errp)
 {
     MachineState *ms = MACHINE(obj);
@@ -645,6 +659,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
                                const CpuInstanceProperties *props, Error **errp)
 {
     MachineClass *mc = MACHINE_GET_CLASS(machine);
+    NodeInfo *numa_info = machine->numa_state->nodes;
     bool match = false;
     int i;
 
@@ -714,6 +729,17 @@ void machine_set_cpu_numa_node(MachineState *machine,
         match = true;
         slot->props.node_id = props->node_id;
         slot->props.has_node_id = props->has_node_id;
+
+        if (machine->numa_state->hmat_enabled) {
+            if ((numa_info[props->node_id].initiator < MAX_NODES) &&
+                (props->node_id != numa_info[props->node_id].initiator)) {
+                error_setg(errp, "The initiator of CPU NUMA node %" PRId64
+                        " should be itself", props->node_id);
+                return;
+            }
+            numa_info[props->node_id].has_cpu = true;
+            numa_info[props->node_id].initiator = props->node_id;
+        }
     }
 
     if (!match) {
@@ -960,6 +986,13 @@ static void machine_initfn(Object *obj)
 
     if (mc->numa_mem_supported) {
         ms->numa_state = g_new0(NumaState, 1);
+        object_property_add_bool(obj, "hmat",
+                                 machine_get_hmat, machine_set_hmat,
+                                 &error_abort);
+        object_property_set_description(obj, "hmat",
+                                        "Set on/off to enable/disable "
+                                        "ACPI Heterogeneous Memory Attribute "
+                                        "Table (HMAT)", NULL);
     }
 
     /* Register notifier when init is done for sysbus sanity checks */
@@ -1048,6 +1081,32 @@ static char *cpu_slot_to_string(const CPUArchId *cpu)
     return g_string_free(s, false);
 }
 
+static void numa_validate_initiator(NumaState *numa_state)
+{
+    int i;
+    NodeInfo *numa_info = numa_state->nodes;
+
+    for (i = 0; i < numa_state->num_nodes; i++) {
+        if (numa_info[i].initiator == MAX_NODES) {
+            error_report("The initiator of NUMA node %d is missing, use "
+                         "'-numa node,initiator' option to declare it", i);
+            exit(1);
+        }
+
+        if (!numa_info[numa_info[i].initiator].present) {
+            error_report("NUMA node %" PRIu16 " is missing, use "
+                         "'-numa node' option to declare it first",
+                         numa_info[i].initiator);
+            exit(1);
+        }
+
+        if (!numa_info[numa_info[i].initiator].has_cpu) {
+            error_report("The initiator of NUMA node %d is invalid", i);
+            exit(1);
+        }
+    }
+}
+
 static void machine_numa_finish_cpu_init(MachineState *machine)
 {
     int i;
@@ -1088,6 +1147,11 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
             machine_set_cpu_numa_node(machine, &props, &error_fatal);
         }
     }
+
+    if (machine->numa_state->hmat_enabled) {
+        numa_validate_initiator(machine->numa_state);
+    }
+
     if (s->len && !qtest_enabled()) {
         warn_report("CPU(s) not present in any NUMA nodes: %s",
                     s->str);
diff --git a/hw/core/numa.c b/hw/core/numa.c
index e3332a984f..e60da99293 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -133,6 +133,29 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
         numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
         numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
     }
+
+    /*
+     * If the initiator is not set, mark it as MAX_NODES. In that case, if
+     * HMAT is enabled and this node has no CPUs, QEMU will raise an error.
+     */
+    numa_info[nodenr].initiator = MAX_NODES;
+    if (node->has_initiator) {
+        if (!ms->numa_state->hmat_enabled) {
+            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
+                       "(HMAT) is disabled, enable it with -machine hmat=on "
+                       "before using any of hmat specific options");
+            return;
+        }
+
+        if (node->initiator >= MAX_NODES) {
+            error_setg(errp, "The initiator id %" PRIu16 " expects an integer "
+                       "between 0 and %d", node->initiator,
+                       MAX_NODES - 1);
+            return;
+        }
+
+        numa_info[nodenr].initiator = node->initiator;
+    }
     numa_info[nodenr].present = true;
     max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
     ms->numa_state->num_nodes++;
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index ae9c41d02b..788cbec7a2 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -18,6 +18,8 @@ struct NodeInfo {
     uint64_t node_mem;
     struct HostMemoryBackend *node_memdev;
     bool present;
+    bool has_cpu;
+    uint16_t initiator;
     uint8_t distance[MAX_NODES];
 };
 
@@ -33,6 +35,9 @@ struct NumaState {
     /* Allow setting NUMA distance for different NUMA nodes */
     bool have_numa_distance;
 
+    /* Detect if HMAT support is enabled. */
+    bool hmat_enabled;
+
     /* NUMA nodes information */
     NodeInfo nodes[MAX_NODES];
 };
diff --git a/qapi/machine.json b/qapi/machine.json
index ca26779f1a..27d0e37534 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -463,6 +463,13 @@
 # @memdev: memory backend object.  If specified for one node,
 #          it must be specified for all nodes.
 #
+# @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145,
+#             points to the nodeid which has the memory controller
+#             responsible for this NUMA node. This field provides
+#             additional information as to the initiator node that
+#             is closest (as in directly attached) to this node, and
+#             therefore has the best performance (since 5.0)
+#
 # Since: 2.1
 ##
 { 'struct': 'NumaNodeOptions',
@@ -470,7 +477,8 @@
    '*nodeid': 'uint16',
    '*cpus':   ['uint16'],
    '*mem':    'size',
-   '*memdev': 'str' }}
+   '*memdev': 'str',
+   '*initiator': 'uint16' }}
 
 ##
 # @NumaDistOptions:
diff --git a/qemu-options.hx b/qemu-options.hx
index 65c9473b73..63f6b33322 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -43,7 +43,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "                suppress-vmdesc=on|off disables self-describing migration (default=off)\n"
     "                nvdimm=on|off controls NVDIMM support (default=off)\n"
     "                enforce-config-section=on|off enforce configuration section migration (default=off)\n"
-    "                memory-encryption=@var{} memory encryption object to use (default=none)\n",
+    "                memory-encryption=@var{} memory encryption object to use (default=none)\n"
+    "                hmat=on|off controls ACPI HMAT support (default=off)\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -machine [type=]@var{name}[,prop=@var{value}[,...]]
@@ -103,6 +104,9 @@ NOTE: this parameter is deprecated. Please use @option{-global}
 @option{migration.send-configuration}=@var{on|off} instead.
 @item memory-encryption=@var{}
 Memory encryption object to use. The default is none.
+@item hmat=on|off
+Enables or disables ACPI Heterogeneous Memory Attribute Table (HMAT) support.
+The default is off.
 @end table
 ETEXI
 
@@ -161,14 +165,14 @@ If any on the three values is given, the total number of CPUs @var{n} can be omi
 ETEXI
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
-    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
-    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
+    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
+    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa dist,src=source,dst=destination,val=distance\n"
     "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
     QEMU_ARCH_ALL)
 STEXI
-@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
-@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
+@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
+@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
 @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
 @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
 @findex -numa
@@ -215,6 +219,27 @@ split equally between them.
 @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore,
 if one node uses @samp{memdev}, all of them have to use it.
 
+@samp{initiator} is an additional option that points to an @var{initiator}
+NUMA node that has the best performance (the lowest latency or largest
+bandwidth) to this NUMA @var{node}. Note that this option can be set only
+when the machine property 'hmat' is set to 'on'.
+
+The following example creates a machine with 2 NUMA nodes. Node 0 has CPUs;
+node 1 has only memory, and its initiator is node 0. Note that because
+node 0 has CPUs, the initiator of node 0 defaults to itself and must be
+itself.
+@example
+-machine hmat=on \
+-m 2G,slots=2,maxmem=4G \
+-object memory-backend-ram,size=1G,id=m0 \
+-object memory-backend-ram,size=1G,id=m1 \
+-numa node,nodeid=0,memdev=m0 \
+-numa node,nodeid=1,memdev=m1,initiator=0 \
+-smp 2,sockets=2,maxcpus=2  \
+-numa cpu,node-id=0,socket-id=0 \
+-numa cpu,node-id=0,socket-id=1
+@end example
+
 @var{source} and @var{destination} are NUMA node IDs.
 @var{distance} is the NUMA distance from @var{source} to @var{destination}.
 The distance from a node to itself is always 10. If any pair of nodes is
-- 
2.20.1




* [PATCH v16 08/14] numa: Extend CLI to provide memory latency and bandwidth information
  2019-11-15  7:53 [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (6 preceding siblings ...)
  2019-11-15  7:53 ` [PATCH v16 07/14] numa: Extend CLI to provide initiator information for numa nodes Tao Xu
@ 2019-11-15  7:53 ` Tao Xu
  2019-11-19 11:03   ` Igor Mammedov
  2019-11-15  7:53 ` [PATCH v16 09/14] numa: Extend CLI to provide memory side cache information Tao Xu
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Tao Xu @ 2019-11-15  7:53 UTC (permalink / raw)
  To: mst, imammedo, eblake, ehabkost, marcel.apfelbaum, armbru,
	mdroth, thuth, lvivier
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

From: Liu Jingqi <jingqi.liu@intel.com>

Add -numa hmat-lb option to provide System Locality Latency and
Bandwidth Information. These memory attributes help to build
System Locality Latency and Bandwidth Information Structure(s)
in ACPI Heterogeneous Memory Attribute Table (HMAT).
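
The bookkeeping behind each hmat-lb entry is roughly: check that initiator
and target are valid node ids, refuse a second value for the same
(initiator, target) pair, then record the value. The standalone sketch below
illustrates that flow under stated assumptions: SketchLBList, SketchLBEntry,
sketch_lb_add() and the fixed capacity of 16 are hypothetical, whereas the
patch keeps its entries in a GArray and additionally checks has_cpu and
present for the two nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LB_CAPACITY 16   /* arbitrary size chosen for the sketch */

typedef struct {
    uint16_t initiator;
    uint16_t target;
    uint64_t data;              /* latency or bandwidth value */
} SketchLBEntry;

typedef struct {
    SketchLBEntry entries[SKETCH_LB_CAPACITY];
    size_t len;
} SketchLBList;

/*
 * Record one latency/bandwidth value.
 * Returns 0 on success, -1 if a node id is not below num_nodes,
 * -2 if the (initiator, target) pair already has a value,
 * -3 if the list is full.
 */
int sketch_lb_add(SketchLBList *list, int num_nodes,
                  uint16_t initiator, uint16_t target, uint64_t data)
{
    size_t i;

    assert(list && num_nodes >= 0);

    /* Valid ids are 0 .. num_nodes - 1, so num_nodes itself is rejected. */
    if (initiator >= num_nodes || target >= num_nodes) {
        return -1;
    }

    /* Detect duplicate configuration. */
    for (i = 0; i < list->len; i++) {
        if (list->entries[i].initiator == initiator &&
            list->entries[i].target == target) {
            return -2;
        }
    }

    if (list->len == SKETCH_LB_CAPACITY) {
        return -3;
    }

    list->entries[list->len].initiator = initiator;
    list->entries[list->len].target = target;
    list->entries[list->len].data = data;
    list->len++;
    return 0;
}
```

With two nodes, adding (initiator=0, target=1) succeeds once, a repeat of
the same pair is refused, and initiator=2 is out of range.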

Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v16:
    - Initialize HMAT_LB_Data lb_data (Igor)
    - Remove punctuation from error_setg (Igor)
    - Correct some description (Igor)
    - Drop statement about max value (Igor)
    - Simplify struct HMAT_LB_Info and related code, unify latency
      and bandwidth (Igor)

Changes in v15:
    - Change the QAPI version tag to 5.0 (Eric)

Changes in v14:
    - Use qemu ctz64 and clz64 instead of builtin function
    - Improve help message in qemu-options.hx
---
 hw/core/numa.c        | 174 ++++++++++++++++++++++++++++++++++++++++++
 include/sysemu/numa.h |  53 +++++++++++++
 qapi/machine.json     |  94 ++++++++++++++++++++++-
 qemu-options.hx       |  49 +++++++++++-
 4 files changed, 367 insertions(+), 3 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index e60da99293..47c7a96863 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -23,6 +23,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include "sysemu/hostmem.h"
 #include "sysemu/numa.h"
 #include "sysemu/sysemu.h"
@@ -198,6 +199,166 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
     ms->numa_state->have_numa_distance = true;
 }
 
+void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
+                        Error **errp)
+{
+    int i, first_bit, last_bit;
+    uint64_t max_entry, temp_base_la;
+    NodeInfo *numa_info = numa_state->nodes;
+    HMAT_LB_Info *hmat_lb =
+        numa_state->hmat_lb[node->hierarchy][node->data_type];
+    HMAT_LB_Data lb_data = {};
+    HMAT_LB_Data *lb_temp;
+
+    /* Error checking */
+    if (node->initiator >= numa_state->num_nodes) {
+        error_setg(errp, "Invalid initiator=%d, it should be less than %d",
+                   node->initiator, numa_state->num_nodes);
+        return;
+    }
+    if (node->target >= numa_state->num_nodes) {
+        error_setg(errp, "Invalid target=%d, it should be less than %d",
+                   node->target, numa_state->num_nodes);
+        return;
+    }
+    if (!numa_info[node->initiator].has_cpu) {
+        error_setg(errp, "Invalid initiator=%d, it isn't an "
+                   "initiator proximity domain", node->initiator);
+        return;
+    }
+    if (!numa_info[node->target].present) {
+        error_setg(errp, "The target=%d should point to an existing node",
+                   node->target);
+        return;
+    }
+
+    if (!hmat_lb) {
+        hmat_lb = g_malloc0(sizeof(*hmat_lb));
+        numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
+        hmat_lb->list = g_array_new(false, true, sizeof(HMAT_LB_Data));
+    }
+    hmat_lb->hierarchy = node->hierarchy;
+    hmat_lb->data_type = node->data_type;
+    lb_data.initiator = node->initiator;
+    lb_data.target = node->target;
+
+    if (node->data_type <= HMATLB_DATA_TYPE_WRITE_LATENCY) {
+        /* Input latency data */
+
+        if (!node->has_latency) {
+            error_setg(errp, "Missing 'latency' option");
+            return;
+        }
+        if (node->has_bandwidth) {
+            error_setg(errp, "Invalid option 'bandwidth' since "
+                       "the data type is latency");
+            return;
+        }
+
+        /* Detect duplicate configuration */
+        for (i = 0; i < hmat_lb->list->len; i++) {
+            lb_temp = &g_array_index(hmat_lb->list, HMAT_LB_Data, i);
+
+            if (node->initiator == lb_temp->initiator &&
+                node->target == lb_temp->target) {
+                error_setg(errp, "Duplicate configuration of the latency for "
+                    "initiator=%d and target=%d", node->initiator,
+                    node->target);
+                return;
+            }
+        }
+
+        hmat_lb->base = hmat_lb->base ? hmat_lb->base : UINT64_MAX;
+
+        /* Calculate the temporary base and compressed latency */
+        max_entry = node->latency;
+        temp_base_la = 1;
+        while (max_entry && QEMU_IS_ALIGNED(max_entry, 10)) {
+            max_entry /= 10;
+            temp_base_la *= 10;
+        }
+
+        /* Calculate the max compressed latency */
+        hmat_lb->base = MIN(hmat_lb->base, temp_base_la);
+        max_entry = node->latency / hmat_lb->base;
+        hmat_lb->range_bitmap = MAX(hmat_lb->range_bitmap, max_entry);
+
+        /*
+         * For latency, hmat_lb->range_bitmap records the max compressed
+         * latency, which should be less than 0xFFFF (UINT16_MAX)
+         */
+        if (hmat_lb->range_bitmap >= UINT16_MAX) {
+            error_setg(errp, "Latency %" PRIu64 " between initiator=%d and "
+                       "target=%d should not differ from previously entered "
+                       "min or max values by more than %d", node->latency,
+                       node->initiator, node->target, UINT16_MAX - 1);
+            return;
+        }
+
+        /* Set lb_info_provided bit 0 as 1, latency information is provided */
+        numa_info[node->target].lb_info_provided |= BIT(0);
+        lb_data.data = node->latency;
+    } else if (node->data_type >= HMATLB_DATA_TYPE_ACCESS_BANDWIDTH) {
+        /* Input bandwidth data */
+
+        if (!node->has_bandwidth) {
+            error_setg(errp, "Missing 'bandwidth' option");
+            return;
+        }
+        if (node->has_latency) {
+            error_setg(errp, "Invalid option 'latency' since "
+                       "the data type is bandwidth");
+            return;
+        }
+        if (!QEMU_IS_ALIGNED(node->bandwidth, MiB)) {
+            error_setg(errp, "Bandwidth %" PRIu64 " between initiator=%d and "
+                       "target=%d should be 1MB aligned", node->bandwidth,
+                       node->initiator, node->target);
+            return;
+        }
+
+        /* Detect duplicate configuration */
+        for (i = 0; i < hmat_lb->list->len; i++) {
+            lb_temp = &g_array_index(hmat_lb->list, HMAT_LB_Data, i);
+
+            if (node->initiator == lb_temp->initiator &&
+                node->target == lb_temp->target) {
+                error_setg(errp, "Duplicate configuration of the bandwidth for "
+                    "initiator=%d and target=%d", node->initiator,
+                    node->target);
+                return;
+            }
+        }
+
+        hmat_lb->range_bitmap |= node->bandwidth;
+        first_bit = ctz64(hmat_lb->range_bitmap);
+        hmat_lb->base = UINT64_C(1) << first_bit;
+        max_entry = node->bandwidth / hmat_lb->base;
+        last_bit = 64 - clz64(hmat_lb->range_bitmap);
+
+        /*
+         * For bandwidth, first_bit records the base unit of bandwidth bits
+         * and last_bit records the last bit of the max bandwidth. The max
+         * compressed bandwidth should be less than 0xFFFF (UINT16_MAX)
+         */
+        if ((last_bit - first_bit) > UINT16_BITS || max_entry >= UINT16_MAX) {
+            error_setg(errp, "Bandwidth %" PRIu64 " between initiator=%d and "
+                       "target=%d should not differ from previously entered "
+                       "values by more than %d", node->bandwidth,
+                       node->initiator, node->target, UINT16_MAX - 1);
+            return;
+        }
+
+        /* Set lb_info_provided bit 1 as 1, bandwidth information is provided */
+        numa_info[node->target].lb_info_provided |= BIT(1);
+        lb_data.data = node->bandwidth;
+    } else {
+        assert(0);
+    }
+
+    g_array_append_val(hmat_lb->list, lb_data);
+}
+
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 {
     Error *err = NULL;
@@ -236,6 +397,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
         machine_set_cpu_numa_node(ms, qapi_NumaCpuOptions_base(&object->u.cpu),
                                   &err);
         break;
+    case NUMA_OPTIONS_TYPE_HMAT_LB:
+        if (!ms->numa_state->hmat_enabled) {
+            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
+                       "(HMAT) is disabled, enable it with -machine hmat=on "
+                       "before using any of hmat specific options");
+            return;
+        }
+
+        parse_numa_hmat_lb(ms->numa_state, &object->u.hmat_lb, &err);
+        if (err) {
+            goto end;
+        }
+        break;
     default:
         abort();
     }
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 788cbec7a2..70f93c83d7 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -14,11 +14,34 @@ struct CPUArchId;
 #define NUMA_DISTANCE_MAX         254
 #define NUMA_DISTANCE_UNREACHABLE 255
 
+/* the value of AcpiHmatLBInfo flags */
+enum {
+    HMAT_LB_MEM_MEMORY           = 0,
+    HMAT_LB_MEM_CACHE_1ST_LEVEL  = 1,
+    HMAT_LB_MEM_CACHE_2ND_LEVEL  = 2,
+    HMAT_LB_MEM_CACHE_3RD_LEVEL  = 3,
+    HMAT_LB_LEVELS   /* must be the last entry */
+};
+
+/* the value of AcpiHmatLBInfo data type */
+enum {
+    HMAT_LB_DATA_ACCESS_LATENCY   = 0,
+    HMAT_LB_DATA_READ_LATENCY     = 1,
+    HMAT_LB_DATA_WRITE_LATENCY    = 2,
+    HMAT_LB_DATA_ACCESS_BANDWIDTH = 3,
+    HMAT_LB_DATA_READ_BANDWIDTH   = 4,
+    HMAT_LB_DATA_WRITE_BANDWIDTH  = 5,
+    HMAT_LB_TYPES   /* must be the last entry */
+};
+
+#define UINT16_BITS       16
+
 struct NodeInfo {
     uint64_t node_mem;
     struct HostMemoryBackend *node_memdev;
     bool present;
     bool has_cpu;
+    uint8_t lb_info_provided;
     uint16_t initiator;
     uint8_t distance[MAX_NODES];
 };
@@ -28,6 +51,31 @@ struct NumaNodeMem {
     uint64_t node_plugged_mem;
 };
 
+struct HMAT_LB_Data {
+    uint8_t     initiator;
+    uint8_t     target;
+    uint64_t    data;
+};
+typedef struct HMAT_LB_Data HMAT_LB_Data;
+
+struct HMAT_LB_Info {
+    /* Indicates it's memory or the specified level memory side cache. */
+    uint8_t     hierarchy;
+
+    /* Presents the type of data, access/read/write latency or bandwidth. */
+    uint8_t     data_type;
+
+    /* Range bitmap of bandwidths, or max compressed latency, for the base */
+    uint64_t    range_bitmap;
+
+    /* The common base unit for latencies or bandwidths */
+    uint64_t    base;
+
+    /* Array to store the latencies or bandwidths */
+    GArray      *list;
+};
+typedef struct HMAT_LB_Info HMAT_LB_Info;
+
 struct NumaState {
     /* Number of NUMA nodes */
     int num_nodes;
@@ -40,11 +88,16 @@ struct NumaState {
 
     /* NUMA nodes information */
     NodeInfo nodes[MAX_NODES];
+
+    /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
+    HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
 };
 typedef struct NumaState NumaState;
 
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
 void parse_numa_opts(MachineState *ms);
+void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
+                        Error **errp);
 void numa_complete_configuration(MachineState *ms);
 void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
 extern QemuOptsList qemu_numa_opts;
diff --git a/qapi/machine.json b/qapi/machine.json
index 27d0e37534..67f5910400 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -426,10 +426,12 @@
 #
 # @cpu: property based CPU(s) to node mapping (Since: 2.10)
 #
+# @hmat-lb: memory latency and bandwidth information (Since: 5.0)
+#
 # Since: 2.1
 ##
 { 'enum': 'NumaOptionsType',
-  'data': [ 'node', 'dist', 'cpu' ] }
+  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
 
 ##
 # @NumaOptions:
@@ -444,7 +446,8 @@
   'data': {
     'node': 'NumaNodeOptions',
     'dist': 'NumaDistOptions',
-    'cpu': 'NumaCpuOptions' }}
+    'cpu': 'NumaCpuOptions',
+    'hmat-lb': 'NumaHmatLBOptions' }}
 
 ##
 # @NumaNodeOptions:
@@ -557,6 +560,93 @@
    'base': 'CpuInstanceProperties',
    'data' : {} }
 
+##
+# @HmatLBMemoryHierarchy:
+#
+# The memory hierarchy in the System Locality Latency
+# and Bandwidth Information Structure of HMAT (Heterogeneous
+# Memory Attribute Table)
+#
+# For more information about @HmatLBMemoryHierarchy see
+# the chapter 5.2.27.4: Table 5-146: Field "Flags" of ACPI 6.3 spec.
+#
+# @memory: the structure represents the memory performance
+#
+# @first-level: first level of memory side cache
+#
+# @second-level: second level of memory side cache
+#
+# @third-level: third level of memory side cache
+#
+# Since: 5.0
+##
+{ 'enum': 'HmatLBMemoryHierarchy',
+  'data': [ 'memory', 'first-level', 'second-level', 'third-level' ] }
+
+##
+# @HmatLBDataType:
+#
+# Data type in the System Locality Latency
+# and Bandwidth Information Structure of HMAT (Heterogeneous
+# Memory Attribute Table)
+#
+# For more information about @HmatLBDataType see
+# the chapter 5.2.27.4: Table 5-146:  Field "Data Type" of ACPI 6.3 spec.
+#
+# @access-latency: access latency (nanoseconds)
+#
+# @read-latency: read latency (nanoseconds)
+#
+# @write-latency: write latency (nanoseconds)
+#
+# @access-bandwidth: access bandwidth (B/s)
+#
+# @read-bandwidth: read bandwidth (B/s)
+#
+# @write-bandwidth: write bandwidth (B/s)
+#
+# Since: 5.0
+##
+{ 'enum': 'HmatLBDataType',
+  'data': [ 'access-latency', 'read-latency', 'write-latency',
+            'access-bandwidth', 'read-bandwidth', 'write-bandwidth' ] }
+
+##
+# @NumaHmatLBOptions:
+#
+# Set the system locality latency and bandwidth information
+# between Initiator and Target proximity Domains.
+#
+# For more information about @NumaHmatLBOptions see
+# the chapter 5.2.27.4: Table 5-146 of ACPI 6.3 spec.
+#
+# @initiator: the Initiator Proximity Domain.
+#
+# @target: the Target Proximity Domain.
+#
+# @hierarchy: the Memory Hierarchy. Indicates the performance
+#             of memory or side cache.
+#
+# @data-type: presents the type of data, access/read/write
+#             latency or bandwidth.
+#
+# @latency: the value of latency from @initiator to @target proximity domain,
+#           the latency unit is nanoseconds.
+#
+# @bandwidth: the value of bandwidth between @initiator and @target proximity
+#             domain, the bandwidth unit is bytes per second.
+#
+# Since: 5.0
+##
+{ 'struct': 'NumaHmatLBOptions',
+    'data': {
+    'initiator': 'uint16',
+    'target': 'uint16',
+    'hierarchy': 'HmatLBMemoryHierarchy',
+    'data-type': 'HmatLBDataType',
+    '*latency': 'time',
+    '*bandwidth': 'size' }}
+
 ##
 # @HostMemPolicy:
 #
diff --git a/qemu-options.hx b/qemu-options.hx
index 63f6b33322..929d275450 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -168,16 +168,19 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
     "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa dist,src=source,dst=destination,val=distance\n"
-    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
+    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
+    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
 @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
 @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
 @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
+@itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{hierarchy},data-type=@var{type}[,latency=@var{lat}][,bandwidth=@var{bw}]
 @findex -numa
 Define a NUMA node and assign RAM and VCPUs to it.
 Set the NUMA distance from a source node to a destination node.
+Set the ACPI Heterogeneous Memory Attributes for the given nodes.
 
 Legacy VCPU assignment uses @samp{cpus} option where
 @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
@@ -256,6 +259,50 @@ specified resources, it just assigns existing resources to NUMA
 nodes. This means that one still has to use the @option{-m},
 @option{-smp} options to allocate RAM and VCPUs respectively.
 
+Use @samp{hmat-lb} to set System Locality Latency and Bandwidth Information
+between initiator and target NUMA nodes in ACPI Heterogeneous Memory Attribute Table (HMAT).
+An initiator NUMA node can create memory requests; it usually has one or more processors.
+A target NUMA node contains addressable memory.
+
+In the @samp{hmat-lb} option, @var{node} values are NUMA node IDs. @var{hierarchy} is the memory
+hierarchy of the target NUMA node: if @var{hierarchy} is 'memory', the structure
+represents the memory performance; if @var{hierarchy} is 'first-level|second-level|third-level',
+the structure represents the aggregated performance of memory side caches for each domain.
+@var{type} of 'data-type' is the type of data represented by this structure instance:
+if 'hierarchy' is 'memory', 'data-type' is 'access|read|write' latency or 'access|read|write'
+bandwidth of the target memory; if 'hierarchy' is 'first-level|second-level|third-level',
+'data-type' is 'access|read|write' hit latency or 'access|read|write' hit bandwidth of the
+target memory side cache.
+
+@var{lat} is the latency value; it is written as NUM[ns|us|ms]
+(nanosecond|microsecond|millisecond), and the recommended unit is 'ns'. @var{bw} is
+the bandwidth value; it is written as NUM[M|G|T], meaning that the
+bandwidth is NUM bytes per second (or MB/s, GB/s or TB/s depending on the suffix used).
+Note that if NUM is 0, the corresponding latency or bandwidth information
+is not provided. If a number is given without a unit, the latency unit is
+'ns' and the bandwidth unit is B/s.
+
+For example, the following options describe 2 NUMA nodes. Node 0 has 2 CPUs and
+RAM; node 1 has only RAM. The processors in node 0 access memory in node
+0 with an access-latency of 5 nanoseconds and an access-bandwidth of 200 MB/s;
+the processors in NUMA node 0 access memory in NUMA node 1 with an access-latency of 10
+nanoseconds and an access-bandwidth of 100 MB/s.
+@example
+-machine hmat=on \
+-m 2G \
+-object memory-backend-ram,size=1G,id=m0 \
+-object memory-backend-ram,size=1G,id=m1 \
+-smp 2 \
+-numa node,nodeid=0,memdev=m0 \
+-numa node,nodeid=1,memdev=m1,initiator=0 \
+-numa cpu,node-id=0,socket-id=0 \
+-numa cpu,node-id=0,socket-id=1 \
+-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
+-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M
+@end example
+
 ETEXI
 
 DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,
-- 
2.20.1




* [PATCH v16 09/14] numa: Extend CLI to provide memory side cache information
  2019-11-15  7:53 [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (7 preceding siblings ...)
  2019-11-15  7:53 ` [PATCH v16 08/14] numa: Extend CLI to provide memory latency and bandwidth information Tao Xu
@ 2019-11-15  7:53 ` Tao Xu
  2019-11-19 11:47   ` Igor Mammedov
  2019-11-15  7:53 ` [PATCH v16 10/14] hmat acpi: Build Memory Proximity Domain Attributes Structure(s) Tao Xu
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Tao Xu @ 2019-11-15  7:53 UTC (permalink / raw)
  To: mst, imammedo, eblake, ehabkost, marcel.apfelbaum, armbru,
	mdroth, thuth, lvivier
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, Daniel Black, jonathan.cameron

From: Liu Jingqi <jingqi.liu@intel.com>

Add -numa hmat-cache option to provide Memory Side Cache Information.
These memory attributes help to build Memory Side Cache Information
Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT).

Reviewed-by: Daniel Black <daniel@linux.ibm.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v16:
    - Add cross check with hmat_lb data (Igor)
    - Drop total_levels in struct HMAT_Cache_Info (Igor)
    - Correct the error table number (Igor)

Changes in v15:
    - Change the QAPI version tag to 5.0 (Eric)

Changes in v13:
    - Drop the total_levels option.
    - Use readable cache size (Igor)
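
The order of the checks in parse_numa_hmat_cache() (latency and
bandwidth information first, then level range, then duplicates) can be
sketched as follows. CacheCheck, cache_check() and SKETCH_LEVELS are
hypothetical names for this illustration; the patch itself reports
errors through error_setg().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LB_LATENCY_PROVIDED   (1u << 0)  /* BIT(0) in the patch */
#define LB_BANDWIDTH_PROVIDED (1u << 1)  /* BIT(1) in the patch */
#define SKETCH_LEVELS         4          /* mirrors HMAT_LB_LEVELS */

/* Hypothetical result codes standing in for the error messages. */
typedef enum {
    CACHE_OK,
    CACHE_ERR_NO_LB_INFO,
    CACHE_ERR_LEVEL,
    CACHE_ERR_DUPLICATE
} CacheCheck;

/*
 * Validate one hmat-cache entry for a node.
 * lb_info_provided: bits set by earlier hmat-lb entries for this node.
 * configured[l]:    true if level l already has a cache entry.
 */
static CacheCheck cache_check(uint8_t lb_info_provided, uint8_t level,
                              const bool configured[SKETCH_LEVELS])
{
    const uint8_t both = LB_LATENCY_PROVIDED | LB_BANDWIDTH_PROVIDED;

    if (lb_info_provided != both) {
        return CACHE_ERR_NO_LB_INFO;   /* hmat-lb must come first */
    }
    if (level >= SKETCH_LEVELS) {
        return CACHE_ERR_LEVEL;        /* checked before indexing */
    }
    if (configured[level]) {
        return CACHE_ERR_DUPLICATE;
    }
    return CACHE_OK;
}
```

This ordering is why the hmat-lb lines come before the hmat-cache lines
in the documentation example of this patch.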
---
 hw/core/numa.c        | 73 ++++++++++++++++++++++++++++++++++++++++
 include/sysemu/numa.h | 26 +++++++++++++++
 qapi/machine.json     | 78 +++++++++++++++++++++++++++++++++++++++++--
 qemu-options.hx       | 16 +++++++--
 4 files changed, 189 insertions(+), 4 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 47c7a96863..4d7af60bfd 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -359,6 +359,66 @@ void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
     g_array_append_val(hmat_lb->list, lb_data);
 }
 
+void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
+                           Error **errp)
+{
+    int nb_numa_nodes = ms->numa_state->num_nodes;
+    NodeInfo *numa_info = ms->numa_state->nodes;
+    HMAT_Cache_Info *hmat_cache = NULL;
+
+    if (node->node_id >= nb_numa_nodes) {
+        error_setg(errp, "Invalid node-id=%" PRIu32 ", it should be less "
+                   "than %d", node->node_id, nb_numa_nodes);
+        return;
+    }
+
+    if (numa_info[node->node_id].lb_info_provided != (BIT(0) | BIT(1))) {
+        error_setg(errp, "The latency and bandwidth information of "
+                   "node-id=%" PRIu32 " should be provided before configuring "
+                   "memory side cache attributes",
+                   node->node_id);
+        return;
+    }
+
+    if (node->level >= HMAT_LB_LEVELS) {
+        error_setg(errp, "Invalid level=%" PRIu8 ", it should be less than or "
+                   "equal to %d", node->level, HMAT_LB_LEVELS - 1);
+        return;
+    }
+    assert(node->assoc < HMAT_CACHE_ASSOCIATIVITY__MAX);
+    assert(node->policy < HMAT_CACHE_WRITE_POLICY__MAX);
+    if (ms->numa_state->hmat_cache[node->node_id][node->level]) {
+        error_setg(errp, "Duplicate configuration of the side cache for "
+                   "node-id=%" PRIu32 " and level=%" PRIu8,
+                   node->node_id, node->level);
+        return;
+    }
+
+    if ((node->level > 1) &&
+        ms->numa_state->hmat_cache[node->node_id][node->level - 1] &&
+        (node->size >=
+            ms->numa_state->hmat_cache[node->node_id][node->level - 1]->size)) {
+        error_setg(errp, "Invalid size=0x%" PRIx64 ", the size of level=%" PRIu8
+                   " should be less than the size(0x%" PRIx64 ") of "
+                   "level=%" PRIu8, node->size, node->level,
+                   ms->numa_state->hmat_cache[node->node_id]
+                                             [node->level - 1]->size,
+                   node->level - 1);
+        return;
+    }
+
+    hmat_cache = g_malloc0(sizeof(*hmat_cache));
+
+    hmat_cache->proximity = node->node_id;
+    hmat_cache->size = node->size;
+    hmat_cache->level = node->level;
+    hmat_cache->associativity = node->assoc;
+    hmat_cache->write_policy = node->policy;
+    hmat_cache->line_size = node->line;
+
+    ms->numa_state->hmat_cache[node->node_id][node->level] = hmat_cache;
+}
+
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 {
     Error *err = NULL;
@@ -410,6 +470,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
             goto end;
         }
         break;
+    case NUMA_OPTIONS_TYPE_HMAT_CACHE:
+        if (!ms->numa_state->hmat_enabled) {
+            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
+                       "(HMAT) is disabled, enable it with -machine hmat=on "
+                       "before using any of hmat specific options");
+            return;
+        }
+
+        parse_numa_hmat_cache(ms, &object->u.hmat_cache, &err);
+        if (err) {
+            goto end;
+        }
+        break;
     default:
         abort();
     }
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 70f93c83d7..b415550678 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -76,6 +76,27 @@ struct HMAT_LB_Info {
 };
 typedef struct HMAT_LB_Info HMAT_LB_Info;
 
+struct HMAT_Cache_Info {
+    /* The memory proximity domain to which the memory belongs. */
+    uint32_t    proximity;
+
+    /* Size of memory side cache in bytes. */
+    uint64_t    size;
+
+    /* Cache level described in this structure. */
+    uint8_t     level;
+
+    /* Cache Associativity: None/Direct Mapped/Complex Cache Indexing */
+    uint8_t     associativity;
+
+    /* Write Policy: None/Write Back(WB)/Write Through(WT) */
+    uint8_t     write_policy;
+
+    /* Cache Line size in bytes. */
+    uint16_t    line_size;
+};
+typedef struct HMAT_Cache_Info HMAT_Cache_Info;
+
 struct NumaState {
     /* Number of NUMA nodes */
     int num_nodes;
@@ -91,6 +112,9 @@ struct NumaState {
 
     /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
     HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
+
+    /* Memory Side Cache Information Structure */
+    HMAT_Cache_Info *hmat_cache[MAX_NODES][HMAT_LB_LEVELS];
 };
 typedef struct NumaState NumaState;
 
@@ -98,6 +122,8 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
 void parse_numa_opts(MachineState *ms);
 void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
                         Error **errp);
+void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
+                           Error **errp);
 void numa_complete_configuration(MachineState *ms);
 void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
 extern QemuOptsList qemu_numa_opts;
diff --git a/qapi/machine.json b/qapi/machine.json
index 67f5910400..999235bc1b 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -428,10 +428,12 @@
 #
 # @hmat-lb: memory latency and bandwidth information (Since: 5.0)
 #
+# @hmat-cache: memory side cache information (Since: 5.0)
+#
 # Since: 2.1
 ##
 { 'enum': 'NumaOptionsType',
-  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
+  'data': [ 'node', 'dist', 'cpu', 'hmat-lb', 'hmat-cache' ] }
 
 ##
 # @NumaOptions:
@@ -447,7 +449,8 @@
     'node': 'NumaNodeOptions',
     'dist': 'NumaDistOptions',
     'cpu': 'NumaCpuOptions',
-    'hmat-lb': 'NumaHmatLBOptions' }}
+    'hmat-lb': 'NumaHmatLBOptions',
+    'hmat-cache': 'NumaHmatCacheOptions' }}
 
 ##
 # @NumaNodeOptions:
@@ -647,6 +650,77 @@
     '*latency': 'time',
     '*bandwidth': 'size' }}
 
+##
+# @HmatCacheAssociativity:
+#
+# Cache associativity in the Memory Side Cache
+# Information Structure of HMAT
+#
+# For more information of @HmatCacheAssociativity see
+# the chapter 5.2.27.5: Table 5-147 of ACPI 6.3 spec.
+#
+# @none: None
+#
+# @direct: Direct Mapped
+#
+# @complex: Complex Cache Indexing (implementation specific)
+#
+# Since: 5.0
+##
+{ 'enum': 'HmatCacheAssociativity',
+  'data': [ 'none', 'direct', 'complex' ] }
+
+##
+# @HmatCacheWritePolicy:
+#
+# Cache write policy in the Memory Side Cache
+# Information Structure of HMAT
+#
+# For more information of @HmatCacheWritePolicy see
+# the chapter 5.2.27.5: Table 5-147: Field "Cache Attributes" of ACPI 6.3 spec.
+#
+# @none: None
+#
+# @write-back: Write Back (WB)
+#
+# @write-through: Write Through (WT)
+#
+# Since: 5.0
+##
+{ 'enum': 'HmatCacheWritePolicy',
+  'data': [ 'none', 'write-back', 'write-through' ] }
+
+##
+# @NumaHmatCacheOptions:
+#
+# Set the memory side cache information for a given memory domain.
+#
+# For more information of @NumaHmatCacheOptions see
+# the chapter 5.2.27.5: Table 5-147: Field "Cache Attributes" of ACPI 6.3 spec.
+#
+# @node-id: the memory proximity domain to which the memory belongs.
+#
+# @size: the size of memory side cache in bytes.
+#
+# @level: the cache level described in this structure.
+#
+# @assoc: the cache associativity, none/direct-mapped/complex(complex cache indexing).
+#
+# @policy: the write policy, none/write-back/write-through.
+#
+# @line: the cache Line size in bytes.
+#
+# Since: 5.0
+##
+{ 'struct': 'NumaHmatCacheOptions',
+  'data': {
+   'node-id': 'uint32',
+   'size': 'size',
+   'level': 'uint8',
+   'assoc': 'HmatCacheAssociativity',
+   'policy': 'HmatCacheWritePolicy',
+   'line': 'uint16' }}
+
 ##
 # @HostMemPolicy:
 #
diff --git a/qemu-options.hx b/qemu-options.hx
index 929d275450..ad0e5aa190 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -169,7 +169,8 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
     "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa dist,src=source,dst=destination,val=distance\n"
     "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
-    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
+    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
+    "-numa hmat-cache,node-id=node,size=size,level=level[,assoc=none|direct|complex][,policy=none|write-back|write-through][,line=size]\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
@@ -177,6 +178,7 @@ STEXI
 @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
 @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
 @itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{hierarchy},data-type=@var{type}[,latency=@var{lat}][,bandwidth=@var{bw}]
+@itemx -numa hmat-cache,node-id=@var{node},size=@var{size},level=@var{level}[,assoc=@var{str}][,policy=@var{str}][,line=@var{size}]
 @findex -numa
 Define a NUMA node and assign RAM and VCPUs to it.
 Set the NUMA distance from a source node to a destination node.
 Note that if NUM is 0, the corresponding latency or bandwidth information
 is not provided. If a number is given without a unit, the latency unit is
 'ns' and the bandwidth unit is B/s.
 
+In the @samp{hmat-cache} option, @var{node-id} is the ID of the NUMA node to which the memory belongs.
+@var{size} is the size of memory side cache in bytes. @var{level} is the cache
+level described in this structure. @var{assoc} is the cache associativity;
+the possible values are 'none/direct(direct-mapped)/complex(complex cache indexing)'.
+@var{policy} is the write policy. @var{line} is the cache line size in bytes.
+
 For example, the following options describe 2 NUMA nodes. Node 0 has 2 CPUs and
 RAM; node 1 has only RAM. The processors in node 0 access memory in node
 0 with an access-latency of 5 nanoseconds and an access-bandwidth of 200 MB/s;
 the processors in NUMA node 0 access memory in NUMA node 1 with an access-latency of 10
 nanoseconds and an access-bandwidth of 100 MB/s.
+For memory side cache information, NUMA nodes 0 and 1 each have one level of memory
+side cache; its size is 10KB, its policy is write-back, and its cache line size is 8 bytes:
 @example
 -machine hmat=on \
 -m 2G \
@@ -300,7 +310,9 @@ nanoseconds, access-bandwidth is 100 MB/s.
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
--numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
+-numa hmat-cache,node-id=0,size=10K,level=1,assoc=direct,policy=write-back,line=8 \
+-numa hmat-cache,node-id=1,size=10K,level=1,assoc=direct,policy=write-back,line=8
 @end example
 
 ETEXI
-- 
2.20.1




* [PATCH v16 10/14] hmat acpi: Build Memory Proximity Domain Attributes Structure(s)
  2019-11-15  7:53 [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (8 preceding siblings ...)
  2019-11-15  7:53 ` [PATCH v16 09/14] numa: Extend CLI to provide memory side cache information Tao Xu
@ 2019-11-15  7:53 ` Tao Xu
  2019-11-15  7:53 ` [PATCH v16 11/14] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao Xu
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Tao Xu @ 2019-11-15  7:53 UTC (permalink / raw)
  To: mst, imammedo, eblake, ehabkost, marcel.apfelbaum, armbru,
	mdroth, thuth, lvivier
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, Daniel Black, Jonathan Cameron

From: Liu Jingqi <jingqi.liu@intel.com>

HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
(HMAT). The specification references below link:
http://www.uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf

It describes the memory attributes, such as memory side cache
attributes and bandwidth and latency details, related to the
Memory Proximity Domain. The software is
expected to use this information as hint for optimization.

This structure describes Memory Proximity Domain Attributes by memory
subsystem and its associativity with processor proximity domain as well as
hint for memory usage.

In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses and reports
the platform's HMAT tables.
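
As a rough illustration of the layout this patch emits (not the code in the
patch itself), the 40-byte structure can be sketched in standalone C. The
names put_le() and encode_mpda() are hypothetical and exist only for this
example; the field order and sizes follow build_hmat_mpda() in the diff, and
the little-endian byte order is the one ACPI tables use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MPDA_LENGTH          40   /* total size of the structure in bytes */
#define MPDA_INITIATOR_VALID 0x1  /* Flags bit 0, as in hmat.h */

/* Hypothetical helper: store the low `size` bytes of `value`, little-endian. */
static size_t put_le(uint8_t *buf, size_t off, uint64_t value, size_t size)
{
    assert(size <= 8);
    for (size_t i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(value >> (8 * i));
    }
    return off + size;
}

/* Fill `buf` (at least MPDA_LENGTH bytes); returns the bytes written. */
static size_t encode_mpda(uint8_t *buf, uint16_t flags,
                          uint32_t initiator, uint32_t mem_node)
{
    size_t off = 0;

    off = put_le(buf, off, 0, 2);            /* Type: 0 = MPDA */
    off = put_le(buf, off, 0, 2);            /* Reserved */
    off = put_le(buf, off, MPDA_LENGTH, 4);  /* Length */
    off = put_le(buf, off, flags, 2);        /* Flags */
    off = put_le(buf, off, 0, 2);            /* Reserved */
    off = put_le(buf, off, initiator, 4);    /* Attached Initiator domain */
    off = put_le(buf, off, mem_node, 4);     /* Memory domain */
    off = put_le(buf, off, 0, 4);            /* Reserved */
    off = put_le(buf, off, 0, 8);            /* Reserved (was start address) */
    off = put_le(buf, off, 0, 8);            /* Reserved (was range length) */
    return off;
}
```

The field sizes add up to 2 + 2 + 4 + 2 + 2 + 4 + 4 + 4 + 8 + 8 = 40, which is
the value written into the Length field.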

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v16:
    - Use uint32_t for initiator and mem_node

Changes in v13:
    - Remove the unnecessary header file.
---
 hw/acpi/Kconfig       |  7 ++-
 hw/acpi/Makefile.objs |  1 +
 hw/acpi/hmat.c        | 99 +++++++++++++++++++++++++++++++++++++++++++
 hw/acpi/hmat.h        | 42 ++++++++++++++++++
 hw/i386/acpi-build.c  |  5 +++
 5 files changed, 152 insertions(+), 2 deletions(-)
 create mode 100644 hw/acpi/hmat.c
 create mode 100644 hw/acpi/hmat.h

diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 12e3f1e86e..54209c6f2f 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -7,6 +7,7 @@ config ACPI_X86
     select ACPI_NVDIMM
     select ACPI_CPU_HOTPLUG
     select ACPI_MEMORY_HOTPLUG
+    select ACPI_HMAT
 
 config ACPI_X86_ICH
     bool
@@ -23,6 +24,10 @@ config ACPI_NVDIMM
     bool
     depends on ACPI
 
+config ACPI_HMAT
+    bool
+    depends on ACPI
+
 config ACPI_PCI
     bool
     depends on ACPI && PCI
@@ -33,5 +38,3 @@ config ACPI_VMGENID
     depends on PC
 
 config ACPI_HW_REDUCED
-    bool
-    depends on ACPI
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 655a9c1973..517bd88704 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -7,6 +7,7 @@ common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
 common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
 common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
+common-obj-$(CONFIG_ACPI_HMAT) += hmat.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
 
 common-obj-y += acpi_interface.o
diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
new file mode 100644
index 0000000000..9ff79308a4
--- /dev/null
+++ b/hw/acpi/hmat.c
@@ -0,0 +1,99 @@
+/*
+ * HMAT ACPI Implementation
+ *
+ * Copyright(C) 2019 Intel Corporation.
+ *
+ * Author:
+ *  Liu jingqi <jingqi.liu@linux.intel.com>
+ *  Tao Xu <tao3.xu@intel.com>
+ *
+ * HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
+ * (HMAT)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/numa.h"
+#include "hw/acpi/hmat.h"
+
+/*
+ * ACPI 6.3:
+ * 5.2.27.3 Memory Proximity Domain Attributes Structure: Table 5-145
+ */
+static void build_hmat_mpda(GArray *table_data, uint16_t flags,
+                            uint32_t initiator, uint32_t mem_node)
+{
+
+    /* Memory Proximity Domain Attributes Structure */
+    /* Type */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Length */
+    build_append_int_noprefix(table_data, 40, 4);
+    /* Flags */
+    build_append_int_noprefix(table_data, flags, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Proximity Domain for the Attached Initiator */
+    build_append_int_noprefix(table_data, initiator, 4);
+    /* Proximity Domain for the Memory */
+    build_append_int_noprefix(table_data, mem_node, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 4);
+    /*
+     * Reserved:
+     * Previously defined as the Start Address of the System Physical
+     * Address Range. Deprecated since ACPI Spec 6.3.
+     */
+    build_append_int_noprefix(table_data, 0, 8);
+    /*
+     * Reserved:
+     * Previously defined as the Range Length of the region in bytes.
+     * Deprecated since ACPI Spec 6.3.
+     */
+    build_append_int_noprefix(table_data, 0, 8);
+}
+
+/* Build HMAT sub table structures */
+static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
+{
+    uint16_t flags;
+    int i;
+
+    for (i = 0; i < numa_state->num_nodes; i++) {
+        flags = 0;
+
+        if (numa_state->nodes[i].initiator < MAX_NODES) {
+            flags |= HMAT_PROXIMITY_INITIATOR_VALID;
+        }
+
+        build_hmat_mpda(table_data, flags, numa_state->nodes[i].initiator, i);
+    }
+}
+
+void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *numa_state)
+{
+    int hmat_start = table_data->len;
+
+    /* reserve space for HMAT header  */
+    acpi_data_push(table_data, 40);
+
+    hmat_build_table_structs(table_data, numa_state);
+
+    build_header(linker, table_data,
+                 (void *)(table_data->data + hmat_start),
+                 "HMAT", table_data->len - hmat_start, 2, NULL, NULL);
+}
diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
new file mode 100644
index 0000000000..437dbc6872
--- /dev/null
+++ b/hw/acpi/hmat.h
@@ -0,0 +1,42 @@
+/*
+ * HMAT ACPI Implementation Header
+ *
+ * Copyright(C) 2019 Intel Corporation.
+ *
+ * Author:
+ *  Liu jingqi <jingqi.liu@linux.intel.com>
+ *  Tao Xu <tao3.xu@intel.com>
+ *
+ * HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
+ * (HMAT)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#ifndef HMAT_H
+#define HMAT_H
+
+#include "hw/acpi/aml-build.h"
+
+/*
+ * ACPI 6.3: 5.2.27.3 Memory Proximity Domain Attributes Structure,
+ * Table 5-145, Field "flag", Bit [0]: set to 1 to indicate that data in
+ * the Proximity Domain for the Attached Initiator field is valid.
+ * Other bits reserved.
+ */
+#define HMAT_PROXIMITY_INITIATOR_VALID  0x1
+
+void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *numa_state);
+
+#endif
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 12ff55fcfb..90a9c2ce6f 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -67,6 +67,7 @@
 #include "hw/i386/intel_iommu.h"
 
 #include "hw/acpi/ipmi.h"
+#include "hw/acpi/hmat.h"
 
 /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
  * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
@@ -2834,6 +2835,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
             acpi_add_table(table_offsets, tables_blob);
             build_slit(tables_blob, tables->linker, machine);
         }
+        if (machine->numa_state->hmat_enabled) {
+            acpi_add_table(table_offsets, tables_blob);
+            build_hmat(tables_blob, tables->linker, machine->numa_state);
+        }
     }
     if (acpi_get_mcfg(&mcfg)) {
         acpi_add_table(table_offsets, tables_blob);
-- 
2.20.1




* [PATCH v16 11/14] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)
  2019-11-15  7:53 [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (9 preceding siblings ...)
  2019-11-15  7:53 ` [PATCH v16 10/14] hmat acpi: Build Memory Proximity Domain Attributes Structure(s) Tao Xu
@ 2019-11-15  7:53 ` Tao Xu
  2019-11-20 10:09   ` Igor Mammedov
  2019-11-15  7:53 ` [PATCH v16 12/14] hmat acpi: Build Memory Side Cache " Tao Xu
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Tao Xu @ 2019-11-15  7:53 UTC (permalink / raw)
  To: mst, imammedo, eblake, ehabkost, marcel.apfelbaum, armbru,
	mdroth, thuth, lvivier
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

From: Liu Jingqi <jingqi.liu@intel.com>

This structure describes the memory access latency and bandwidth
information from various memory access initiator proximity domains.
The latency and bandwidth numbers represented in this structure
correspond to rated latency and bandwidth for the platform.
Software can use this information as a hint for optimization.
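
The size of the structure and the position of each entry follow directly
from the number of initiator and target domains. A minimal sketch of that
arithmetic, mirroring the lb_length and index computations in build_hmat_lb()
in the diff; the function names sllbi_length() and sllbi_entry_index() are
made up for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Total structure length in bytes for s initiators and t targets. */
static uint32_t sllbi_length(uint32_t num_initiator, uint32_t num_target)
{
    return 32                                   /* header up to Entry Base Unit */
         + 4 * num_initiator                    /* Initiator Proximity Domain List */
         + 4 * num_target                       /* Target Proximity Domain List */
         + 2 * num_initiator * num_target;      /* 16-bit matrix entries */
}

/* Row-major position of the (initiator, target) entry in the matrix. */
static uint32_t sllbi_entry_index(uint32_t initiator, uint32_t target,
                                  uint32_t num_target)
{
    assert(target < num_target);
    return initiator * num_target + target;
}
```

For one initiator and two targets this gives a 48-byte structure, with the
(0, 1) entry stored at index 1 of the entry array.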

Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v16:
    - Add more description for lb_length (Igor)
    - Drop entry_list and calculate entries in this patch (Igor)

Changes in v13:
    - Calculate the entries in a new patch.
---
 hw/acpi/hmat.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 104 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index 9ff79308a4..ed19ebed2f 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -25,8 +25,10 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include "sysemu/numa.h"
 #include "hw/acpi/hmat.h"
+#include "qemu/error-report.h"
 
 /*
  * ACPI 6.3:
@@ -67,11 +69,89 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags,
     build_append_int_noprefix(table_data, 0, 8);
 }
 
+/*
+ * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+ * Structure: Table 5-146
+ */
+static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
+                          uint32_t num_initiator, uint32_t num_target,
+                          uint32_t *initiator_list)
+{
+    int i, index;
+    HMAT_LB_Data *lb_data;
+    uint16_t *entry_list;
+    uint32_t base;
+    /* Length in bytes for entire structure */
+    uint32_t lb_length
+        = 32 /* Table length up to and including Entry Base Unit */
+        + 4 * num_initiator /* Initiator Proximity Domain List */
+        + 4 * num_target /* Target Proximity Domain List */
+        + 2 * num_initiator * num_target; /* Latency or Bandwidth Entries */
+
+    /* Type */
+    build_append_int_noprefix(table_data, 1, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Length */
+    build_append_int_noprefix(table_data, lb_length, 4);
+    /* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
+    assert(!(hmat_lb->hierarchy >> 4));
+    build_append_int_noprefix(table_data, hmat_lb->hierarchy, 1);
+    /* Data Type */
+    build_append_int_noprefix(table_data, hmat_lb->data_type, 1);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Number of Initiator Proximity Domains (s) */
+    build_append_int_noprefix(table_data, num_initiator, 4);
+    /* Number of Target Proximity Domains (t) */
+    build_append_int_noprefix(table_data, num_target, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 4);
+
+    /* Entry Base Unit */
+    if (hmat_lb->data_type <= HMAT_LB_DATA_WRITE_LATENCY) {
+        /* Convert latency base from nanoseconds to picoseconds */
+        base = hmat_lb->base * 1000;
+    } else {
+        /* Convert bandwidth base from Byte to Megabyte */
+        base = hmat_lb->base / MiB;
+    }
+    build_append_int_noprefix(table_data, base, 8);
+
+    /* Initiator Proximity Domain List */
+    for (i = 0; i < num_initiator; i++) {
+        build_append_int_noprefix(table_data, initiator_list[i], 4);
+    }
+
+    /* Target Proximity Domain List */
+    for (i = 0; i < num_target; i++) {
+        build_append_int_noprefix(table_data, i, 4);
+    }
+
+    /* Latency or Bandwidth Entries */
+    entry_list = g_malloc0(hmat_lb->list->len * sizeof(uint16_t));
+    for (i = 0; i < hmat_lb->list->len; i++) {
+        lb_data = &g_array_index(hmat_lb->list, HMAT_LB_Data, i);
+        index = lb_data->initiator * num_target + lb_data->target;
+
+        entry_list[index] = (uint16_t)(lb_data->data / hmat_lb->base);
+    }
+
+    for (i = 0; i < num_initiator * num_target; i++) {
+        build_append_int_noprefix(table_data, entry_list[i], 2);
+    }
+
+    g_free(entry_list);
+}
+
 /* Build HMAT sub table structures */
 static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
 {
     uint16_t flags;
-    int i;
+    uint32_t num_initiator = 0;
+    uint32_t initiator_list[MAX_NODES];
+    int i, hierarchy, type;
+    HMAT_LB_Info *hmat_lb;
 
     for (i = 0; i < numa_state->num_nodes; i++) {
         flags = 0;
@@ -82,6 +162,29 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
 
         build_hmat_mpda(table_data, flags, numa_state->nodes[i].initiator, i);
     }
+
+    for (i = 0; i < numa_state->num_nodes; i++) {
+        if (numa_state->nodes[i].has_cpu) {
+            initiator_list[num_initiator++] = i;
+        }
+    }
+
+    /*
+     * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+     * Structure: Table 5-146
+     */
+    for (hierarchy = HMAT_LB_MEM_MEMORY;
+         hierarchy <= HMAT_LB_MEM_CACHE_3RD_LEVEL; hierarchy++) {
+        for (type = HMAT_LB_DATA_ACCESS_LATENCY;
+             type <= HMAT_LB_DATA_WRITE_BANDWIDTH; type++) {
+            hmat_lb = numa_state->hmat_lb[hierarchy][type];
+
+            if (hmat_lb) {
+                build_hmat_lb(table_data, hmat_lb, num_initiator,
+                              numa_state->num_nodes, initiator_list);
+            }
+        }
+    }
 }
 
 void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *numa_state)
-- 
2.20.1




* [PATCH v16 12/14] hmat acpi: Build Memory Side Cache Information Structure(s)
  2019-11-15  7:53 [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (10 preceding siblings ...)
  2019-11-15  7:53 ` [PATCH v16 11/14] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao Xu
@ 2019-11-15  7:53 ` Tao Xu
  2019-11-20 12:50   ` Igor Mammedov
  2019-11-15  7:53 ` [PATCH v16 13/14] tests/numa: Add case for QMP build HMAT Tao Xu
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Tao Xu @ 2019-11-15  7:53 UTC (permalink / raw)
  To: mst, imammedo, eblake, ehabkost, marcel.apfelbaum, armbru,
	mdroth, thuth, lvivier
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, Daniel Black, Jonathan Cameron

From: Liu Jingqi <jingqi.liu@intel.com>

This structure describes memory side cache information for memory
proximity domains if the memory side cache is present and the
physical device forms the memory side cache.
Software can use this information to place data in memory
effectively and maximize the performance of the system
memory that uses the memory side cache.
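
The 32-bit Cache Attributes field packs five values. A small standalone
sketch of the packing, following the bit ranges documented in
build_hmat_cache() in the diff; pack_cache_attr() is a hypothetical name
and the numeric inputs in the note are arbitrary sample values, not a
statement about which enum value means what:

```c
#include <assert.h>
#include <stdint.h>

static uint32_t pack_cache_attr(uint8_t total_levels, uint8_t level,
                                uint8_t associativity, uint8_t write_policy,
                                uint16_t line_size)
{
    /* Each of the first four fields must fit in 4 bits. */
    assert(total_levels < 16 && level < 16);
    assert(associativity < 16 && write_policy < 16);

    uint32_t attr = total_levels;              /* Bits [3:0]   */
    attr |= (uint32_t)level << 4;              /* Bits [7:4]   */
    attr |= (uint32_t)associativity << 8;      /* Bits [11:8]  */
    attr |= (uint32_t)write_policy << 12;      /* Bits [15:12] */
    attr |= (uint32_t)line_size << 16;         /* Bits [31:16] */
    return attr;
}
```

With total_levels = 1, level = 1, associativity = 1, write_policy = 1 and
line_size = 8, the packed value is 0x00081111. The casts to uint32_t before
shifting matter for line_size, which would otherwise be shifted as a signed
int after integer promotion.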

Reviewed-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v16:
    - Use checks and assert to replace masks (Igor)
    - Fields in Cache Attributes are promoted to uint32_t before
      shifting (Igor)
    - Drop cpu_to_le32() (Igor)

Changes in v13:
    - rename level as cache_level
---
 hw/acpi/hmat.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 68 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index ed19ebed2f..2b4f760e0e 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -144,14 +144,62 @@ static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
     g_free(entry_list);
 }
 
+/* ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure: Table 5-147 */
+static void build_hmat_cache(GArray *table_data, HMAT_Cache_Info *hmat_cache,
+                             uint8_t total_levels)
+{
+    /*
+     * Cache Attributes: Bits [3:0] – Total Cache Levels
+     * for this Memory Proximity Domain
+     */
+    uint32_t cache_attr = total_levels;
+
+    /* Bits [7:4] : Cache Level described in this structure */
+    cache_attr |= (uint32_t) hmat_cache->level << 4;
+
+    /* Bits [11:8] - Cache Associativity */
+    cache_attr |= (uint32_t) hmat_cache->associativity << 8;
+
+    /* Bits [15:12] - Write Policy */
+    cache_attr |= (uint32_t) hmat_cache->write_policy << 12;
+
+    /* Bits [31:16] - Cache Line size in bytes */
+    cache_attr |= (uint32_t) hmat_cache->line_size << 16;
+
+    /* Type */
+    build_append_int_noprefix(table_data, 2, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Length */
+    build_append_int_noprefix(table_data, 32, 4);
+    /* Proximity Domain for the Memory */
+    build_append_int_noprefix(table_data, hmat_cache->proximity, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 4);
+    /* Memory Side Cache Size */
+    build_append_int_noprefix(table_data, hmat_cache->size, 8);
+    /* Cache Attributes */
+    build_append_int_noprefix(table_data, cache_attr, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /*
+     * Number of SMBIOS handles (n)
+     * Linux kernel uses Memory Side Cache Information Structure
+     * without SMBIOS entries for now, so set Number of SMBIOS handles
+     * as 0.
+     */
+    build_append_int_noprefix(table_data, 0, 2);
+}
+
 /* Build HMAT sub table structures */
 static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
 {
     uint16_t flags;
     uint32_t num_initiator = 0;
     uint32_t initiator_list[MAX_NODES];
-    int i, hierarchy, type;
+    int i, hierarchy, type, cache_level, total_levels;
     HMAT_LB_Info *hmat_lb;
+    HMAT_Cache_Info *hmat_cache;
 
     for (i = 0; i < numa_state->num_nodes; i++) {
         flags = 0;
@@ -185,6 +233,25 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
             }
         }
     }
+
+    /*
+     * ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure:
+     * Table 5-147
+     */
+    for (i = 0; i < numa_state->num_nodes; i++) {
+        total_levels = 0;
+        for (cache_level = 1; cache_level < HMAT_LB_LEVELS; cache_level++) {
+            if (numa_state->hmat_cache[i][cache_level]) {
+                total_levels++;
+            }
+        }
+        for (cache_level = 0; cache_level <= total_levels; cache_level++) {
+            hmat_cache = numa_state->hmat_cache[i][cache_level];
+            if (hmat_cache) {
+                build_hmat_cache(table_data, hmat_cache, total_levels);
+            }
+        }
+    }
 }
 
 void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *numa_state)
-- 
2.20.1




* [PATCH v16 13/14] tests/numa: Add case for QMP build HMAT
  2019-11-15  7:53 [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (11 preceding siblings ...)
  2019-11-15  7:53 ` [PATCH v16 12/14] hmat acpi: Build Memory Side Cache " Tao Xu
@ 2019-11-15  7:53 ` Tao Xu
  2019-11-20 12:32   ` Igor Mammedov
  2019-11-15  7:53 ` [PATCH v16 14/14] tests/bios-tables-test: add test cases for ACPI HMAT Tao Xu
  2019-11-15  8:58 ` [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) no-reply
  14 siblings, 1 reply; 29+ messages in thread
From: Tao Xu @ 2019-11-15  7:53 UTC (permalink / raw)
  To: mst, imammedo, eblake, ehabkost, marcel.apfelbaum, armbru,
	mdroth, thuth, lvivier
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

Add a test case that configures HMAT through QMP in preconfig mode.
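
The QMP arguments in this test are plain integers, while the command-line
examples elsewhere in the series use suffixed sizes such as bandwidth=500M
or size=10K. A small sketch of the correspondence, assuming the suffixes
are binary (K = 1024, M = 1024 * 1024), which matches the values used in
the test; size_with_suffix() is a hypothetical helper for illustration,
not a QEMU function:

```c
#include <assert.h>
#include <stdint.h>

#define KIB UINT64_C(1024)
#define MIB (KIB * KIB)

/* Expand a value with an optional 'K' or 'M' suffix into bytes. */
static uint64_t size_with_suffix(uint64_t value, char suffix)
{
    switch (suffix) {
    case 'K':
        return value * KIB;
    case 'M':
        return value * MIB;
    default:
        /* Only the suffixes needed for this example are handled. */
        assert(suffix == '\0');
        return value;
    }
}
```

Under that assumption 500M is 524288000, 100M is 104857600 and 10K is 10240,
the three sizes passed to set-numa-node in the test.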

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

New patch in v16.
---
 tests/numa-test.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/tests/numa-test.c b/tests/numa-test.c
index 8de8581231..15889c26c0 100644
--- a/tests/numa-test.c
+++ b/tests/numa-test.c
@@ -327,6 +327,56 @@ static void pc_dynamic_cpu_cfg(const void *data)
     qtest_quit(qs);
 }
 
+static void pc_build_hmat_cfg(const void *data)
+{
+    QTestState *qs;
+
+    qs = qtest_initf("%s -nodefaults --preconfig -machine hmat=on "
+                     "-smp 2,sockets=2 "
+                     "-m 128M,slots=2,maxmem=1G "
+                     "-object memory-backend-ram,size=64M,id=m0 "
+                     "-object memory-backend-ram,size=64M,id=m1 "
+                     "-numa node,nodeid=0,memdev=m0 "
+                     "-numa node,nodeid=1,memdev=m1,initiator=0 "
+                     "-numa cpu,node-id=0,socket-id=0 "
+                     "-numa cpu,node-id=0,socket-id=1",
+                     data ? (char *)data : "");
+
+    /* Configuring HMAT bandwidth and latency details */
+    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+        " 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 0,"
+        " 'hierarchy': \"memory\", 'data-type': \"access-latency\","
+        " 'latency': 5 } }")));
+    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+        " 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 0,"
+        " 'hierarchy': \"memory\", 'data-type': \"access-bandwidth\","
+        " 'bandwidth': 524288000 } }")));
+    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+        " 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 1,"
+        " 'hierarchy': \"memory\", 'data-type': \"access-latency\","
+        " 'latency': 10 } }")));
+    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+        " 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 1,"
+        " 'hierarchy': \"memory\", 'data-type': \"access-bandwidth\","
+        " 'bandwidth': 104857600 } }")));
+
+    /* Configuring HMAT memory side cache attributes */
+    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+        " 'arguments': { 'type': 'hmat-cache', 'node-id': 0, 'size': 10240,"
+        " 'level': 1, 'assoc': \"direct\", 'policy': \"write-back\","
+        " 'line': 8 } }")));
+    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+        " 'arguments': { 'type': 'hmat-cache', 'node-id': 1, 'size': 10240,"
+        " 'level': 1, 'assoc': \"direct\", 'policy': \"write-back\","
+        " 'line': 8 } }")));
+
+    /* let machine initialization to complete and run */
+    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'x-exit-preconfig' }")));
+    qtest_qmp_eventwait(qs, "RESUME");
+
+    qtest_quit(qs);
+}
+
 int main(int argc, char **argv)
 {
     const char *args = NULL;
@@ -346,6 +396,7 @@ int main(int argc, char **argv)
     if (!strcmp(arch, "i386") || !strcmp(arch, "x86_64")) {
         qtest_add_data_func("/numa/pc/cpu/explicit", args, pc_numa_cpu);
         qtest_add_data_func("/numa/pc/dynamic/cpu", args, pc_dynamic_cpu_cfg);
+        qtest_add_data_func("/numa/pc/build/hmat", args, pc_build_hmat_cfg);
     }
 
     if (!strcmp(arch, "ppc64")) {
-- 
2.20.1




* [PATCH v16 14/14] tests/bios-tables-test: add test cases for ACPI HMAT
  2019-11-15  7:53 [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (12 preceding siblings ...)
  2019-11-15  7:53 ` [PATCH v16 13/14] tests/numa: Add case for QMP build HMAT Tao Xu
@ 2019-11-15  7:53 ` Tao Xu
  2019-11-15  8:58 ` [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) no-reply
  14 siblings, 0 replies; 29+ messages in thread
From: Tao Xu @ 2019-11-15  7:53 UTC (permalink / raw)
  To: mst, imammedo, eblake, ehabkost, marcel.apfelbaum, armbru,
	mdroth, thuth, lvivier
  Cc: Jingqi Liu, tao3.xu, fan.du, qemu-devel, Daniel Black, jonathan.cameron

The ACPI HMAT table has been introduced; QEMU now builds HMAT tables for
heterogeneous memory with the boot option '-numa node'.

Add test cases on PC and Q35 machines with 2 NUMA nodes.
Because HMAT is generated when the system enables NUMA, the
following tables need to be added for this test:
    tests/data/acpi/pc/APIC.acpihmat
    tests/data/acpi/pc/SRAT.acpihmat
    tests/data/acpi/pc/HMAT.acpihmat
    tests/data/acpi/pc/DSDT.acpihmat
    tests/data/acpi/q35/APIC.acpihmat
    tests/data/acpi/q35/SRAT.acpihmat
    tests/data/acpi/q35/HMAT.acpihmat
    tests/data/acpi/q35/DSDT.acpihmat

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Jingqi Liu <Jingqi.liu@intel.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v16.

Changes in v15:
    - Make tests without breaking CI (Michael)

Changes in v13:
    - Use decimal notation with appropriate suffix for cache size
---
 tests/bios-tables-test-allowed-diff.h |  8 +++++
 tests/bios-tables-test.c              | 44 +++++++++++++++++++++++++++
 tests/data/acpi/pc/APIC.acpihmat      |  0
 tests/data/acpi/pc/DSDT.acpihmat      |  0
 tests/data/acpi/pc/HMAT.acpihmat      |  0
 tests/data/acpi/pc/SRAT.acpihmat      |  0
 tests/data/acpi/q35/APIC.acpihmat     |  0
 tests/data/acpi/q35/DSDT.acpihmat     |  0
 tests/data/acpi/q35/HMAT.acpihmat     |  0
 tests/data/acpi/q35/SRAT.acpihmat     |  0
 10 files changed, 52 insertions(+)
 create mode 100644 tests/data/acpi/pc/APIC.acpihmat
 create mode 100644 tests/data/acpi/pc/DSDT.acpihmat
 create mode 100644 tests/data/acpi/pc/HMAT.acpihmat
 create mode 100644 tests/data/acpi/pc/SRAT.acpihmat
 create mode 100644 tests/data/acpi/q35/APIC.acpihmat
 create mode 100644 tests/data/acpi/q35/DSDT.acpihmat
 create mode 100644 tests/data/acpi/q35/HMAT.acpihmat
 create mode 100644 tests/data/acpi/q35/SRAT.acpihmat

diff --git a/tests/bios-tables-test-allowed-diff.h b/tests/bios-tables-test-allowed-diff.h
index dfb8523c8b..3c9e0c979b 100644
--- a/tests/bios-tables-test-allowed-diff.h
+++ b/tests/bios-tables-test-allowed-diff.h
@@ -1 +1,9 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/APIC.acpihmat",
+"tests/data/acpi/pc/SRAT.acpihmat",
+"tests/data/acpi/pc/HMAT.acpihmat",
+"tests/data/acpi/pc/DSDT.acpihmat",
+"tests/data/acpi/q35/APIC.acpihmat",
+"tests/data/acpi/q35/SRAT.acpihmat",
+"tests/data/acpi/q35/HMAT.acpihmat",
+"tests/data/acpi/q35/DSDT.acpihmat",
diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index 0b33fb265f..96803c1f20 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -947,6 +947,48 @@ static void test_acpi_virt_tcg_numamem(void)
 
 }
 
+static void test_acpi_tcg_acpi_hmat(const char *machine)
+{
+    test_data data;
+
+    memset(&data, 0, sizeof(data));
+    data.machine = machine;
+    data.variant = ".acpihmat";
+    test_acpi_one(" -machine hmat=on"
+                  " -smp 2,sockets=2"
+                  " -m 128M,slots=2,maxmem=1G"
+                  " -object memory-backend-ram,size=64M,id=m0"
+                  " -object memory-backend-ram,size=64M,id=m1"
+                  " -numa node,nodeid=0,memdev=m0"
+                  " -numa node,nodeid=1,memdev=m1,initiator=0"
+                  " -numa cpu,node-id=0,socket-id=0"
+                  " -numa cpu,node-id=0,socket-id=1"
+                  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+                  "data-type=access-latency,latency=5ns"
+                  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=500M"
+                  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+                  "data-type=access-latency,latency=10ns"
+                  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=100M"
+                  " -numa hmat-cache,node-id=0,size=10K,level=1,assoc=direct,"
+                  "policy=write-back,line=8"
+                  " -numa hmat-cache,node-id=1,size=10K,level=1,assoc=direct,"
+                  "policy=write-back,line=8",
+                  &data);
+    free_test_data(&data);
+}
+
+static void test_acpi_q35_tcg_acpi_hmat(void)
+{
+    test_acpi_tcg_acpi_hmat(MACHINE_Q35);
+}
+
+static void test_acpi_piix4_tcg_acpi_hmat(void)
+{
+    test_acpi_tcg_acpi_hmat(MACHINE_PC);
+}
+
 static void test_acpi_virt_tcg(void)
 {
     test_data data = {
@@ -991,6 +1033,8 @@ int main(int argc, char *argv[])
         qtest_add_func("acpi/q35/numamem", test_acpi_q35_tcg_numamem);
         qtest_add_func("acpi/piix4/dimmpxm", test_acpi_piix4_tcg_dimm_pxm);
         qtest_add_func("acpi/q35/dimmpxm", test_acpi_q35_tcg_dimm_pxm);
+        qtest_add_func("acpi/piix4/acpihmat", test_acpi_piix4_tcg_acpi_hmat);
+        qtest_add_func("acpi/q35/acpihmat", test_acpi_q35_tcg_acpi_hmat);
     } else if (strcmp(arch, "aarch64") == 0) {
         qtest_add_func("acpi/virt", test_acpi_virt_tcg);
         qtest_add_func("acpi/virt/numamem", test_acpi_virt_tcg_numamem);
diff --git a/tests/data/acpi/pc/APIC.acpihmat b/tests/data/acpi/pc/APIC.acpihmat
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tests/data/acpi/pc/DSDT.acpihmat b/tests/data/acpi/pc/DSDT.acpihmat
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tests/data/acpi/pc/HMAT.acpihmat b/tests/data/acpi/pc/HMAT.acpihmat
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tests/data/acpi/pc/SRAT.acpihmat b/tests/data/acpi/pc/SRAT.acpihmat
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tests/data/acpi/q35/APIC.acpihmat b/tests/data/acpi/q35/APIC.acpihmat
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tests/data/acpi/q35/DSDT.acpihmat b/tests/data/acpi/q35/DSDT.acpihmat
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tests/data/acpi/q35/HMAT.acpihmat b/tests/data/acpi/q35/HMAT.acpihmat
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tests/data/acpi/q35/SRAT.acpihmat b/tests/data/acpi/q35/SRAT.acpihmat
new file mode 100644
index 0000000000..e69de29bb2
-- 
2.20.1




* Re: [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
  2019-11-15  7:53 [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (13 preceding siblings ...)
  2019-11-15  7:53 ` [PATCH v16 14/14] tests/bios-tables-test: add test cases for ACPI HMAT Tao Xu
@ 2019-11-15  8:58 ` no-reply
  14 siblings, 0 replies; 29+ messages in thread
From: no-reply @ 2019-11-15  8:58 UTC (permalink / raw)
  To: tao3.xu
  Cc: lvivier, thuth, ehabkost, mst, qemu-devel, jingqi.liu, tao3.xu,
	fan.du, armbru, mdroth, jonathan.cameron, imammedo

Patchew URL: https://patchew.org/QEMU/20191115075352.17734-1-tao3.xu@intel.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
Type: series
Message-id: 20191115075352.17734-1-tao3.xu@intel.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/20191115083345.22638-1-tao3.xu@intel.com -> patchew/20191115083345.22638-1-tao3.xu@intel.com
Switched to a new branch 'test'
d5da20c tests/bios-tables-test: add test cases for ACPI HMAT
486185d tests/numa: Add case for QMP build HMAT
eb32946 hmat acpi: Build Memory Side Cache Information Structure(s)
e497fe7 hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)
255487a hmat acpi: Build Memory Proximity Domain Attributes Structure(s)
3af0c0b numa: Extend CLI to provide memory side cache information
56d1c80 numa: Extend CLI to provide memory latency and bandwidth information
b607b84 numa: Extend CLI to provide initiator information for numa nodes
184dd62 tests: Add test for QAPI builtin type time
b7f1b2c qapi: Add builtin type time
2018f91 util/cutils: Add qemu_strtotime_ns()
006728b util/cutils: refactor do_strtosz() to support suffixes list
8a57751 util/cutils: Use qemu_strtold_finite to parse size
eb2d1fd util/cutils: Add Add qemu_strtold and qemu_strtold_finite

=== OUTPUT BEGIN ===
1/14 Checking commit eb2d1fdb2cf0 (util/cutils: Add Add qemu_strtold and qemu_strtold_finite)
ERROR: consider using qemu_strtold in preference to strtold
#61: FILE: util/cutils.c:636:
+    *result = strtold(nptr, &ep);

total: 1 errors, 0 warnings, 69 lines checked

Patch 1/14 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

2/14 Checking commit 8a57751c17ed (util/cutils: Use qemu_strtold_finite to parse size)
3/14 Checking commit 006728bb0b4b (util/cutils: refactor do_strtosz() to support suffixes list)
4/14 Checking commit 2018f917c391 (util/cutils: Add qemu_strtotime_ns())
5/14 Checking commit b7f1b2cc1e96 (qapi: Add builtin type time)
6/14 Checking commit 184dd629c2fd (tests: Add test for QAPI builtin type time)
7/14 Checking commit b607b8499f73 (numa: Extend CLI to provide initiator information for numa nodes)
8/14 Checking commit 56d1c80175ad (numa: Extend CLI to provide memory latency and bandwidth information)
9/14 Checking commit 3af0c0b8fe71 (numa: Extend CLI to provide memory side cache information)
10/14 Checking commit 255487a44a8b (hmat acpi: Build Memory Proximity Domain Attributes Structure(s))
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#72: 
new file mode 100644

total: 0 errors, 1 warnings, 185 lines checked

Patch 10/14 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
11/14 Checking commit e497fe7c8adf (hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s))
12/14 Checking commit eb32946cae4f (hmat acpi: Build Memory Side Cache Information Structure(s))
13/14 Checking commit 486185d6f70e (tests/numa: Add case for QMP build HMAT)
WARNING: line over 80 characters
#65: FILE: tests/numa-test.c:374:
+    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'x-exit-preconfig' }")));

total: 0 errors, 1 warnings, 63 lines checked

Patch 13/14 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
14/14 Checking commit d5da20cf0514 (tests/bios-tables-test: add test cases for ACPI HMAT)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#106: 
new file mode 100644

total: 0 errors, 1 warnings, 65 lines checked

Patch 14/14 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20191115075352.17734-1-tao3.xu@intel.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [PATCH v16 03/14] util/cutils: refactor do_strtosz() to support suffixes list
  2019-11-15  7:53 ` [PATCH v16 03/14] util/cutils: refactor do_strtosz() to support suffixes list Tao Xu
@ 2019-11-15 12:11   ` Philippe Mathieu-Daudé
  2019-11-18  7:35     ` Tao Xu
  0 siblings, 1 reply; 29+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-11-15 12:11 UTC (permalink / raw)
  To: Tao Xu, mst, imammedo, eblake, ehabkost, marcel.apfelbaum,
	armbru, mdroth, thuth, lvivier, Stefan Weil, Markus Armbruster
  Cc: jingqi.liu, fan.du, qemu-devel, jonathan.cameron

Cc'ing Markus & Stefan.

On 11/15/19 8:53 AM, Tao Xu wrote:
> Add do_strtomul() to convert string according to different suffixes.
> 
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> No changes in v16.
> 
> Changes in v15:
>      - Add a new patch to refactor do_strtosz() (Eduardo)
> ---
>   util/cutils.c | 72 ++++++++++++++++++++++++++++++---------------------
>   1 file changed, 42 insertions(+), 30 deletions(-)
> 
> diff --git a/util/cutils.c b/util/cutils.c
> index d94a468954..ffef92338a 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -181,41 +181,37 @@ int fcntl_setfl(int fd, int flag)
>   }
>   #endif
>   
> -static int64_t suffix_mul(char suffix, int64_t unit)
> +static int64_t suffix_mul(const char *suffixes[], int num_suffix,
> +                          const char *endptr, int *offset, int64_t unit)
>   {
> -    switch (qemu_toupper(suffix)) {
> -    case 'B':
> -        return 1;
> -    case 'K':
> -        return unit;
> -    case 'M':
> -        return unit * unit;
> -    case 'G':
> -        return unit * unit * unit;
> -    case 'T':
> -        return unit * unit * unit * unit;
> -    case 'P':
> -        return unit * unit * unit * unit * unit;
> -    case 'E':
> -        return unit * unit * unit * unit * unit * unit;
> +    int i, suffix_len;
> +    int64_t mul = 1;
> +
> +    for (i = 0; i < num_suffix; i++) {
> +        suffix_len = strlen(suffixes[i]);
> +        if (g_ascii_strncasecmp(suffixes[i], endptr, suffix_len) == 0) {
> +            *offset = suffix_len;

So now we can parse "8kB" and "8Kb", and this might be confusing when 
parsing bit units.

   https://en.wikipedia.org/wiki/Kilobyte#Definitions_and_usage:

   IEC 80000-13 standard uses the term 'byte' to mean
   eight bits (1 B = 8 bit).

At some point we'll need to add the IEC suffix parsing to this function.

   https://en.wikipedia.org/wiki/Kibibyte#Definition

Meanwhile, can you keep it to upper case suffix only?

> +            return mul;
> +    }
> +        mul *= unit;
>       }
> +
>       return -1;
>   }
>   
>   /*
> - * Convert string to bytes, allowing either B/b for bytes, K/k for KB,
> - * M/m for MB, G/g for GB or T/t for TB. End pointer will be returned
> - * in *end, if not NULL. Return -ERANGE on overflow, and -EINVAL on
> - * other error.
> + * Convert string according to different suffixes. End pointer will be returned
> + * in *end, if not NULL. Return -ERANGE on overflow, and -EINVAL on other error.
>    */
> -static int do_strtosz(const char *nptr, const char **end,
> -                      const char default_suffix, int64_t unit,
> +static int do_strtomul(const char *nptr, const char **end,
> +                       const char *suffixes[], int num_suffix,
> +                       const char *default_suffix, int64_t unit,
>                         uint64_t *result)
>   {
>       int retval;
>       const char *endptr;
> -    unsigned char c;
>       int mul_required = 0;
> +    int offset = 0;
>       long double val, mul, integral, fraction;
>   
>       retval = qemu_strtold_finite(nptr, &endptr, &val);
> @@ -226,12 +222,12 @@ static int do_strtosz(const char *nptr, const char **end,
>       if (fraction != 0) {
>           mul_required = 1;
>       }
> -    c = *endptr;
> -    mul = suffix_mul(c, unit);
> +
> +    mul = suffix_mul(suffixes, num_suffix, endptr, &offset, unit);
>       if (mul >= 0) {
> -        endptr++;
> +        endptr += offset;
>       } else {
> -        mul = suffix_mul(default_suffix, unit);
> +        mul = suffix_mul(suffixes, num_suffix, default_suffix, &offset, unit);
>           assert(mul >= 0);
>       }
>       if (mul == 1 && mul_required) {
> @@ -256,19 +252,35 @@ out:
>       return retval;
>   }
>   
> +/*
> + * Convert string to bytes, allowing either B/b for bytes, K/k for KB,

Then also fix here "B/b for bytes".

> + * M/m for MB, G/g for GB or T/t for TB. End pointer will be returned

Shouldn't we refuse m/g/t? (m is the 'milli' suffix)

Thanks,

Phil.

> + * in *end, if not NULL. Return -ERANGE on overflow, and -EINVAL on
> + * other error.
> + */
> +static int do_strtosz(const char *nptr, const char **end,
> +                      const char *default_suffix, int64_t unit,
> +                      uint64_t *result)
> +{
> +    static const char *suffixes[] = { "B", "K", "M", "G", "T", "P", "E" };
> +
> +    return do_strtomul(nptr, end, suffixes, ARRAY_SIZE(suffixes),
> +                       default_suffix, unit, result);
> +}
> +
>   int qemu_strtosz(const char *nptr, const char **end, uint64_t *result)
>   {
> -    return do_strtosz(nptr, end, 'B', 1024, result);
> +    return do_strtosz(nptr, end, "B", 1024, result);
>   }
>   
>   int qemu_strtosz_MiB(const char *nptr, const char **end, uint64_t *result)
>   {
> -    return do_strtosz(nptr, end, 'M', 1024, result);
> +    return do_strtosz(nptr, end, "M", 1024, result);
>   }
>   
>   int qemu_strtosz_metric(const char *nptr, const char **end, uint64_t *result)
>   {
> -    return do_strtosz(nptr, end, 'B', 1000, result);
> +    return do_strtosz(nptr, end, "B", 1000, result);
>   }
>   
>   /**
> 




* Re: [PATCH v16 03/14] util/cutils: refactor do_strtosz() to support suffixes list
  2019-11-15 12:11   ` Philippe Mathieu-Daudé
@ 2019-11-18  7:35     ` Tao Xu
  0 siblings, 0 replies; 29+ messages in thread
From: Tao Xu @ 2019-11-18  7:35 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, armbru
  Cc: lvivier, thuth, ehabkost, mst, jonathan.cameron, Stefan Weil, Du,
	Fan, mdroth, qemu-devel, Liu, Jingqi, imammedo

On 11/15/2019 8:11 PM, Philippe Mathieu-Daudé wrote:
> Cc'ing Markus & Stefan.
> 
> On 11/15/19 8:53 AM, Tao Xu wrote:
>> Add do_strtomul() to convert string according to different suffixes.
>>
>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>
>> No changes in v16.
>>
>> Changes in v15:
>>       - Add a new patch to refactor do_strtosz() (Eduardo)
>> ---
>>    util/cutils.c | 72 ++++++++++++++++++++++++++++++---------------------
>>    1 file changed, 42 insertions(+), 30 deletions(-)
>>
>> diff --git a/util/cutils.c b/util/cutils.c
>> index d94a468954..ffef92338a 100644
>> --- a/util/cutils.c
>> +++ b/util/cutils.c
>> @@ -181,41 +181,37 @@ int fcntl_setfl(int fd, int flag)
>>    }
>>    #endif
>>    
>> -static int64_t suffix_mul(char suffix, int64_t unit)
>> +static int64_t suffix_mul(const char *suffixes[], int num_suffix,
>> +                          const char *endptr, int *offset, int64_t unit)
>>    {
>> -    switch (qemu_toupper(suffix)) {
>> -    case 'B':
>> -        return 1;
>> -    case 'K':
>> -        return unit;
>> -    case 'M':
>> -        return unit * unit;
>> -    case 'G':
>> -        return unit * unit * unit;
>> -    case 'T':
>> -        return unit * unit * unit * unit;
>> -    case 'P':
>> -        return unit * unit * unit * unit * unit;
>> -    case 'E':
>> -        return unit * unit * unit * unit * unit * unit;
>> +    int i, suffix_len;
>> +    int64_t mul = 1;
>> +
>> +    for (i = 0; i < num_suffix; i++) {
>> +        suffix_len = strlen(suffixes[i]);
>> +        if (g_ascii_strncasecmp(suffixes[i], endptr, suffix_len) == 0) {
>> +            *offset = suffix_len;
> 
> So now we can parse "8kB" and "8Kb", and this might be confusing when
> parsing bit units.
> 
>     https://en.wikipedia.org/wiki/Kilobyte#Definitions_and_usage:
> 
>     IEC 80000-13 standard uses the term 'byte' to mean
>     eight bits (1 B = 8 bit).
> 
> At some point we'll need to add the IEC suffix parsing to this function.
> 
>     https://en.wikipedia.org/wiki/Kibibyte#Definition
> 
> Meanwhile, can you keep it to upper case suffix only?

Here I use g_ascii_strncasecmp() because QEMU originally used
qemu_toupper(). This does not cause a compatibility issue, because
QEMU has accepted B/b for bytes, K/k for KB, M/m for MB, G/g for GB
and T/t for TB for a long time.

I am wondering whether we could add a new do_strtosz_iec() that accepts
upper-case suffixes only.
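
To make the two behaviours concrete, here is a rough standalone sketch,
not code from this series. suffix_mul_sketch(), its ignore_case flag and
size_suffixes are hypothetical names, and POSIX strncasecmp() stands in
for g_ascii_strncasecmp() (it is locale-dependent, unlike the GLib call).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <strings.h>   /* strncasecmp() (POSIX) */

/*
 * Hypothetical stand-in for the list-based suffix_mul() in the patch.
 * Returns the multiplier of the suffix found at 'endptr', or -1 if no
 * entry matches. 'ignore_case' selects the comparison: non-zero keeps
 * the historical K/k behaviour, zero accepts upper-case suffixes only.
 */
static int64_t suffix_mul_sketch(const char *const suffixes[], int num_suffix,
                                 const char *endptr, int *offset,
                                 int64_t unit, int ignore_case)
{
    int64_t mul = 1;
    int i;

    assert(offset != NULL);

    for (i = 0; i < num_suffix; i++) {
        size_t len = strlen(suffixes[i]);
        int differs;

        /* Scale before the compare so mul never grows past the last entry */
        if (i > 0) {
            mul *= unit;
        }
        differs = ignore_case ? strncasecmp(suffixes[i], endptr, len)
                              : strncmp(suffixes[i], endptr, len);
        if (differs == 0) {
            *offset = (int)len;
            return mul;
        }
    }
    return -1;
}

/* Same table as do_strtosz() in the patch */
static const char *const size_suffixes[] = { "B", "K", "M", "G", "T", "P", "E" };
```

With ignore_case set, "k" and "m" resolve to 1024 and 1024 * 1024 as
they do today; with it cleared, both are rejected and only "K" and "M"
match. An upper-case-only variant could therefore share the lookup and
differ only in that flag.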
> 
>> +            return mul;
>> +    }
>> +        mul *= unit;
>>        }
>> +
>>        return -1;
>>    }
>>    
>>    /*
>> - * Convert string to bytes, allowing either B/b for bytes, K/k for KB,
>> - * M/m for MB, G/g for GB or T/t for TB. End pointer will be returned
>> - * in *end, if not NULL. Return -ERANGE on overflow, and -EINVAL on
>> - * other error.
>> + * Convert string according to different suffixes. End pointer will be returned
>> + * in *end, if not NULL. Return -ERANGE on overflow, and -EINVAL on other error.
>>     */
>> -static int do_strtosz(const char *nptr, const char **end,
>> -                      const char default_suffix, int64_t unit,
>> +static int do_strtomul(const char *nptr, const char **end,
>> +                       const char *suffixes[], int num_suffix,
>> +                       const char *default_suffix, int64_t unit,
>>                          uint64_t *result)
>>    {
>>        int retval;
>>        const char *endptr;
>> -    unsigned char c;
>>        int mul_required = 0;
>> +    int offset = 0;
>>        long double val, mul, integral, fraction;
>>    
>>        retval = qemu_strtold_finite(nptr, &endptr, &val);
>> @@ -226,12 +222,12 @@ static int do_strtosz(const char *nptr, const char **end,
>>        if (fraction != 0) {
>>            mul_required = 1;
>>        }
>> -    c = *endptr;
>> -    mul = suffix_mul(c, unit);
>> +
>> +    mul = suffix_mul(suffixes, num_suffix, endptr, &offset, unit);
>>        if (mul >= 0) {
>> -        endptr++;
>> +        endptr += offset;
>>        } else {
>> -        mul = suffix_mul(default_suffix, unit);
>> +        mul = suffix_mul(suffixes, num_suffix, default_suffix, &offset, unit);
>>            assert(mul >= 0);
>>        }
>>        if (mul == 1 && mul_required) {
>> @@ -256,19 +252,35 @@ out:
>>        return retval;
>>    }
>>    
>> +/*
>> + * Convert string to bytes, allowing either B/b for bytes, K/k for KB,
> 
> Then also fix here "B/b for bytes".
> 
>> + * M/m for MB, G/g for GB or T/t for TB. End pointer will be returned
> 
> Shouldn't we refuse m/g/t? (m is the 'milli' suffix)
> 
> Thanks,
> 
> Phil.
> 
>> + * in *end, if not NULL. Return -ERANGE on overflow, and -EINVAL on
>> + * other error.
>> + */
>> +static int do_strtosz(const char *nptr, const char **end,
>> +                      const char *default_suffix, int64_t unit,
>> +                      uint64_t *result)
>> +{
>> +    static const char *suffixes[] = { "B", "K", "M", "G", "T", "P", "E" };
>> +
>> +    return do_strtomul(nptr, end, suffixes, ARRAY_SIZE(suffixes),
>> +                       default_suffix, unit, result);
>> +}
>> +
>>    int qemu_strtosz(const char *nptr, const char **end, uint64_t *result)
>>    {
>> -    return do_strtosz(nptr, end, 'B', 1024, result);
>> +    return do_strtosz(nptr, end, "B", 1024, result);
>>    }
>>    
>>    int qemu_strtosz_MiB(const char *nptr, const char **end, uint64_t *result)
>>    {
>> -    return do_strtosz(nptr, end, 'M', 1024, result);
>> +    return do_strtosz(nptr, end, "M", 1024, result);
>>    }
>>    
>>    int qemu_strtosz_metric(const char *nptr, const char **end, uint64_t *result)
>>    {
>> -    return do_strtosz(nptr, end, 'B', 1000, result);
>> +    return do_strtosz(nptr, end, "B", 1000, result);
>>    }
>>    
>>    /**
>>
> 




* Re: [PATCH v16 08/14] numa: Extend CLI to provide memory latency and bandwidth information
  2019-11-15  7:53 ` [PATCH v16 08/14] numa: Extend CLI to provide memory latency and bandwidth information Tao Xu
@ 2019-11-19 11:03   ` Igor Mammedov
  2019-11-20  7:55     ` Tao Xu
  0 siblings, 1 reply; 29+ messages in thread
From: Igor Mammedov @ 2019-11-19 11:03 UTC (permalink / raw)
  To: Tao Xu
  Cc: lvivier, thuth, ehabkost, mst, qemu-devel, jingqi.liu, fan.du,
	mdroth, armbru, jonathan.cameron

On Fri, 15 Nov 2019 15:53:46 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> From: Liu Jingqi <jingqi.liu@intel.com>
> 
> Add -numa hmat-lb option to provide System Locality Latency and
> Bandwidth Information. These memory attributes help to build
> System Locality Latency and Bandwidth Information Structure(s)
> in ACPI Heterogeneous Memory Attribute Table (HMAT).
> 
> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>

looks good to me, so

Reviewed-by: Igor Mammedov <imammedo@redhat.com>


PS:
also see question below

> ---
> 
> Changes in v16:
>     - Initialize HMAT_LB_Data lb_data (Igor)
>     - Remove punctuation from error_setg (Igor)
>     - Correct some description (Igor)
>     - Drop statement about max value (Igor)
>     - Simplify struct HMAT_LB_Info and related code, unify latency
>       and bandwidth (Igor)
> 
> Changes in v15:
>     - Change the QAPI version tag to 5.0 (Eric)
> 
> Changes in v14:
>     - Use qemu ctz64 and clz64 instead of builtin function
>     - Improve help message in qemu-options.hx
> ---
>  hw/core/numa.c        | 174 ++++++++++++++++++++++++++++++++++++++++++
>  include/sysemu/numa.h |  53 +++++++++++++
>  qapi/machine.json     |  94 ++++++++++++++++++++++-
>  qemu-options.hx       |  49 +++++++++++-
>  4 files changed, 367 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index e60da99293..47c7a96863 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -23,6 +23,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/units.h"
>  #include "sysemu/hostmem.h"
>  #include "sysemu/numa.h"
>  #include "sysemu/sysemu.h"
> @@ -198,6 +199,166 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
>      ms->numa_state->have_numa_distance = true;
>  }
>  
> +void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
> +                        Error **errp)
> +{
> +    int i, first_bit, last_bit;
> +    uint64_t max_entry, temp_base_la;
> +    NodeInfo *numa_info = numa_state->nodes;
> +    HMAT_LB_Info *hmat_lb =
> +        numa_state->hmat_lb[node->hierarchy][node->data_type];
> +    HMAT_LB_Data lb_data = {};
> +    HMAT_LB_Data *lb_temp;
> +
> +    /* Error checking */
> +    if (node->initiator > numa_state->num_nodes) {
> +        error_setg(errp, "Invalid initiator=%d, it should be less than %d",
> +                   node->initiator, numa_state->num_nodes);
> +        return;
> +    }
> +    if (node->target > numa_state->num_nodes) {
> +        error_setg(errp, "Invalid target=%d, it should be less than %d",
> +                   node->target, numa_state->num_nodes);
> +        return;
> +    }
> +    if (!numa_info[node->initiator].has_cpu) {
> +        error_setg(errp, "Invalid initiator=%d, it isn't an "
> +                   "initiator proximity domain", node->initiator);
> +        return;
> +    }
> +    if (!numa_info[node->target].present) {
> +        error_setg(errp, "The target=%d should point to an existing node",
> +                   node->target);
> +        return;
> +    }
> +
> +    if (!hmat_lb) {
> +        hmat_lb = g_malloc0(sizeof(*hmat_lb));
> +        numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
> +        hmat_lb->list = g_array_new(false, true, sizeof(HMAT_LB_Data));
> +    }
> +    hmat_lb->hierarchy = node->hierarchy;
> +    hmat_lb->data_type = node->data_type;
> +    lb_data.initiator = node->initiator;
> +    lb_data.target = node->target;
> +
> +    if (node->data_type <= HMATLB_DATA_TYPE_WRITE_LATENCY) {
> +        /* Input latency data */
> +
> +        if (!node->has_latency) {
> +            error_setg(errp, "Missing 'latency' option");
> +            return;
> +        }
> +        if (node->has_bandwidth) {
> +            error_setg(errp, "Invalid option 'bandwidth' since "
> +                       "the data type is latency");
> +            return;
> +        }
> +
> +        /* Detect duplicate configuration */
> +        for (i = 0; i < hmat_lb->list->len; i++) {
> +            lb_temp = &g_array_index(hmat_lb->list, HMAT_LB_Data, i);
> +
> +            if (node->initiator == lb_temp->initiator &&
> +                node->target == lb_temp->target) {
> +                error_setg(errp, "Duplicate configuration of the latency for "
> +                    "initiator=%d and target=%d", node->initiator,
> +                    node->target);
> +                return;
> +            }
> +        }
> +
> +        hmat_lb->base = hmat_lb->base ? hmat_lb->base : UINT64_MAX;
> +
> +        /* Calculate the temporary base and compressed latency */
> +        max_entry = node->latency;
> +        temp_base_la = 1;
> +        while (QEMU_IS_ALIGNED(max_entry, 10)) {
> +            max_entry /= 10;
> +            temp_base_la *= 10;
> +        }
> +
> +        /* Calculate the max compressed latency */
> +        hmat_lb->base = MIN(hmat_lb->base, temp_base_la);
> +        max_entry = node->latency / hmat_lb->base;
> +        hmat_lb->range_bitmap = MAX(hmat_lb->range_bitmap, max_entry);
> +
> +        /*
> +         * For latency hmat_lb->range_bitmap record the max compressed latency
> +         * which should be less than 0xFFFF (UINT16_MAX)
> +         */
> +        if (hmat_lb->range_bitmap >= UINT16_MAX) {
> +            error_setg(errp, "Latency %" PRIu64 " between initiator=%d and "
> +                       "target=%d should not differ from previously entered "
> +                       "min or max values on more than %d", node->latency,
> +                       node->initiator, node->target, UINT16_MAX - 1);
> +            return;
> +        }
> +
> +        /* Set lb_info_provided bit 0 as 1, latency information is provided */
> +        numa_info[node->target].lb_info_provided |= BIT(0);
> +        lb_data.data = node->latency;
> +    } else if (node->data_type >= HMATLB_DATA_TYPE_ACCESS_BANDWIDTH) {
> +        /* Input bandwidth data */
> +
> +        if (!node->has_bandwidth) {
> +            error_setg(errp, "Missing 'bandwidth' option");
> +            return;
> +        }
> +        if (node->has_latency) {
> +            error_setg(errp, "Invalid option 'latency' since "
> +                       "the data type is bandwidth");
> +            return;
> +        }
> +        if (!QEMU_IS_ALIGNED(node->bandwidth, MiB)) {
> +            error_setg(errp, "Bandwidth %" PRIu64 " between initiator=%d and "
> +                       "target=%d should be 1MB aligned", node->bandwidth,
> +                       node->initiator, node->target);
> +            return;
> +        }
> +
> +        /* Detect duplicate configuration */
> +        for (i = 0; i < hmat_lb->list->len; i++) {
> +            lb_temp = &g_array_index(hmat_lb->list, HMAT_LB_Data, i);
> +
> +            if (node->initiator == lb_temp->initiator &&
> +                node->target == lb_temp->target) {
> +                error_setg(errp, "Duplicate configuration of the bandwidth for "
> +                    "initiator=%d and target=%d", node->initiator,
> +                    node->target);
> +                return;
> +            }
> +        }
> +
> +        hmat_lb->range_bitmap |= node->bandwidth;
> +        first_bit = ctz64(hmat_lb->range_bitmap);
> +        hmat_lb->base = UINT64_C(1) << first_bit;
> +        max_entry = node->bandwidth / hmat_lb->base;
> +        last_bit = 64 - clz64(hmat_lb->range_bitmap);
> +
> +        /*
> +         * For bandwidth, first_bit record the base unit of bandwidth bits,
> +         * last_bit record the last bit of the max bandwidth. The max compressed
> +         * bandwidth should be less than 0xFFFF (UINT16_MAX)
> +         */
> +        if ((last_bit - first_bit) > UINT16_BITS || max_entry >= UINT16_MAX) {
                                                       ^^^^^^^^^^^^^^^^^^^
What bandwidth combination is going to trigger the condition marked above?
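
One way to explore this is to replay the arithmetic from the hunk outside
QEMU. The sketch below is only an illustration, not code from the series:
check_bandwidth() and struct bw_check are hypothetical names, and the
GCC/Clang builtins __builtin_ctzll()/__builtin_clzll() stand in for
QEMU's ctz64()/clz64().

```c
#include <assert.h>
#include <stdint.h>

#define MiB (UINT64_C(1) << 20)

/* Hypothetical result type: one flag per half of the quoted condition */
struct bw_check {
    int span_too_wide;   /* (last_bit - first_bit) > 16 */
    int entry_too_big;   /* max_entry >= UINT16_MAX */
};

/*
 * Replays the bandwidth branch for one new entry. 'range_bitmap' is the
 * OR of all bandwidths entered so far (0 if none), 'bandwidth' is the
 * new value and must be non-zero, since the builtins are undefined for 0.
 */
static struct bw_check check_bandwidth(uint64_t range_bitmap,
                                       uint64_t bandwidth)
{
    struct bw_check r;
    int first_bit, last_bit;
    uint64_t base, max_entry;

    assert(bandwidth != 0);

    range_bitmap |= bandwidth;
    first_bit = __builtin_ctzll(range_bitmap);
    base = UINT64_C(1) << first_bit;
    max_entry = bandwidth / base;
    last_bit = 64 - __builtin_clzll(range_bitmap);

    r.span_too_wide = (last_bit - first_bit) > 16;
    r.entry_too_big = max_entry >= UINT16_MAX;
    return r;
}
```

Under these assumptions, check_bandwidth(0, 65535 * MiB) has a bit span
of exactly 16, so only the max_entry test rejects it, while
check_bandwidth(1 * MiB, 65536 * MiB) trips both tests.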

> +            error_setg(errp, "Bandwidth %" PRIu64 " between initiator=%d and "
> +                       "target=%d should not differ from previously entered "
> +                       "values on more than %d", node->bandwidth,
> +                       node->initiator, node->target, UINT16_MAX - 1);
> +            return;
> +        }
> +
> +        /* Set lb_info_provided bit 1 as 1, bandwidth information is provided */
> +        numa_info[node->target].lb_info_provided |= BIT(1);
> +        lb_data.data = node->bandwidth;
> +    } else {
> +        assert(0);
> +    }
> +
> +    g_array_append_val(hmat_lb->list, lb_data);
> +}
> +
>  void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>  {
>      Error *err = NULL;
> @@ -236,6 +397,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>          machine_set_cpu_numa_node(ms, qapi_NumaCpuOptions_base(&object->u.cpu),
>                                    &err);
>          break;
> +    case NUMA_OPTIONS_TYPE_HMAT_LB:
> +        if (!ms->numa_state->hmat_enabled) {
> +            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
> +                       "(HMAT) is disabled, enable it with -machine hmat=on "
> +                       "before using any of hmat specific options");
> +            return;
> +        }
> +
> +        parse_numa_hmat_lb(ms->numa_state, &object->u.hmat_lb, &err);
> +        if (err) {
> +            goto end;
> +        }
> +        break;
>      default:
>          abort();
>      }
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 788cbec7a2..70f93c83d7 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -14,11 +14,34 @@ struct CPUArchId;
>  #define NUMA_DISTANCE_MAX         254
>  #define NUMA_DISTANCE_UNREACHABLE 255
>  
> +/* the value of AcpiHmatLBInfo flags */
> +enum {
> +    HMAT_LB_MEM_MEMORY           = 0,
> +    HMAT_LB_MEM_CACHE_1ST_LEVEL  = 1,
> +    HMAT_LB_MEM_CACHE_2ND_LEVEL  = 2,
> +    HMAT_LB_MEM_CACHE_3RD_LEVEL  = 3,
> +    HMAT_LB_LEVELS   /* must be the last entry */
> +};
> +
> +/* the value of AcpiHmatLBInfo data type */
> +enum {
> +    HMAT_LB_DATA_ACCESS_LATENCY   = 0,
> +    HMAT_LB_DATA_READ_LATENCY     = 1,
> +    HMAT_LB_DATA_WRITE_LATENCY    = 2,
> +    HMAT_LB_DATA_ACCESS_BANDWIDTH = 3,
> +    HMAT_LB_DATA_READ_BANDWIDTH   = 4,
> +    HMAT_LB_DATA_WRITE_BANDWIDTH  = 5,
> +    HMAT_LB_TYPES   /* must be the last entry */
> +};
> +
> +#define UINT16_BITS       16
> +
>  struct NodeInfo {
>      uint64_t node_mem;
>      struct HostMemoryBackend *node_memdev;
>      bool present;
>      bool has_cpu;
> +    uint8_t lb_info_provided;
>      uint16_t initiator;
>      uint8_t distance[MAX_NODES];
>  };
> @@ -28,6 +51,31 @@ struct NumaNodeMem {
>      uint64_t node_plugged_mem;
>  };
>  
> +struct HMAT_LB_Data {
> +    uint8_t     initiator;
> +    uint8_t     target;
> +    uint64_t    data;
> +};
> +typedef struct HMAT_LB_Data HMAT_LB_Data;
> +
> +struct HMAT_LB_Info {
> +    /* Indicates it's memory or the specified level memory side cache. */
> +    uint8_t     hierarchy;
> +
> +    /* Present the type of data, access/read/write latency or bandwidth. */
> +    uint8_t     data_type;
> +
> +    /* The range bitmap of bandwidth for calculating common base */
> +    uint64_t    range_bitmap;
> +
> +    /* The common base unit for latencies or bandwidths */
> +    uint64_t    base;
> +
> +    /* Array to store the latencies or bandwidths */
> +    GArray      *list;
> +};
> +typedef struct HMAT_LB_Info HMAT_LB_Info;
> +
>  struct NumaState {
>      /* Number of NUMA nodes */
>      int num_nodes;
> @@ -40,11 +88,16 @@ struct NumaState {
>  
>      /* NUMA nodes information */
>      NodeInfo nodes[MAX_NODES];
> +
> +    /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
> +    HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
>  };
>  typedef struct NumaState NumaState;
>  
>  void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
>  void parse_numa_opts(MachineState *ms);
> +void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
> +                        Error **errp);
>  void numa_complete_configuration(MachineState *ms);
>  void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
>  extern QemuOptsList qemu_numa_opts;
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 27d0e37534..67f5910400 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -426,10 +426,12 @@
>  #
>  # @cpu: property based CPU(s) to node mapping (Since: 2.10)
>  #
> +# @hmat-lb: memory latency and bandwidth information (Since: 5.0)
> +#
>  # Since: 2.1
>  ##
>  { 'enum': 'NumaOptionsType',
> -  'data': [ 'node', 'dist', 'cpu' ] }
> +  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
>  
>  ##
>  # @NumaOptions:
> @@ -444,7 +446,8 @@
>    'data': {
>      'node': 'NumaNodeOptions',
>      'dist': 'NumaDistOptions',
> -    'cpu': 'NumaCpuOptions' }}
> +    'cpu': 'NumaCpuOptions',
> +    'hmat-lb': 'NumaHmatLBOptions' }}
>  
>  ##
>  # @NumaNodeOptions:
> @@ -557,6 +560,93 @@
>     'base': 'CpuInstanceProperties',
>     'data' : {} }
>  
> +##
> +# @HmatLBMemoryHierarchy:
> +#
> +# The memory hierarchy in the System Locality Latency
> +# and Bandwidth Information Structure of HMAT (Heterogeneous
> +# Memory Attribute Table)
> +#
> +# For more information about @HmatLBMemoryHierarchy see
> +# the chapter 5.2.27.4: Table 5-146: Field "Flags" of ACPI 6.3 spec.
> +#
> +# @memory: the structure represents the memory performance
> +#
> +# @first-level: first level of memory side cache
> +#
> +# @second-level: second level of memory side cache
> +#
> +# @third-level: third level of memory side cache
> +#
> +# Since: 5.0
> +##
> +{ 'enum': 'HmatLBMemoryHierarchy',
> +  'data': [ 'memory', 'first-level', 'second-level', 'third-level' ] }
> +
> +##
> +# @HmatLBDataType:
> +#
> +# Data type in the System Locality Latency
> +# and Bandwidth Information Structure of HMAT (Heterogeneous
> +# Memory Attribute Table)
> +#
> +# For more information about @HmatLBDataType see
> +# the chapter 5.2.27.4: Table 5-146:  Field "Data Type" of ACPI 6.3 spec.
> +#
> +# @access-latency: access latency (nanoseconds)
> +#
> +# @read-latency: read latency (nanoseconds)
> +#
> +# @write-latency: write latency (nanoseconds)
> +#
> +# @access-bandwidth: access bandwidth (B/s)
> +#
> +# @read-bandwidth: read bandwidth (B/s)
> +#
> +# @write-bandwidth: write bandwidth (B/s)
> +#
> +# Since: 5.0
> +##
> +{ 'enum': 'HmatLBDataType',
> +  'data': [ 'access-latency', 'read-latency', 'write-latency',
> +            'access-bandwidth', 'read-bandwidth', 'write-bandwidth' ] }
> +
> +##
> +# @NumaHmatLBOptions:
> +#
> +# Set the system locality latency and bandwidth information
> +# between Initiator and Target proximity Domains.
> +#
> +# For more information about @NumaHmatLBOptions see
> +# the chapter 5.2.27.4: Table 5-146 of ACPI 6.3 spec.
> +#
> +# @initiator: the Initiator Proximity Domain.
> +#
> +# @target: the Target Proximity Domain.
> +#
> +# @hierarchy: the Memory Hierarchy. Indicates the performance
> +#             of memory or side cache.
> +#
> +# @data-type: presents the type of data, access/read/write
> +#             latency or hit latency.
> +#
> +# @latency: the value of latency from @initiator to @target proximity domain,
> +#           the latency unit is "ns(nanosecond)".
> +#
> +# @bandwidth: the value of bandwidth between @initiator and @target proximity
> +#             domain, the bandwidth unit is "B(/s)".
> +#
> +# Since: 5.0
> +##
> +{ 'struct': 'NumaHmatLBOptions',
> +    'data': {
> +    'initiator': 'uint16',
> +    'target': 'uint16',
> +    'hierarchy': 'HmatLBMemoryHierarchy',
> +    'data-type': 'HmatLBDataType',
> +    '*latency': 'time',
> +    '*bandwidth': 'size' }}
> +
>  ##
>  # @HostMemPolicy:
>  #
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 63f6b33322..929d275450 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -168,16 +168,19 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>      "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>      "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>      "-numa dist,src=source,dst=destination,val=distance\n"
> -    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
> +    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
> +    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
>      QEMU_ARCH_ALL)
>  STEXI
>  @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>  @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>  @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>  @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
> +@itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{hierarchy},data-type=@var{type}[,latency=@var{lat}][,bandwidth=@var{bw}]
>  @findex -numa
>  Define a NUMA node and assign RAM and VCPUs to it.
>  Set the NUMA distance from a source node to a destination node.
> +Set the ACPI Heterogeneous Memory Attributes for the given nodes.
>  
>  Legacy VCPU assignment uses @samp{cpus} option where
>  @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
> @@ -256,6 +259,50 @@ specified resources, it just assigns existing resources to NUMA
>  nodes. This means that one still has to use the @option{-m},
>  @option{-smp} options to allocate RAM and VCPUs respectively.
>  
> +Use @samp{hmat-lb} to set System Locality Latency and Bandwidth Information
> +between initiator and target NUMA nodes in ACPI Heterogeneous Memory Attribute Table (HMAT).
> +An initiator NUMA node can create memory requests; usually it has one or more processors.
> +A target NUMA node contains addressable memory.
> +
> +In @samp{hmat-lb} option, @var{node} are NUMA node IDs. @var{hierarchy} is the memory
> +hierarchy of the target NUMA node: if @var{hierarchy} is 'memory', the structure
> +represents the memory performance; if @var{hierarchy} is 'first-level|second-level|third-level',
> +this structure represents aggregated performance of memory side caches for each domain.
> +@var{type} of 'data-type' is type of data represented by this structure instance:
> +if 'hierarchy' is 'memory', 'data-type' is 'access|read|write' latency or 'access|read|write'
> +bandwidth of the target memory; if 'hierarchy' is 'first-level|second-level|third-level',
> +'data-type' is 'access|read|write' hit latency or 'access|read|write' hit bandwidth of the
> +target memory side cache.
> +
> +@var{lat} is latency value, the possible value and units are NUM[ns|us|ms]
> +(nanosecond|microsecond|millisecond), the recommended unit is 'ns'. @var{bw} is
> +bandwidth value, the possible value and units are NUM[M|G|T], mean that the
> +bandwidth value are NUM byte (or MB/s, GB/s or TB/s depending on used suffix).
> +Note that if NUM is 0, means the corresponding latency or bandwidth information
> +is not provided. And if input numbers without any unit, the latency unit will be
> +'ns' and the bandwidth will be B/s.
> +
> +For example, the following options describe 2 NUMA nodes. Node 0 has 2 cpus and
> +a ram, node 1 has only a ram. The processors in node 0 access memory in node
> +0 with access-latency 5 nanoseconds, access-bandwidth is 200 MB/s;
> +The processors in NUMA node 0 access memory in NUMA node 1 with access-latency 10
> +nanoseconds, access-bandwidth is 100 MB/s.
> +@example
> +-machine hmat=on \
> +-m 2G \
> +-object memory-backend-ram,size=1G,id=m0 \
> +-object memory-backend-ram,size=1G,id=m1 \
> +-smp 2 \
> +-numa node,nodeid=0,memdev=m0 \
> +-numa node,nodeid=1,memdev=m1,initiator=0 \
> +-numa cpu,node-id=0,socket-id=0 \
> +-numa cpu,node-id=0,socket-id=1 \
> +-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
> +-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M
> +@end example
> +
>  ETEXI
>  
>  DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,




* Re: [PATCH v16 09/14] numa: Extend CLI to provide memory side cache information
  2019-11-15  7:53 ` [PATCH v16 09/14] numa: Extend CLI to provide memory side cache information Tao Xu
@ 2019-11-19 11:47   ` Igor Mammedov
  2019-11-20  6:51     ` Tao Xu
  0 siblings, 1 reply; 29+ messages in thread
From: Igor Mammedov @ 2019-11-19 11:47 UTC (permalink / raw)
  To: Tao Xu
  Cc: lvivier, thuth, ehabkost, mst, qemu-devel, jingqi.liu, fan.du,
	mdroth, Daniel Black, armbru, jonathan.cameron

On Fri, 15 Nov 2019 15:53:47 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> From: Liu Jingqi <jingqi.liu@intel.com>
> 
> Add -numa hmat-cache option to provide Memory Side Cache Information.
> These memory attributes help to build Memory Side Cache Information
> Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT).
> 
> Reviewed-by: Daniel Black <daniel@linux.ibm.com>
> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> Changes in v16:
>     - Add cross check with hmat_lb data (Igor)
>     - Drop total_levels in struct HMAT_Cache_Info (Igor)
>     - Correct the error table number (Igor)
> 
> Changes in v15:
>     - Change the QAPI version tag to 5.0 (Eric)
> 
> Changes in v13:
>     - Drop the total_levels option.
>     - Use readable cache size (Igor)
> ---
>  hw/core/numa.c        | 73 ++++++++++++++++++++++++++++++++++++++++
>  include/sysemu/numa.h | 26 +++++++++++++++
>  qapi/machine.json     | 78 +++++++++++++++++++++++++++++++++++++++++--
>  qemu-options.hx       | 16 +++++++--
>  4 files changed, 189 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index 47c7a96863..4d7af60bfd 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -359,6 +359,66 @@ void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
>      g_array_append_val(hmat_lb->list, lb_data);
>  }
>  
> +void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
> +                           Error **errp)
> +{
> +    int nb_numa_nodes = ms->numa_state->num_nodes;
> +    NodeInfo *numa_info = ms->numa_state->nodes;
> +    HMAT_Cache_Info *hmat_cache = NULL;
> +
> +    if (node->node_id >= nb_numa_nodes) {
> +        error_setg(errp, "Invalid node-id=%" PRIu32 ", it should be less "
> +                   "than %d", node->node_id, nb_numa_nodes);
> +        return;
> +    }
> +
> +    if (numa_info[node->node_id].lb_info_provided != (BIT(0) | BIT(1))) {
> +        error_setg(errp, "The latency and bandwidth information of "
> +                   "node-id=%" PRIu32 " should be provided before configure "

s/configure//

> +                   "memory side cache attributes",
> +                   node->node_id);
> +        return;
> +    }
> +
> +    if (node->level >= HMAT_LB_LEVELS) {
> +        error_setg(errp, "Invalid level=%" PRIu8 ", it should be less than or "
> +                   "equal to %d", node->level, HMAT_LB_LEVELS - 1);
> +        return;
> +    }
> +    assert(node->assoc < HMAT_CACHE_ASSOCIATIVITY__MAX);
> +    assert(node->policy < HMAT_CACHE_WRITE_POLICY__MAX);
> +    if (ms->numa_state->hmat_cache[node->node_id][node->level]) {
> +        error_setg(errp, "Duplicate configuration of the side cache for "
> +                   "node-id=%" PRIu32 " and level=%" PRIu8,
> +                   node->node_id, node->level);
> +        return;
> +    }
> +
> +    if ((node->level > 1) &&
> +        ms->numa_state->hmat_cache[node->node_id][node->level - 1] &&
will this check work as expected if I define on CLI "L+1" and then after it "L"?

> +        (node->size >=
> +            ms->numa_state->hmat_cache[node->node_id][node->level - 1]->size)) {
> +        error_setg(errp, "Invalid size=0x%" PRIx64 ", the size of level=%" PRIu8
> +                   " should be less than the size(0x%" PRIx64 ") of "
> +                   "level=%" PRIu8, node->size, node->level,
> +                   ms->numa_state->hmat_cache[node->node_id]
> +                                             [node->level - 1]->size,
> +                   node->level - 1);
> +        return;
> +    }
> +
> +    hmat_cache = g_malloc0(sizeof(*hmat_cache));
> +
> +    hmat_cache->proximity = node->node_id;
> +    hmat_cache->size = node->size;
> +    hmat_cache->level = node->level;
> +    hmat_cache->associativity = node->assoc;
> +    hmat_cache->write_policy = node->policy;
> +    hmat_cache->line_size = node->line;
> +
> +    ms->numa_state->hmat_cache[node->node_id][node->level] = hmat_cache;
> +}
> +
>  void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>  {
>      Error *err = NULL;
> @@ -410,6 +470,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>              goto end;
>          }
>          break;
> +    case NUMA_OPTIONS_TYPE_HMAT_CACHE:
> +        if (!ms->numa_state->hmat_enabled) {
> +            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
> +                       "(HMAT) is disabled, enable it with -machine hmat=on "
> +                       "before using any of hmat specific options");
> +            return;
> +        }
> +
> +        parse_numa_hmat_cache(ms, &object->u.hmat_cache, &err);
> +        if (err) {
> +            goto end;
> +        }
> +        break;
>      default:
>          abort();
>      }
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 70f93c83d7..b415550678 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -76,6 +76,27 @@ struct HMAT_LB_Info {
>  };
>  typedef struct HMAT_LB_Info HMAT_LB_Info;
>  
> +struct HMAT_Cache_Info {
> +    /* The memory proximity domain to which the memory belongs. */
> +    uint32_t    proximity;
> +
> +    /* Size of memory side cache in bytes. */
> +    uint64_t    size;
> +
> +    /* Cache level described in this structure. */
> +    uint8_t     level;
> +
> +    /* Cache Associativity: None/Direct Mapped/Complex Cache Indexing */
> +    uint8_t     associativity;
> +
> +    /* Write Policy: None/Write Back(WB)/Write Through(WT) */
> +    uint8_t     write_policy;
> +
> +    /* Cache Line size in bytes. */
> +    uint16_t    line_size;
> +};
> +typedef struct HMAT_Cache_Info HMAT_Cache_Info;
Is there a reason why you use this structure instead of QAPI
generated NumaHmatCacheOptions structure?


>  struct NumaState {
>      /* Number of NUMA nodes */
>      int num_nodes;
> @@ -91,6 +112,9 @@ struct NumaState {
>  
>      /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
>      HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
> +
> +    /* Memory Side Cache Information Structure */
> +    HMAT_Cache_Info *hmat_cache[MAX_NODES][HMAT_LB_LEVELS];
>  };
>  typedef struct NumaState NumaState;
>  
> @@ -98,6 +122,8 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
>  void parse_numa_opts(MachineState *ms);
>  void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
>                          Error **errp);
> +void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
> +                           Error **errp);
>  void numa_complete_configuration(MachineState *ms);
>  void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
>  extern QemuOptsList qemu_numa_opts;
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 67f5910400..999235bc1b 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -428,10 +428,12 @@
>  #
>  # @hmat-lb: memory latency and bandwidth information (Since: 5.0)
>  #
> +# @hmat-cache: memory side cache information (Since: 5.0)
> +#
>  # Since: 2.1
>  ##
>  { 'enum': 'NumaOptionsType',
> -  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
> +  'data': [ 'node', 'dist', 'cpu', 'hmat-lb', 'hmat-cache' ] }
>  
>  ##
>  # @NumaOptions:
> @@ -447,7 +449,8 @@
>      'node': 'NumaNodeOptions',
>      'dist': 'NumaDistOptions',
>      'cpu': 'NumaCpuOptions',
> -    'hmat-lb': 'NumaHmatLBOptions' }}
> +    'hmat-lb': 'NumaHmatLBOptions',
> +    'hmat-cache': 'NumaHmatCacheOptions' }}
>  
>  ##
>  # @NumaNodeOptions:
> @@ -647,6 +650,77 @@
>      '*latency': 'time',
>      '*bandwidth': 'size' }}
>  
> +##
> +# @HmatCacheAssociativity:
> +#
> +# Cache associativity in the Memory Side Cache
> +# Information Structure of HMAT
> +#
> +# For more information of @HmatCacheAssociativity see
> +# the chapter 5.2.27.5: Table 5-147 of ACPI 6.3 spec.
> +#
> +# @none: None
> +#
> +# @direct: Direct Mapped
> +#
> +# @complex: Complex Cache Indexing (implementation specific)
> +#
> +# Since: 5.0
> +##
> +{ 'enum': 'HmatCacheAssociativity',
> +  'data': [ 'none', 'direct', 'complex' ] }
> +
> +##
> +# @HmatCacheWritePolicy:
> +#
> +# Cache write policy in the Memory Side Cache
> +# Information Structure of HMAT
> +#
> +# For more information of @HmatCacheWritePolicy see
> +# the chapter 5.2.27.5: Table 5-147: Field "Cache Attributes" of ACPI 6.3 spec.
> +#
> +# @none: None
> +#
> +# @write-back: Write Back (WB)
> +#
> +# @write-through: Write Through (WT)
> +#
> +# Since: 5.0
> +##
> +{ 'enum': 'HmatCacheWritePolicy',
> +  'data': [ 'none', 'write-back', 'write-through' ] }
> +
> +##
> +# @NumaHmatCacheOptions:
> +#
> +# Set the memory side cache information for a given memory domain.
> +#
> +# For more information of @NumaHmatCacheOptions see
> +# the chapter 5.2.27.5: Table 5-147: Field "Cache Attributes" of ACPI 6.3 spec.
> +#
> +# @node-id: the memory proximity domain to which the memory belongs.
> +#
> +# @size: the size of memory side cache in bytes.
> +#
> +# @level: the cache level described in this structure.
> +#
> +# @assoc: the cache associativity, none/direct-mapped/complex(complex cache indexing).
> +#
> +# @policy: the write policy, none/write-back/write-through.
> +#
> +# @line: the cache Line size in bytes.
> +#
> +# Since: 5.0
> +##
> +{ 'struct': 'NumaHmatCacheOptions',
> +  'data': {
> +   'node-id': 'uint32',
> +   'size': 'size',
> +   'level': 'uint8',
> +   'assoc': 'HmatCacheAssociativity',
> +   'policy': 'HmatCacheWritePolicy',
> +   'line': 'uint16' }}
> +
>  ##
>  # @HostMemPolicy:
>  #
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 929d275450..ad0e5aa190 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -169,7 +169,8 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>      "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>      "-numa dist,src=source,dst=destination,val=distance\n"
>      "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
> -    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
> +    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
> +    "-numa hmat-cache,node-id=node,size=size,level=level[,assoc=none|direct|complex][,policy=none|write-back|write-through][,line=size]\n",
>      QEMU_ARCH_ALL)
>  STEXI
>  @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
> @@ -177,6 +178,7 @@ STEXI
>  @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>  @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
>  @itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{hierarchy},data-type=@var{type}[,latency=@var{lat}][,bandwidth=@var{bw}]
> +@itemx -numa hmat-cache,node-id=@var{node},size=@var{size},level=@var{level}[,assoc=@var{str}][,policy=@var{str}][,line=@var{size}]
>  @findex -numa
>  Define a NUMA node and assign RAM and VCPUs to it.
>  Set the NUMA distance from a source node to a destination node.
> @@ -282,11 +284,19 @@ Note that if NUM is 0, means the corresponding latency or bandwidth information
>  is not provided. And if input numbers without any unit, the latency unit will be
>  'ns' and the bandwidth will be B/s.
>  
> +In @samp{hmat-cache} option, @var{node-id} is the NUMA node ID to which the memory belongs.
> +@var{size} is the size of memory side cache in bytes. @var{level} is the cache
> +level described in this structure. @var{assoc} is the cache associativity,
> +the possible value is 'none/direct(direct-mapped)/complex(complex cache indexing)'.
> +@var{policy} is the write policy. @var{line} is the cache Line size in bytes.
> +
>  For example, the following options describe 2 NUMA nodes. Node 0 has 2 cpus and
>  a ram, node 1 has only a ram. The processors in node 0 access memory in node
>  0 with access-latency 5 nanoseconds, access-bandwidth is 200 MB/s;
>  The processors in NUMA node 0 access memory in NUMA node 1 with access-latency 10
>  nanoseconds, access-bandwidth is 100 MB/s.
> +For memory side cache information, NUMA nodes 0 and 1 each have one level of memory
> +side cache: size is 10KB, policy is write-back, and the cache line size is 8 bytes:
>  @example
>  -machine hmat=on \
>  -m 2G \
> @@ -300,7 +310,9 @@ nanoseconds, access-bandwidth is 100 MB/s.
>  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
>  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
>  -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
> --numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M
> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
> +-numa hmat-cache,node-id=0,size=10K,level=1,assoc=direct,policy=write-back,line=8 \
> +-numa hmat-cache,node-id=1,size=10K,level=1,assoc=direct,policy=write-back,line=8
>  @end example
>  
>  ETEXI




* Re: [PATCH v16 09/14] numa: Extend CLI to provide memory side cache information
  2019-11-19 11:47   ` Igor Mammedov
@ 2019-11-20  6:51     ` Tao Xu
  0 siblings, 0 replies; 29+ messages in thread
From: Tao Xu @ 2019-11-20  6:51 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: lvivier, thuth, ehabkost, mst, qemu-devel, Liu, Jingqi, Du, Fan,
	mdroth, Daniel Black, armbru, jonathan.cameron

On 11/19/2019 7:47 PM, Igor Mammedov wrote:
> On Fri, 15 Nov 2019 15:53:47 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> From: Liu Jingqi <jingqi.liu@intel.com>
>>
>> Add -numa hmat-cache option to provide Memory Side Cache Information.
>> These memory attributes help to build Memory Side Cache Information
>> Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT).
>>
>> Reviewed-by: Daniel Black <daniel@linux.ibm.com>
>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>
>> Changes in v16:
>>      - Add cross check with hmat_lb data (Igor)
>>      - Drop total_levels in struct HMAT_Cache_Info (Igor)
>>      - Correct the error table number (Igor)
>>
>> Changes in v15:
>>      - Change the QAPI version tag to 5.0 (Eric)
>>
>> Changes in v13:
>>      - Drop the total_levels option.
>>      - Use readable cache size (Igor)
>> ---
>>   hw/core/numa.c        | 73 ++++++++++++++++++++++++++++++++++++++++
>>   include/sysemu/numa.h | 26 +++++++++++++++
>>   qapi/machine.json     | 78 +++++++++++++++++++++++++++++++++++++++++--
>>   qemu-options.hx       | 16 +++++++--
>>   4 files changed, 189 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/core/numa.c b/hw/core/numa.c
>> index 47c7a96863..4d7af60bfd 100644
>> --- a/hw/core/numa.c
>> +++ b/hw/core/numa.c
>> @@ -359,6 +359,66 @@ void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
>>       g_array_append_val(hmat_lb->list, lb_data);
>>   }
>>   
>> +void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
>> +                           Error **errp)
>> +{
>> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>> +    NodeInfo *numa_info = ms->numa_state->nodes;
>> +    HMAT_Cache_Info *hmat_cache = NULL;
>> +
>> +    if (node->node_id >= nb_numa_nodes) {
>> +        error_setg(errp, "Invalid node-id=%" PRIu32 ", it should be less "
>> +                   "than %d", node->node_id, nb_numa_nodes);
>> +        return;
>> +    }
>> +
>> +    if (numa_info[node->node_id].lb_info_provided != (BIT(0) | BIT(1))) {
>> +        error_setg(errp, "The latency and bandwidth information of "
>> +                   "node-id=%" PRIu32 " should be provided before configure "
> 
> s/configure//
> 
>> +                   "memory side cache attributes",
>> +                   node->node_id);
>> +        return;
>> +    }
>> +
>> +    if (node->level >= HMAT_LB_LEVELS) {
>> +        error_setg(errp, "Invalid level=%" PRIu8 ", it should be less than or "
>> +                   "equal to %d", node->level, HMAT_LB_LEVELS - 1);
>> +        return;
>> +    }
>> +    assert(node->assoc < HMAT_CACHE_ASSOCIATIVITY__MAX);
>> +    assert(node->policy < HMAT_CACHE_WRITE_POLICY__MAX);
>> +    if (ms->numa_state->hmat_cache[node->node_id][node->level]) {
>> +        error_setg(errp, "Duplicate configuration of the side cache for "
>> +                   "node-id=%" PRIu32 " and level=%" PRIu8,
>> +                   node->node_id, node->level);
>> +        return;
>> +    }
>> +
>> +    if ((node->level > 1) &&
>> +        ms->numa_state->hmat_cache[node->node_id][node->level - 1] &&
> will this check work as expected if I define on CLI "L+1" and then after it "L"?
> 

No, I will fix it.
>> +        (node->size >=
>> +            ms->numa_state->hmat_cache[node->node_id][node->level - 1]->size)) {
>> +        error_setg(errp, "Invalid size=0x%" PRIx64 ", the size of level=%" PRIu8
>> +                   " should be less than the size(0x%" PRIx64 ") of "
>> +                   "level=%" PRIu8, node->size, node->level,
>> +                   ms->numa_state->hmat_cache[node->node_id]
>> +                                             [node->level - 1]->size,
>> +                   node->level - 1);
>> +        return;
>> +    }
>> +
>> +    hmat_cache = g_malloc0(sizeof(*hmat_cache));
>> +
>> +    hmat_cache->proximity = node->node_id;
>> +    hmat_cache->size = node->size;
>> +    hmat_cache->level = node->level;
>> +    hmat_cache->associativity = node->assoc;
>> +    hmat_cache->write_policy = node->policy;
>> +    hmat_cache->line_size = node->line;
>> +
>> +    ms->numa_state->hmat_cache[node->node_id][node->level] = hmat_cache;
>> +}
>> +
>>   void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>>   {
>>       Error *err = NULL;
>> @@ -410,6 +470,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>>               goto end;
>>           }
>>           break;
>> +    case NUMA_OPTIONS_TYPE_HMAT_CACHE:
>> +        if (!ms->numa_state->hmat_enabled) {
>> +            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
>> +                       "(HMAT) is disabled, enable it with -machine hmat=on "
>> +                       "before using any of hmat specific options");
>> +            return;
>> +        }
>> +
>> +        parse_numa_hmat_cache(ms, &object->u.hmat_cache, &err);
>> +        if (err) {
>> +            goto end;
>> +        }
>> +        break;
>>       default:
>>           abort();
>>       }
>> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
>> index 70f93c83d7..b415550678 100644
>> --- a/include/sysemu/numa.h
>> +++ b/include/sysemu/numa.h
>> @@ -76,6 +76,27 @@ struct HMAT_LB_Info {
>>   };
>>   typedef struct HMAT_LB_Info HMAT_LB_Info;
>>   
>> +struct HMAT_Cache_Info {
>> +    /* The memory proximity domain to which the memory belongs. */
>> +    uint32_t    proximity;
>> +
>> +    /* Size of memory side cache in bytes. */
>> +    uint64_t    size;
>> +
>> +    /* Cache level described in this structure. */
>> +    uint8_t     level;
>> +
>> +    /* Cache Associativity: None/Direct Mapped/Complex Cache Indexing */
>> +    uint8_t     associativity;
>> +
>> +    /* Write Policy: None/Write Back(WB)/Write Through(WT) */
>> +    uint8_t     write_policy;
>> +
>> +    /* Cache Line size in bytes. */
>> +    uint16_t    line_size;
>> +};
>> +typedef struct HMAT_Cache_Info HMAT_Cache_Info;
> Is there a reason why you use this structure instead of QAPI
> generated NumaHmatCacheOptions structure?
> 

Thank you for your suggestion. I will use NumaHmatCacheOptions to 
replace HMAT_Cache_Info.
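
On the ordering question above (defining level "L+1" before level "L"):
one possible shape for an order-independent size check is to compare the
new entry against both neighbouring levels. This is only a sketch under
stated assumptions, not the fix that will be posted. The names
cache_size_ok and SKETCH_LEVELS are hypothetical, the sizes[] array stands
in for one node's row of ms->numa_state->hmat_cache[][], and a size of 0 is
assumed to mean "level not configured yet".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for HMAT_LB_LEVELS; the value 4 is an assumption. */
#define SKETCH_LEVELS 4

/*
 * Return true if a cache of the given size may be added at the given level.
 * sizes[l] == 0 means level l has not been configured (assumption).
 * As in the patch, a higher level must be strictly smaller than the level
 * below it; here the rule is applied in both directions so the result does
 * not depend on the order of the options.
 */
static bool cache_size_ok(const uint64_t sizes[SKETCH_LEVELS],
                          unsigned level, uint64_t size)
{
    /* The sketch only handles cache levels 1 .. SKETCH_LEVELS - 1. */
    assert(level >= 1 && level < SKETCH_LEVELS);

    /* Level below already defined: the new entry must be smaller. */
    if (level > 1 && sizes[level - 1] && size >= sizes[level - 1]) {
        return false;
    }
    /* Level above already defined: the new entry must be larger. */
    if (level + 1 < SKETCH_LEVELS && sizes[level + 1] &&
        size <= sizes[level + 1]) {
        return false;
    }
    return true;
}
```

With level 2 configured first at 10K, a later level 1 of 5K is rejected and
a level 1 of 20K is accepted, which is the case the current check misses.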



* Re: [PATCH v16 08/14] numa: Extend CLI to provide memory latency and bandwidth information
  2019-11-19 11:03   ` Igor Mammedov
@ 2019-11-20  7:55     ` Tao Xu
  2019-11-20 12:56       ` Igor Mammedov
  0 siblings, 1 reply; 29+ messages in thread
From: Tao Xu @ 2019-11-20  7:55 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: lvivier, thuth, ehabkost, mst, Liu, Jingqi, Du, Fan, qemu-devel,
	mdroth, jonathan.cameron, armbru

On 11/19/2019 7:03 PM, Igor Mammedov wrote:
> On Fri, 15 Nov 2019 15:53:46 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> From: Liu Jingqi <jingqi.liu@intel.com>
>>
>> Add -numa hmat-lb option to provide System Locality Latency and
>> Bandwidth Information. These memory attributes help to build
>> System Locality Latency and Bandwidth Information Structure(s)
>> in ACPI Heterogeneous Memory Attribute Table (HMAT).
>>
>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> 
> looks good to me, so
> 
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> 
> 
> PS:
> also see question below
> 
[...]
>> +
>> +        hmat_lb->range_bitmap |= node->bandwidth;
>> +        first_bit = ctz64(hmat_lb->range_bitmap);
>> +        hmat_lb->base = UINT64_C(1) << first_bit;
>> +        max_entry = node->bandwidth / hmat_lb->base;
>> +        last_bit = 64 - clz64(hmat_lb->range_bitmap);
>> +
>> +        /*
>> +         * For bandwidth, first_bit record the base unit of bandwidth bits,
>> +         * last_bit record the last bit of the max bandwidth. The max compressed
>> +         * bandwidth should be less than 0xFFFF (UINT16_MAX)
>> +         */
>> +        if ((last_bit - first_bit) > UINT16_BITS || max_entry >= UINT16_MAX) {
>                                                         ^^^^^^^^^^^^^^^^^^^
> what bandwidth combination is going to trigger above condition?
> 
Checking only (last_bit - first_bit) > UINT16_BITS does not raise an error
when the max compressed bandwidth is 0xFFFF, because in that case
"last_bit - first_bit == UINT16_BITS". So I added "max_entry >=
UINT16_MAX" to catch 0xFFFF. For example:

Combination 1 (Error):
bandwidth1   = ...0000 1111 1111 1111 1110 0000... (max_entry 32767)
range_bitmap = ...0000 1111 1111 1111 1110 0000... (range is 15 bits)
bandwidth2   = ...0000 1111 1111 1111 1111 0000... (max_entry 65535)
range_bitmap = ...0000 1111 1111 1111 1111 0000... (range is 16 bits)

Combination 2 (Error):
bandwidth1   = ...0000 1111 1111 1111 1110 0000... (max_entry 32767)
range_bitmap = ...0000 1111 1111 1111 1110 0000... (range is 15 bits)
bandwidth2   = ...0001 1111 1111 1111 1110 0000... (max_entry 65535)
range_bitmap = ...0001 1111 1111 1111 1110 0000... (range is 16 bits)

Combination 3 (OK, because bandwidth1 will be compressed to 65534):
bandwidth1   = ...0000 1111 1111 1111 1110 0000... (max_entry 32767)
range_bitmap = ...0000 1111 1111 1111 1110 0000... (range is 15 bits)
bandwidth2   = ...0000 0111 1111 1111 1111 0000... (max_entry 32767)
range_bitmap = ...0000 1111 1111 1111 1111 0000... (range is 16 bits)

Combination 4 (Error):
bandwidth1   = ...0000 1111 1111 1111 1111 0000... (max_entry 65535)
range_bitmap = ...0000 1111 1111 1111 1111 0000... (range is 16 bits)



* Re: [PATCH v16 11/14] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)
  2019-11-15  7:53 ` [PATCH v16 11/14] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao Xu
@ 2019-11-20 10:09   ` Igor Mammedov
  2019-11-21  1:28     ` Tao Xu
  0 siblings, 1 reply; 29+ messages in thread
From: Igor Mammedov @ 2019-11-20 10:09 UTC (permalink / raw)
  To: Tao Xu
  Cc: lvivier, thuth, ehabkost, mst, qemu-devel, jingqi.liu, fan.du,
	mdroth, armbru, jonathan.cameron

On Fri, 15 Nov 2019 15:53:49 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> From: Liu Jingqi <jingqi.liu@intel.com>
> 
> This structure describes the memory access latency and bandwidth
> information from various memory access initiator proximity domains.
> The latency and bandwidth numbers represented in this structure
> correspond to rated latency and bandwidth for the platform.
> The software could use this information as hint for optimization.
> 
> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> Changes in v16:
>     - Add more description for lb_length (Igor)
>     - Drop entry_list and calculate entries in this patch (Igor)
> 
> Changes in v13:
>     - Calculate the entries in a new patch.
> ---
>  hw/acpi/hmat.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 104 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
> index 9ff79308a4..ed19ebed2f 100644
> --- a/hw/acpi/hmat.c
> +++ b/hw/acpi/hmat.c
> @@ -25,8 +25,10 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/units.h"
>  #include "sysemu/numa.h"
>  #include "hw/acpi/hmat.h"
> +#include "qemu/error-report.h"

do you really need this header in this patch?


modulo above nit, patch looks good so
with above fixed (if necessary)

Reviewed-by: Igor Mammedov <imammedo@redhat.com>

>  
>  /*
>   * ACPI 6.3:
> @@ -67,11 +69,89 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags,
>      build_append_int_noprefix(table_data, 0, 8);
>  }
>  
> +/*
> + * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
> + * Structure: Table 5-146
> + */
> +static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
> +                          uint32_t num_initiator, uint32_t num_target,
> +                          uint32_t *initiator_list)
> +{
> +    int i, index;
> +    HMAT_LB_Data *lb_data;
> +    uint16_t *entry_list;
> +    uint32_t base;
> +    /* Length in bytes for entire structure */
> +    uint32_t lb_length
> +        = 32 /* Table length upto and including Entry Base Unit */
> +        + 4 * num_initiator /* Initiator Proximity Domain List */
> +        + 4 * num_target /* Target Proximity Domain List */
> +        + 2 * num_initiator * num_target; /* Latency or Bandwidth Entries */
> +
> +    /* Type */
> +    build_append_int_noprefix(table_data, 1, 2);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /* Length */
> +    build_append_int_noprefix(table_data, lb_length, 4);
> +    /* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
> +    assert(!(hmat_lb->hierarchy >> 4));
> +    build_append_int_noprefix(table_data, hmat_lb->hierarchy, 1);
> +    /* Data Type */
> +    build_append_int_noprefix(table_data, hmat_lb->data_type, 1);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /* Number of Initiator Proximity Domains (s) */
> +    build_append_int_noprefix(table_data, num_initiator, 4);
> +    /* Number of Target Proximity Domains (t) */
> +    build_append_int_noprefix(table_data, num_target, 4);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 4);
> +
> +    /* Entry Base Unit */
> +    if (hmat_lb->data_type <= HMAT_LB_DATA_WRITE_LATENCY) {
> +        /* Convert latency base from nanoseconds to picosecond */
> +        base = hmat_lb->base * 1000;
> +    } else {
> +        /* Convert bandwidth base from Byte to Megabyte */
> +        base = hmat_lb->base / MiB;
> +    }
> +    build_append_int_noprefix(table_data, base, 8);
> +
> +    /* Initiator Proximity Domain List */
> +    for (i = 0; i < num_initiator; i++) {
> +        build_append_int_noprefix(table_data, initiator_list[i], 4);
> +    }
> +
> +    /* Target Proximity Domain List */
> +    for (i = 0; i < num_target; i++) {
> +        build_append_int_noprefix(table_data, i, 4);
> +    }
> +
> +    /* Latency or Bandwidth Entries */
> +    entry_list = g_malloc0(hmat_lb->list->len * sizeof(uint16_t));
> +    for (i = 0; i < hmat_lb->list->len; i++) {
> +        lb_data = &g_array_index(hmat_lb->list, HMAT_LB_Data, i);
> +        index = lb_data->initiator * num_target + lb_data->target;
> +
> +        entry_list[index] = (uint16_t)(lb_data->data / hmat_lb->base);
> +    }
> +
> +    for (i = 0; i < num_initiator * num_target; i++) {
> +        build_append_int_noprefix(table_data, entry_list[i], 2);
> +    }
> +
> +    g_free(entry_list);
> +}
> +
>  /* Build HMAT sub table structures */
>  static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
>  {
>      uint16_t flags;
> -    int i;
> +    uint32_t num_initiator = 0;
> +    uint32_t initiator_list[MAX_NODES];
> +    int i, hierarchy, type;
> +    HMAT_LB_Info *hmat_lb;
>  
>      for (i = 0; i < numa_state->num_nodes; i++) {
>          flags = 0;
> @@ -82,6 +162,29 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
>  
>          build_hmat_mpda(table_data, flags, numa_state->nodes[i].initiator, i);
>      }
> +
> +    for (i = 0; i < numa_state->num_nodes; i++) {
> +        if (numa_state->nodes[i].has_cpu) {
> +            initiator_list[num_initiator++] = i;
> +        }
> +    }
> +
> +    /*
> +     * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
> +     * Structure: Table 5-146
> +     */
> +    for (hierarchy = HMAT_LB_MEM_MEMORY;
> +         hierarchy <= HMAT_LB_MEM_CACHE_3RD_LEVEL; hierarchy++) {
> +        for (type = HMAT_LB_DATA_ACCESS_LATENCY;
> +             type <= HMAT_LB_DATA_WRITE_BANDWIDTH; type++) {
> +            hmat_lb = numa_state->hmat_lb[hierarchy][type];
> +
> +            if (hmat_lb) {
> +                build_hmat_lb(table_data, hmat_lb, num_initiator,
> +                              numa_state->num_nodes, initiator_list);
> +            }
> +        }
> +    }
>  }
>  
>  void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *numa_state)
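
The length and entry layout used by build_hmat_lb() in the hunk above can be
checked in isolation. What follows is a minimal sketch, not QEMU code: the
helper names hmat_lb_length(), hmat_lb_entry_index() and hmat_lb_entry() are
hypothetical, and the row index is assumed to be the initiator's position in
the Initiator Proximity Domain List.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total structure length in bytes, mirroring lb_length in the patch. */
static uint32_t hmat_lb_length(uint32_t num_initiator, uint32_t num_target)
{
    return 32                                /* header up to and including Entry Base Unit */
           + 4 * num_initiator               /* Initiator Proximity Domain List */
           + 4 * num_target                  /* Target Proximity Domain List */
           + 2 * num_initiator * num_target; /* 16-bit latency/bandwidth entries */
}

/*
 * Row-major slot of one (initiator, target) pair in the entry matrix.
 * Assumption: initiator_pos is the position in the initiator list.
 */
static size_t hmat_lb_entry_index(uint32_t initiator_pos, uint32_t target_pos,
                                  uint32_t num_target)
{
    return (size_t)initiator_pos * num_target + target_pos;
}

/* Compress a raw value into a 16-bit entry relative to the base unit. */
static uint16_t hmat_lb_entry(uint64_t data, uint64_t base)
{
    assert(base != 0);
    return (uint16_t)(data / base);
}
```

For the two-node configuration used by the QMP test in patch 13/14 (one node
with CPUs, two nodes in total) this gives 32 + 4 + 8 + 4 = 48 bytes. The entry
matrix always has num_initiator * num_target slots, so any buffer holding it
needs that many elements, however many hmat-lb entries were configured.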




* Re: [PATCH v16 13/14] tests/numa: Add case for QMP build HMAT
  2019-11-15  7:53 ` [PATCH v16 13/14] tests/numa: Add case for QMP build HMAT Tao Xu
@ 2019-11-20 12:32   ` Igor Mammedov
  2019-11-21  0:56     ` Tao Xu
  0 siblings, 1 reply; 29+ messages in thread
From: Igor Mammedov @ 2019-11-20 12:32 UTC (permalink / raw)
  To: Tao Xu
  Cc: lvivier, thuth, ehabkost, mst, qemu-devel, jingqi.liu, fan.du,
	mdroth, armbru, jonathan.cameron

On Fri, 15 Nov 2019 15:53:51 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> Check configuring HMAT usecase
> 
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> New patch in v16.
> ---
>  tests/numa-test.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)

I'd also add X-FAIL variants here to test the failure conditions.
Taking into account that the QMP interface returns an error without
affecting QEMU state, you can do it within one test case without
restarting QEMU for every failure scenario.
(Just add appropriate comments so the reader knows which
failure path each command is testing.)

So I'd test the x-fail variants first and then finish the test with
a valid configuration.

> 
> diff --git a/tests/numa-test.c b/tests/numa-test.c
> index 8de8581231..15889c26c0 100644
> --- a/tests/numa-test.c
> +++ b/tests/numa-test.c
> @@ -327,6 +327,56 @@ static void pc_dynamic_cpu_cfg(const void *data)
>      qtest_quit(qs);
>  }
>  
> +static void pc_build_hmat_cfg(const void *data)
> +{
> +    QTestState *qs;
> +
> +    qs = qtest_initf("%s -nodefaults --preconfig -machine hmat=on "
> +                     "-smp 2,sockets=2 "
> +                     "-m 128M,slots=2,maxmem=1G "
> +                     "-object memory-backend-ram,size=64M,id=m0 "
> +                     "-object memory-backend-ram,size=64M,id=m1 "
> +                     "-numa node,nodeid=0,memdev=m0 "
> +                     "-numa node,nodeid=1,memdev=m1,initiator=0 "
> +                     "-numa cpu,node-id=0,socket-id=0 "
> +                     "-numa cpu,node-id=0,socket-id=1",
> +                     data ? (char *)data : "");
> +
> +    /* Configuring HMAT bandwidth and latency details */
> +    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
> +        " 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 0,"
> +        " 'hierarchy': \"memory\", 'data-type': \"access-latency\","
> +        " 'latency': 5 } }")));
> +    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
> +        " 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 0,"
> +        " 'hierarchy': \"memory\", 'data-type': \"access-bandwidth\","
> +        " 'bandwidth': 524288000 } }")));
> +    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
> +        " 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 1,"
> +        " 'hierarchy': \"memory\", 'data-type': \"access-latency\","
> +        " 'latency': 10 } }")));
> +    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
> +        " 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 1,"
> +        " 'hierarchy': \"memory\", 'data-type': \"access-bandwidth\","
> +        " 'bandwidth': 104857600 } }")));
> +
> +    /* Configuring HMAT memory side cache attributes */
> +    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
> +        " 'arguments': { 'type': 'hmat-cache', 'node-id': 0, 'size': 10240,"
> +        " 'level': 1, 'assoc': \"direct\", 'policy': \"write-back\","
> +        " 'line': 8 } }")));
> +    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
> +        " 'arguments': { 'type': 'hmat-cache', 'node-id': 1, 'size': 10240,"
> +        " 'level': 1, 'assoc': \"direct\", 'policy': \"write-back\","
> +        " 'line': 8 } }")));
> +
> +    /* let machine initialization to complete and run */
> +    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'x-exit-preconfig' }")));
> +    qtest_qmp_eventwait(qs, "RESUME");
> +
> +    qtest_quit(qs);
> +}
> +
>  int main(int argc, char **argv)
>  {
>      const char *args = NULL;
> @@ -346,6 +396,7 @@ int main(int argc, char **argv)
>      if (!strcmp(arch, "i386") || !strcmp(arch, "x86_64")) {
>          qtest_add_data_func("/numa/pc/cpu/explicit", args, pc_numa_cpu);
>          qtest_add_data_func("/numa/pc/dynamic/cpu", args, pc_dynamic_cpu_cfg);
> +        qtest_add_data_func("/numa/pc/build/hmat", args, pc_build_hmat_cfg);
>      }
>  
>      if (!strcmp(arch, "ppc64")) {




* Re: [PATCH v16 12/14] hmat acpi: Build Memory Side Cache Information Structure(s)
  2019-11-15  7:53 ` [PATCH v16 12/14] hmat acpi: Build Memory Side Cache " Tao Xu
@ 2019-11-20 12:50   ` Igor Mammedov
  0 siblings, 0 replies; 29+ messages in thread
From: Igor Mammedov @ 2019-11-20 12:50 UTC (permalink / raw)
  To: Tao Xu
  Cc: lvivier, thuth, ehabkost, mst, qemu-devel, jingqi.liu, fan.du,
	mdroth, Daniel Black, armbru, jonathan.cameron

On Fri, 15 Nov 2019 15:53:50 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> From: Liu Jingqi <jingqi.liu@intel.com>
> 
> This structure describes memory side cache information for memory
> proximity domains if the memory side cache is present and the
> physical device forms the memory side cache.
> The software could use this information to effectively place
> the data in memory to maximize the performance of the system
> memory that use the memory side cache.
> 
> Reviewed-by: Daniel Black <daniel@linux.ibm.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>

looks good, but I'll skip the patch this round
since it will be changed by HMAT_Cache_Info removal in [9/14]

> ---
> 
> Changes in v16:
>     - Use checks and assert to replace masks (Igor)
>     - Fields in Cache Attributes are promoted to uint32_t before
>       shifting (Igor)
>     - Drop cpu_to_le32() (Igor)
> 
> Changes in v13:
>     - rename level as cache_level
> ---
>  hw/acpi/hmat.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 68 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
> index ed19ebed2f..2b4f760e0e 100644
> --- a/hw/acpi/hmat.c
> +++ b/hw/acpi/hmat.c
> @@ -144,14 +144,62 @@ static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
>      g_free(entry_list);
>  }
>  
> +/* ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure: Table 5-147 */
> +static void build_hmat_cache(GArray *table_data, HMAT_Cache_Info *hmat_cache,
> +                             uint8_t total_levels)
> +{
> +    /*
> +     * Cache Attributes: Bits [3:0] – Total Cache Levels
> +     * for this Memory Proximity Domain
> +     */
> +    uint32_t cache_attr = total_levels;
> +
> +    /* Bits [7:4] : Cache Level described in this structure */
> +    cache_attr |= (uint32_t) hmat_cache->level << 4;
> +
> +    /* Bits [11:8] - Cache Associativity */
> +    cache_attr |= (uint32_t) hmat_cache->associativity << 8;
> +
> +    /* Bits [15:12] - Write Policy */
> +    cache_attr |= (uint32_t) hmat_cache->write_policy << 12;
> +
> +    /* Bits [31:16] - Cache Line size in bytes */
> +    cache_attr |= (uint32_t) hmat_cache->line_size << 16;
> +
> +    /* Type */
> +    build_append_int_noprefix(table_data, 2, 2);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /* Length */
> +    build_append_int_noprefix(table_data, 32, 4);
> +    /* Proximity Domain for the Memory */
> +    build_append_int_noprefix(table_data, hmat_cache->proximity, 4);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 4);
> +    /* Memory Side Cache Size */
> +    build_append_int_noprefix(table_data, hmat_cache->size, 8);
> +    /* Cache Attributes */
> +    build_append_int_noprefix(table_data, cache_attr, 4);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /*
> +     * Number of SMBIOS handles (n)
> +     * Linux kernel uses Memory Side Cache Information Structure
> +     * without SMBIOS entries for now, so set Number of SMBIOS handles
> +     * as 0.
> +     */
> +    build_append_int_noprefix(table_data, 0, 2);
> +}
> +
>  /* Build HMAT sub table structures */
>  static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
>  {
>      uint16_t flags;
>      uint32_t num_initiator = 0;
>      uint32_t initiator_list[MAX_NODES];
> -    int i, hierarchy, type;
> +    int i, hierarchy, type, cache_level, total_levels;
>      HMAT_LB_Info *hmat_lb;
> +    HMAT_Cache_Info *hmat_cache;
>  
>      for (i = 0; i < numa_state->num_nodes; i++) {
>          flags = 0;
> @@ -185,6 +233,25 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
>              }
>          }
>      }
> +
> +    /*
> +     * ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure:
> +     * Table 5-147
> +     */
> +    for (i = 0; i < numa_state->num_nodes; i++) {
> +        total_levels = 0;
> +        for (cache_level = 1; cache_level < HMAT_LB_LEVELS; cache_level++) {
> +            if (numa_state->hmat_cache[i][cache_level]) {
> +                total_levels++;
> +            }
> +        }
> +        for (cache_level = 0; cache_level <= total_levels; cache_level++) {
> +            hmat_cache = numa_state->hmat_cache[i][cache_level];
> +            if (hmat_cache) {
> +                build_hmat_cache(table_data, hmat_cache, total_levels);
> +            }
> +        }
> +    }
>  }
>  
>  void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *numa_state)
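
The Cache Attributes packing done by build_hmat_cache() above can be exercised
on its own. This is a minimal sketch under stated assumptions: the function
name hmat_cache_attr() and its flat parameter list are hypothetical (the patch
reads the fields from HMAT_Cache_Info), and the numeric field values in the
example are arbitrary rather than the ACPI encodings.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack the 32-bit Cache Attributes field:
 *   bits [3:0]   total cache levels for this proximity domain
 *   bits [7:4]   cache level described by this structure
 *   bits [11:8]  cache associativity
 *   bits [15:12] write policy
 *   bits [31:16] cache line size in bytes
 */
static uint32_t hmat_cache_attr(uint8_t total_levels, uint8_t level,
                                uint8_t associativity, uint8_t write_policy,
                                uint16_t line_size)
{
    uint32_t attr;

    /* Each of the first four fields must fit in 4 bits. */
    assert(!(total_levels >> 4));
    assert(!(level >> 4));
    assert(!(associativity >> 4));
    assert(!(write_policy >> 4));

    attr = total_levels;
    /* Cast before shifting so the shift is done on an unsigned 32-bit value. */
    attr |= (uint32_t)level << 4;
    attr |= (uint32_t)associativity << 8;
    attr |= (uint32_t)write_policy << 12;
    attr |= (uint32_t)line_size << 16;
    return attr;
}
```

The casts matter mostly for line_size: a uint16_t operand is promoted to a
signed int, and shifting a value with bit 15 set left by 16 would overflow
that int. Doing the shift on uint32_t avoids the problem.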




* Re: [PATCH v16 08/14] numa: Extend CLI to provide memory latency and bandwidth information
  2019-11-20  7:55     ` Tao Xu
@ 2019-11-20 12:56       ` Igor Mammedov
  2019-11-21  1:07         ` Tao Xu
  0 siblings, 1 reply; 29+ messages in thread
From: Igor Mammedov @ 2019-11-20 12:56 UTC (permalink / raw)
  To: Tao Xu
  Cc: lvivier, thuth, ehabkost, mst, Liu, Jingqi, Du, Fan, qemu-devel,
	mdroth, jonathan.cameron, armbru

On Wed, 20 Nov 2019 15:55:04 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> On 11/19/2019 7:03 PM, Igor Mammedov wrote:
> > On Fri, 15 Nov 2019 15:53:46 +0800
> > Tao Xu <tao3.xu@intel.com> wrote:
> >   
> >> From: Liu Jingqi <jingqi.liu@intel.com>
> >>
> >> Add -numa hmat-lb option to provide System Locality Latency and
> >> Bandwidth Information. These memory attributes help to build
> >> System Locality Latency and Bandwidth Information Structure(s)
> >> in ACPI Heterogeneous Memory Attribute Table (HMAT).
> >>
> >> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> >> Signed-off-by: Tao Xu <tao3.xu@intel.com>  
> > 
> > looks good to me, so
> > 
> > Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> > 
> > 
> > PS:
> > also see question below
> >   
> [...]
> >> +
> >> +        hmat_lb->range_bitmap |= node->bandwidth;
> >> +        first_bit = ctz64(hmat_lb->range_bitmap);
> >> +        hmat_lb->base = UINT64_C(1) << first_bit;
> >> +        max_entry = node->bandwidth / hmat_lb->base;
> >> +        last_bit = 64 - clz64(hmat_lb->range_bitmap);
> >> +
> >> +        /*
> >> +         * For bandwidth, first_bit record the base unit of bandwidth bits,
> >> +         * last_bit record the last bit of the max bandwidth. The max compressed
> >> +         * bandwidth should be less than 0xFFFF (UINT16_MAX)
> >> +         */
> >> +        if ((last_bit - first_bit) > UINT16_BITS || max_entry >= UINT16_MAX) {  
> >                                                         ^^^^^^^^^^^^^^^^^^^
> > what bandwidth combination is going to trigger above condition?
> >   
> Only use (last_bit - first_bit) > UINT16_BITS, we can't trigger error if 
> the max compressed bandwidth is 0xFFFF. Because in that condition, 
> "last_bit - first_bit == UINT16_BITS". So I add "max_entry >= 
> UINT16_MAX" to catch 0xFFFF. For example:
> 
> Combination 1 (Error):
> bandwidth1   = ...0000 1111 1111 1111 1110 0000... (max_entry 32767)
> range_bitmap = ...0000 1111 1111 1111 1110 0000... (range is 15 bits)
> bandwidth2   = ...0000 1111 1111 1111 1111 0000... (max_entry 65535)
> range_bitmap = ...0000 1111 1111 1111 1111 0000... (range is 16 bits)
> 
> Combination 2 (Error):
> bandwidth1   = ...0000 1111 1111 1111 1110 0000... (max_entry 32767)
> range_bitmap = ...0000 1111 1111 1111 1110 0000... (range is 15 bits)
> bandwidth2   = ...0001 1111 1111 1111 1110 0000... (max_entry 65535)
> range_bitmap = ...0001 1111 1111 1111 1110 0000... (range is 16 bits)
> 
> Combination 3 (OK, because bandwidth1 will be compressed to 65534):
> bandwidth1   = ...0000 1111 1111 1111 1110 0000... (max_entry 32767)
> range_bitmap = ...0000 1111 1111 1111 1110 0000... (range is 15 bits)
> bandwidth2   = ...0000 0111 1111 1111 1111 0000... (max_entry 32767)
> range_bitmap = ...0000 1111 1111 1111 1111 0000... (range is 16 bits)
> 
> Combination 4 (Error):
> bandwidth1   = ...0000 1111 1111 1111 1111 0000... (max_entry 65535)
> range_bitmap = ...0000 1111 1111 1111 1111 0000... (range is 16 bits)

OK. I'd use the max/min possible values in bios-tables-test
to make sure that we are testing the whole range and would be able
to detect an error if the valid range regresses (shrinks).
The x-fail tests I've asked for in the QMP test should detect
errors the other way around.
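
The four combinations quoted above can be replayed with a small stand-alone
model of the check. This is a sketch, not the QEMU code: LbRange,
lb_range_add() and the two bit-counting helpers are hypothetical stand-ins
for HMAT_LB_Info and QEMU's ctz64()/clz64(), UINT16_BITS is assumed to be 16,
and the bit patterns are assumed to end at bit 0. Unlike the quoted hunk, the
sketch only updates its state when the value is accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UINT16_BITS 16 /* assumed value of the macro used in the patch */

/* Hypothetical stand-in for the range state kept per HMAT_LB_Info. */
typedef struct {
    uint64_t range_bitmap; /* OR of every bandwidth accepted so far */
    uint64_t base;         /* common base unit, a power of two */
} LbRange;

/* Portable substitutes for ctz64()/clz64(); v must be non-zero. */
static int ctz64_sketch(uint64_t v)
{
    int n = 0;
    while (!(v & 1)) {
        v >>= 1;
        n++;
    }
    return n;
}

static int clz64_sketch(uint64_t v)
{
    int n = 0;
    while (!(v >> 63)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Returns true if the bandwidth fits; the state changes only on success. */
static bool lb_range_add(LbRange *r, uint64_t bandwidth)
{
    uint64_t bitmap = r->range_bitmap | bandwidth;
    int first_bit, last_bit;
    uint64_t base, max_entry;

    if (bitmap == 0) {
        return false; /* assumption: a zero bandwidth is rejected */
    }
    first_bit = ctz64_sketch(bitmap);
    last_bit = 64 - clz64_sketch(bitmap);
    base = UINT64_C(1) << first_bit;
    max_entry = bandwidth / base;

    if (last_bit - first_bit > UINT16_BITS || max_entry >= UINT16_MAX) {
        return false;
    }
    r->range_bitmap = bitmap;
    r->base = base;
    return true;
}
```

With these assumptions, 0xFFFE0 followed by 0xFFFF0 or 0x1FFFE0 is rejected
(combinations 1 and 2), 0xFFFE0 followed by 0x7FFF0 is accepted with base 16
so that the first value compresses to 65534 (combination 3), and 0xFFFF0 on
its own is rejected (combination 4).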




* Re: [PATCH v16 13/14] tests/numa: Add case for QMP build HMAT
  2019-11-20 12:32   ` Igor Mammedov
@ 2019-11-21  0:56     ` Tao Xu
  0 siblings, 0 replies; 29+ messages in thread
From: Tao Xu @ 2019-11-21  0:56 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: lvivier, thuth, ehabkost, mst, qemu-devel, Liu, Jingqi, Du, Fan,
	mdroth, armbru, jonathan.cameron

On 11/20/2019 8:32 PM, Igor Mammedov wrote:
> On Fri, 15 Nov 2019 15:53:51 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> Check configuring HMAT usecase
>>
>> Suggested-by: Igor Mammedov <imammedo@redhat.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>
>> New patch in v16.
>> ---
>>   tests/numa-test.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 51 insertions(+)
> 
> I'd also add X-FAIL variants here, to test fail conditions.
> Taking in account that QMP interface returns error without
> affecting QEMU state, you can do it within one test case without
> restarting it on every fail scenario.
> (just add appropriate comments so reader would know that you are
> testing this and that failure path)
> 
> So I'd first test x-fail variants and then finish test with
> valid configuration.
> 

Thank you for your suggestion. I will add it in the next version.



* Re: [PATCH v16 08/14] numa: Extend CLI to provide memory latency and bandwidth information
  2019-11-20 12:56       ` Igor Mammedov
@ 2019-11-21  1:07         ` Tao Xu
  0 siblings, 0 replies; 29+ messages in thread
From: Tao Xu @ 2019-11-21  1:07 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: lvivier, thuth, ehabkost, mst, Liu, Jingqi, Du, Fan, qemu-devel,
	mdroth, jonathan.cameron, armbru

On 11/20/2019 8:56 PM, Igor Mammedov wrote:
> On Wed, 20 Nov 2019 15:55:04 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> On 11/19/2019 7:03 PM, Igor Mammedov wrote:
>>> On Fri, 15 Nov 2019 15:53:46 +0800
>>> Tao Xu <tao3.xu@intel.com> wrote:
>>>    
>>>> From: Liu Jingqi <jingqi.liu@intel.com>
>>>>
>>>> Add -numa hmat-lb option to provide System Locality Latency and
>>>> Bandwidth Information. These memory attributes help to build
>>>> System Locality Latency and Bandwidth Information Structure(s)
>>>> in ACPI Heterogeneous Memory Attribute Table (HMAT).
>>>>
>>>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>>>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>>>
>>> looks good to me, so
>>>
>>> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
>>>
>>>
>>> PS:
>>> also see question below
>>>    
>> [...]
>>>> +
>>>> +        hmat_lb->range_bitmap |= node->bandwidth;
>>>> +        first_bit = ctz64(hmat_lb->range_bitmap);
>>>> +        hmat_lb->base = UINT64_C(1) << first_bit;
>>>> +        max_entry = node->bandwidth / hmat_lb->base;
>>>> +        last_bit = 64 - clz64(hmat_lb->range_bitmap);
>>>> +
>>>> +        /*
>>>> +         * For bandwidth, first_bit record the base unit of bandwidth bits,
>>>> +         * last_bit record the last bit of the max bandwidth. The max compressed
>>>> +         * bandwidth should be less than 0xFFFF (UINT16_MAX)
>>>> +         */
>>>> +        if ((last_bit - first_bit) > UINT16_BITS || max_entry >= UINT16_MAX) {
>>>                                                          ^^^^^^^^^^^^^^^^^^^
>>> what bandwidth combination is going to trigger above condition?
>>>    
>> Only use (last_bit - first_bit) > UINT16_BITS, we can't trigger error if
>> the max compressed bandwidth is 0xFFFF. Because in that condition,
>> "last_bit - first_bit == UINT16_BITS". So I add "max_entry >=
>> UINT16_MAX" to catch 0xFFFF. For example:
>>
>> Combination 1 (Error):
>> bandwidth1   = ...0000 1111 1111 1111 1110 0000... (max_entry 32767)
>> range_bitmap = ...0000 1111 1111 1111 1110 0000... (range is 15 bits)
>> bandwidth2   = ...0000 1111 1111 1111 1111 0000... (max_entry 65535)
>> range_bitmap = ...0000 1111 1111 1111 1111 0000... (range is 16 bits)
>>
>> Combination 2 (Error):
>> bandwidth1   = ...0000 1111 1111 1111 1110 0000... (max_entry 32767)
>> range_bitmap = ...0000 1111 1111 1111 1110 0000... (range is 15 bits)
>> bandwidth2   = ...0001 1111 1111 1111 1110 0000... (max_entry 65535)
>> range_bitmap = ...0001 1111 1111 1111 1110 0000... (range is 16 bits)
>>
>> Combination 3 (OK, because bandwidth1 will be compressed to 65534):
>> bandwidth1   = ...0000 1111 1111 1111 1110 0000... (max_entry 32767)
>> range_bitmap = ...0000 1111 1111 1111 1110 0000... (range is 15 bits)
>> bandwidth2   = ...0000 0111 1111 1111 1111 0000... (max_entry 32767)
>> range_bitmap = ...0000 1111 1111 1111 1111 0000... (range is 16 bits)
>>
>> Combination 4 (Error):
>> bandwidth1   = ...0000 1111 1111 1111 1111 0000... (max_entry 65535)
>> range_bitmap = ...0000 1111 1111 1111 1111 0000... (range is 16 bits)
> 
> ok, I'd use in max/min possible values in bios-tables-test,
> to make sure that we are testing whole range and would be able
> to detect a error in case the valid ranges regressed (shrink)
> and x-fail tests I've asked for in QMP test should detect
> error other way around.
> 
OK, I will add these tests.



* Re: [PATCH v16 11/14] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)
  2019-11-20 10:09   ` Igor Mammedov
@ 2019-11-21  1:28     ` Tao Xu
  0 siblings, 0 replies; 29+ messages in thread
From: Tao Xu @ 2019-11-21  1:28 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: lvivier, thuth, ehabkost, mst, qemu-devel, Liu, Jingqi, Du, Fan,
	mdroth, armbru, jonathan.cameron

On 11/20/2019 6:09 PM, Igor Mammedov wrote:
> On Fri, 15 Nov 2019 15:53:49 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> From: Liu Jingqi <jingqi.liu@intel.com>
>>
>> This structure describes the memory access latency and bandwidth
>> information from various memory access initiator proximity domains.
>> The latency and bandwidth numbers represented in this structure
>> correspond to rated latency and bandwidth for the platform.
>> The software could use this information as hint for optimization.
>>
>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>
>> Changes in v16:
>>      - Add more description for lb_length (Igor)
>>      - Drop entry_list and calculate entries in this patch (Igor)
>>
>> Changes in v13:
>>      - Calculate the entries in a new patch.
>> ---
>>   hw/acpi/hmat.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 104 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
>> index 9ff79308a4..ed19ebed2f 100644
>> --- a/hw/acpi/hmat.c
>> +++ b/hw/acpi/hmat.c
>> @@ -25,8 +25,10 @@
>>    */
>>   
>>   #include "qemu/osdep.h"
>> +#include "qemu/units.h"
>>   #include "sysemu/numa.h"
>>   #include "hw/acpi/hmat.h"
>> +#include "qemu/error-report.h"
> 
> do you really need this header in this patch?
> 

I will drop this header in the next version.
> 
> modulo above nit, patch looks good so
> with above fixed (if necessary)
> 
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> 
>>   
>>   /*
>>    * ACPI 6.3:
>> @@ -67,11 +69,89 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags,
>>       build_append_int_noprefix(table_data, 0, 8);
>>   }
>>   
>> +/*
>> + * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
>> + * Structure: Table 5-146
>> + */
>> +static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
>> +                          uint32_t num_initiator, uint32_t num_target,
>> +                          uint32_t *initiator_list)
>> +{
>> +    int i, index;
>> +    HMAT_LB_Data *lb_data;
>> +    uint16_t *entry_list;
>> +    uint32_t base;
>> +    /* Length in bytes for entire structure */
>> +    uint32_t lb_length
>> +        = 32 /* Table length upto and including Entry Base Unit */
>> +        + 4 * num_initiator /* Initiator Proximity Domain List */
>> +        + 4 * num_target /* Target Proximity Domain List */
>> +        + 2 * num_initiator * num_target; /* Latency or Bandwidth Entries */
>> +
>> +    /* Type */
>> +    build_append_int_noprefix(table_data, 1, 2);
>> +    /* Reserved */
>> +    build_append_int_noprefix(table_data, 0, 2);
>> +    /* Length */
>> +    build_append_int_noprefix(table_data, lb_length, 4);
>> +    /* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
>> +    assert(!(hmat_lb->hierarchy >> 4));
>> +    build_append_int_noprefix(table_data, hmat_lb->hierarchy, 1);
>> +    /* Data Type */
>> +    build_append_int_noprefix(table_data, hmat_lb->data_type, 1);
>> +    /* Reserved */
>> +    build_append_int_noprefix(table_data, 0, 2);
>> +    /* Number of Initiator Proximity Domains (s) */
>> +    build_append_int_noprefix(table_data, num_initiator, 4);
>> +    /* Number of Target Proximity Domains (t) */
>> +    build_append_int_noprefix(table_data, num_target, 4);
>> +    /* Reserved */
>> +    build_append_int_noprefix(table_data, 0, 4);
>> +
>> +    /* Entry Base Unit */
>> +    if (hmat_lb->data_type <= HMAT_LB_DATA_WRITE_LATENCY) {
>> +        /* Convert latency base from nanoseconds to picosecond */
>> +        base = hmat_lb->base * 1000;
>> +    } else {
>> +        /* Convert bandwidth base from Byte to Megabyte */
>> +        base = hmat_lb->base / MiB;
>> +    }
>> +    build_append_int_noprefix(table_data, base, 8);
>> +
>> +    /* Initiator Proximity Domain List */
>> +    for (i = 0; i < num_initiator; i++) {
>> +        build_append_int_noprefix(table_data, initiator_list[i], 4);
>> +    }
>> +
>> +    /* Target Proximity Domain List */
>> +    for (i = 0; i < num_target; i++) {
>> +        build_append_int_noprefix(table_data, i, 4);
>> +    }
>> +
>> +    /* Latency or Bandwidth Entries */
>> +    entry_list = g_malloc0(hmat_lb->list->len * sizeof(uint16_t));
>> +    for (i = 0; i < hmat_lb->list->len; i++) {
>> +        lb_data = &g_array_index(hmat_lb->list, HMAT_LB_Data, i);
>> +        index = lb_data->initiator * num_target + lb_data->target;
>> +
>> +        entry_list[index] = (uint16_t)(lb_data->data / hmat_lb->base);
>> +    }
>> +
>> +    for (i = 0; i < num_initiator * num_target; i++) {
>> +        build_append_int_noprefix(table_data, entry_list[i], 2);
>> +    }
>> +
>> +    g_free(entry_list);
>> +}
>> +
>>   /* Build HMAT sub table structures */
>>   static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
>>   {
>>       uint16_t flags;
>> -    int i;
>> +    uint32_t num_initiator = 0;
>> +    uint32_t initiator_list[MAX_NODES];
>> +    int i, hierarchy, type;
>> +    HMAT_LB_Info *hmat_lb;
>>   
>>       for (i = 0; i < numa_state->num_nodes; i++) {
>>           flags = 0;
>> @@ -82,6 +162,29 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
>>   
>>           build_hmat_mpda(table_data, flags, numa_state->nodes[i].initiator, i);
>>       }
>> +
>> +    for (i = 0; i < numa_state->num_nodes; i++) {
>> +        if (numa_state->nodes[i].has_cpu) {
>> +            initiator_list[num_initiator++] = i;
>> +        }
>> +    }
>> +
>> +    /*
>> +     * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
>> +     * Structure: Table 5-146
>> +     */
>> +    for (hierarchy = HMAT_LB_MEM_MEMORY;
>> +         hierarchy <= HMAT_LB_MEM_CACHE_3RD_LEVEL; hierarchy++) {
>> +        for (type = HMAT_LB_DATA_ACCESS_LATENCY;
>> +             type <= HMAT_LB_DATA_WRITE_BANDWIDTH; type++) {
>> +            hmat_lb = numa_state->hmat_lb[hierarchy][type];
>> +
>> +            if (hmat_lb) {
>> +                build_hmat_lb(table_data, hmat_lb, num_initiator,
>> +                              numa_state->num_nodes, initiator_list);
>> +            }
>> +        }
>> +    }
>>   }
>>   
>>   void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *numa_state)
> 





Thread overview: 29+ messages
2019-11-15  7:53 [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
2019-11-15  7:53 ` [PATCH v16 01/14] util/cutils: Add Add qemu_strtold and qemu_strtold_finite Tao Xu
2019-11-15  7:53 ` [PATCH v16 02/14] util/cutils: Use qemu_strtold_finite to parse size Tao Xu
2019-11-15  7:53 ` [PATCH v16 03/14] util/cutils: refactor do_strtosz() to support suffixes list Tao Xu
2019-11-15 12:11   ` Philippe Mathieu-Daudé
2019-11-18  7:35     ` Tao Xu
2019-11-15  7:53 ` [PATCH v16 04/14] util/cutils: Add qemu_strtotime_ns() Tao Xu
2019-11-15  7:53 ` [PATCH v16 05/14] qapi: Add builtin type time Tao Xu
2019-11-15  7:53 ` [PATCH v16 06/14] tests: Add test for QAPI " Tao Xu
2019-11-15  7:53 ` [PATCH v16 07/14] numa: Extend CLI to provide initiator information for numa nodes Tao Xu
2019-11-15  7:53 ` [PATCH v16 08/14] numa: Extend CLI to provide memory latency and bandwidth information Tao Xu
2019-11-19 11:03   ` Igor Mammedov
2019-11-20  7:55     ` Tao Xu
2019-11-20 12:56       ` Igor Mammedov
2019-11-21  1:07         ` Tao Xu
2019-11-15  7:53 ` [PATCH v16 09/14] numa: Extend CLI to provide memory side cache information Tao Xu
2019-11-19 11:47   ` Igor Mammedov
2019-11-20  6:51     ` Tao Xu
2019-11-15  7:53 ` [PATCH v16 10/14] hmat acpi: Build Memory Proximity Domain Attributes Structure(s) Tao Xu
2019-11-15  7:53 ` [PATCH v16 11/14] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao Xu
2019-11-20 10:09   ` Igor Mammedov
2019-11-21  1:28     ` Tao Xu
2019-11-15  7:53 ` [PATCH v16 12/14] hmat acpi: Build Memory Side Cache " Tao Xu
2019-11-20 12:50   ` Igor Mammedov
2019-11-15  7:53 ` [PATCH v16 13/14] tests/numa: Add case for QMP build HMAT Tao Xu
2019-11-20 12:32   ` Igor Mammedov
2019-11-21  0:56     ` Tao Xu
2019-11-15  7:53 ` [PATCH v16 14/14] tests/bios-tables-test: add test cases for ACPI HMAT Tao Xu
2019-11-15  8:58 ` [PATCH v16 00/14] Build ACPI Heterogeneous Memory Attribute Table (HMAT) no-reply
