All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug
@ 2013-06-26  9:13 Hu Tao
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 01/14] qapi: make visit_type_size fallback to v->type_int() Hu Tao
                   ` (15 more replies)
  0 siblings, 16 replies; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vasilis Liaskovitis

It's been quite a while since v4 and lots of changes happend
in qemu and v4 just can't apply anymore. So this series is
basically a rebase. Another purpose is to bring up discussions
to make consensus on some questions since v4, see
http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg01219.html
and http://lists.gnu.org/archive/html/qemu-devel/2013-02/msg05332.html

This series works with seabios counterpart.

changes from v4:

  - rebased on a recent qemu-git
  - based on another series which seperates i440fx refactor from v4.
    http://lists.nongnu.org/archive/html/qemu-devel/2013-06/msg03500.html
  - hot-unplug patches not included, as suggested by Vasilis, since
    hot-unplug has some more complications with refcounting memory regions.
  - fix some copy-paste errors in qapi-schema.json.

v4: http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02693.html

Issues:

- hot-remove needs to only unmap the dimm device from guest's view. Freeing the
  memory should happen when the last user of the device (e.g. virtio-blk) unrefs
  the device. A testcase is needed for this.

- Live Migration: Ramblocks are migrated before qdev VMStates are migrated. So
  the DimmDevice is handled diferrently than other devices. Should this be
  reworked ?( DimmDevice structure currently does not define a VMStateDescription)
  Live migration works as long as the dimm layout (command line args) are
  identical at the source and destination qemu command line, and destination takes
  into account hot-operations that have occured on source. (v3 patch 10/19
  created the DimmDevice that corresponds to an unknown incoming ramblock, e.g.
  for a dimm that was hot-added on source. but has been dropped for the moment).

- A main blocker issue is windows guest functionality. The patchset does not
  work for windows currently.  Testing on win2012 server RC or windows2008
  consumer prerelease, when adding a DIMM, there is a BSOD with ACPI_BIOS_ERROR
  message. After this, the VM keeps rebooting with ACPI_BIOS_ERROR. The windows
  pnpmem driver obviosuly has a problem with the seabios dimm implementation
  (or the seabios dimm implementation is not fully ACPI-compliant). If someone
  can review the seabios patches or has any ideas to debug this, let me know.

- hot-operation notification lists need to be added to migration state.

- If a virtio sg element straddles a ramblock boundary, virtio_map_sg can't
  handle this and qemu exits with "virtio: Trying to map MMIO memory" assertion.
  This was discovered with stress testing in a VM with hotplugged DIMMs. The
  top-most commit in the qemu repo above tries to fix this (803aedf0) but maybe
  you have a better idea. This problem is critical for hot-add upstreaming and
  needs to be solved. Review/discussion on the list/irc is necessary.

- q35 was supposed/rumoured to have native acpi hotplug support, but I haven't
  found it in the spec (and I think other people in the list didn't either). Iiuc
  the plan is to support paravirtual memory hotplug for both piix4 and q35.

Hu Tao (5):
  qapi: make visit_type_size fallback to v->type_int()
  Implement dimm device abstraction
  memory: record below_4g_mem_size, above_4g_mem_size
  memory controller: initialize dram controller.
  pc: Add dimm paravirt SRAT info

Vasilis Liaskovitis (9):
  Add SIZE type to qdev properties
  qemu-option: export parse_option_number
  vl: handle "-device dimm"
  acpi_piix4 : Implement memory device hotplug registers
  acpi_ich9 : Implement memory device hotplug registers
  Introduce paravirt interface QEMU_CFG_PCI_WINDOW
  Implement "info memory" and "query-memory"
  balloon: update with hotplugged memory
  Implement dimm-info

 default-configs/x86_64-softmmu.mak |   1 +
 docs/specs/acpi_mem_hotplug.txt    |  14 ++
 docs/specs/fwcfg.txt               |  28 ++++
 hmp-commands.hx                    |   4 +
 hmp.c                              |  25 ++++
 hmp.h                              |   2 +
 hw/Makefile.objs                   |   1 +
 hw/acpi/ich9.c                     |  56 ++++++-
 hw/acpi/piix4.c                    |  72 ++++++++-
 hw/core/qdev-properties.c          |  61 ++++++++
 hw/i386/pc.c                       |  74 +++++++--
 hw/i386/pc_piix.c                  |   1 +
 hw/i386/pc_q35.c                   |  17 ++-
 hw/mem-hotplug/Makefile.objs       |   1 +
 hw/mem-hotplug/dimm.c              | 298 +++++++++++++++++++++++++++++++++++++
 hw/pci-host/piix.c                 |  13 ++
 hw/virtio/virtio-balloon.c         |  13 +-
 include/hw/acpi/ich9.h             |  10 ++
 include/hw/i386/pc.h               |   8 +
 include/hw/mem-hotplug/dimm.h      |  78 ++++++++++
 include/hw/nvram/fw_cfg.h          |   1 +
 include/hw/qdev-properties.h       |   3 +
 include/qemu/option.h              |   4 +
 include/sysemu/sysemu.h            |   1 +
 monitor.c                          |  14 ++
 qapi-schema.json                   |  40 +++++
 qapi/qapi-visit-core.c             |   6 +-
 qmp-commands.hx                    |  22 +++
 util/qemu-option.c                 |   8 +-
 vl.c                               |  60 ++++++++
 30 files changed, 907 insertions(+), 29 deletions(-)
 create mode 100644 docs/specs/acpi_mem_hotplug.txt
 create mode 100644 docs/specs/fwcfg.txt
 create mode 100644 hw/mem-hotplug/Makefile.objs
 create mode 100644 hw/mem-hotplug/dimm.c
 create mode 100644 include/hw/mem-hotplug/dimm.h

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 01/14] qapi: make visit_type_size fallback to v->type_int()
  2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
@ 2013-06-26  9:13 ` Hu Tao
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 02/14] Add SIZE type to qdev properties Hu Tao
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vasilis Liaskovitis

In case of v->type_uint64 is NULL, fallback to v->type_int, by calling
visit_type_uint64().

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 qapi/qapi-visit-core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c
index 401ee6e..7ae2061 100644
--- a/qapi/qapi-visit-core.c
+++ b/qapi/qapi-visit-core.c
@@ -239,7 +239,11 @@ void visit_type_int64(Visitor *v, int64_t *obj, const char *name, Error **errp)
 void visit_type_size(Visitor *v, uint64_t *obj, const char *name, Error **errp)
 {
     if (!error_is_set(errp)) {
-        (v->type_size ? v->type_size : v->type_uint64)(v, obj, name, errp);
+        if (v->type_size) {
+            v->type_size(v, obj, name, errp);
+        } else {
+            visit_type_uint64(v, obj, name, errp);
+        }
     }
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 02/14] Add SIZE type to qdev properties
  2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 01/14] qapi: make visit_type_size fallback to v->type_int() Hu Tao
@ 2013-06-26  9:13 ` Hu Tao
  2013-07-08  9:37   ` Andreas Färber
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 03/14] qemu-option: export parse_option_number Hu Tao
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vasilis Liaskovitis

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

This patch adds a 'SIZE' type property to qdev.

It will make dimm description more convenient by allowing sizes to be specified
with K,M,G,T prefixes instead of number of bytes e.g.:
-device dimm,id=mem0,size=2G,bus=membus.0

Credits go to Ian Molton for original patch. See:
http://patchwork.ozlabs.org/patch/38835/

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 hw/core/qdev-properties.c    | 61 ++++++++++++++++++++++++++++++++++++++++++++
 include/hw/qdev-properties.h |  3 +++
 include/qemu/option.h        |  2 ++
 util/qemu-option.c           |  4 +--
 4 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 3a324fb..9c34793 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -1134,3 +1134,64 @@ void qdev_prop_set_globals(DeviceState *dev, Error **errp)
         class = object_class_get_parent(class);
     } while (class);
 }
+
+/* --- 64bit unsigned int 'size' type --- */
+
+static void get_size(Object *obj, Visitor *v, void *opaque,
+                     const char *name, Error **errp)
+{
+    DeviceState *dev = DEVICE(obj);
+    Property *prop = opaque;
+    uint64_t *ptr = qdev_get_prop_ptr(dev, prop);
+
+    visit_type_size(v, ptr, name, errp);
+}
+
+static void set_size(Object *obj, Visitor *v, void *opaque,
+                     const char *name, Error **errp)
+{
+    DeviceState *dev = DEVICE(obj);
+    Property *prop = opaque;
+    uint64_t *ptr = qdev_get_prop_ptr(dev, prop);
+
+    if (dev->realized) {
+        qdev_prop_set_after_realize(dev, name, errp);
+        return;
+    }
+
+    visit_type_size(v, ptr, name, errp);
+}
+
+static int parse_size(DeviceState *dev, Property *prop, const char *str)
+{
+    uint64_t *ptr = qdev_get_prop_ptr(dev, prop);
+    Error *errp = NULL;
+
+    if (str != NULL) {
+        parse_option_size(prop->name, str, ptr, &errp);
+    }
+    assert_no_error(errp);
+    return 0;
+}
+
+static int print_size(DeviceState *dev, Property *prop, char *dest, size_t len)
+{
+    uint64_t *ptr = qdev_get_prop_ptr(dev, prop);
+    char suffixes[] = {'T', 'G', 'M', 'K', 'B'};
+    int i = 0;
+    uint64_t div;
+
+    for (div = (long int)1 << 40; !(*ptr / div) ; div >>= 10) {
+        i++;
+    }
+
+    return snprintf(dest, len, "%0.03f%c", (double)*ptr / div, suffixes[i]);
+}
+
+PropertyInfo qdev_prop_size = {
+    .name  = "size",
+    .parse = parse_size,
+    .print = print_size,
+    .get = get_size,
+    .set = set_size,
+};
diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 39448b7..692f82e 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -15,6 +15,7 @@ extern PropertyInfo qdev_prop_uint64;
 extern PropertyInfo qdev_prop_hex8;
 extern PropertyInfo qdev_prop_hex32;
 extern PropertyInfo qdev_prop_hex64;
+extern PropertyInfo qdev_prop_size;
 extern PropertyInfo qdev_prop_string;
 extern PropertyInfo qdev_prop_chr;
 extern PropertyInfo qdev_prop_ptr;
@@ -116,6 +117,8 @@ extern PropertyInfo qdev_prop_arraylen;
     DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_hex32, uint32_t)
 #define DEFINE_PROP_HEX64(_n, _s, _f, _d)                       \
     DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_hex64, uint64_t)
+#define DEFINE_PROP_SIZE(_n, _s, _f, _d)                       \
+    DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_size, uint64_t)
 #define DEFINE_PROP_PCI_DEVFN(_n, _s, _f, _d)                   \
     DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_pci_devfn, int32_t)
 
diff --git a/include/qemu/option.h b/include/qemu/option.h
index a83c700..83c1e84 100644
--- a/include/qemu/option.h
+++ b/include/qemu/option.h
@@ -153,5 +153,7 @@ typedef int (*qemu_opts_loopfunc)(QemuOpts *opts, void *opaque);
 int qemu_opts_print(QemuOpts *opts, void *dummy);
 int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func, void *opaque,
                       int abort_on_failure);
+void parse_option_size(const char *name, const char *value,
+                       uint64_t *ret, Error **errp);
 
 #endif
diff --git a/util/qemu-option.c b/util/qemu-option.c
index 412c425..8c67857 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -173,8 +173,8 @@ static void parse_option_number(const char *name, const char *value,
     }
 }
 
-static void parse_option_size(const char *name, const char *value,
-                              uint64_t *ret, Error **errp)
+void parse_option_size(const char *name, const char *value,
+                       uint64_t *ret, Error **errp)
 {
     char *postfix;
     double sizef;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 03/14] qemu-option: export parse_option_number
  2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 01/14] qapi: make visit_type_size fallback to v->type_int() Hu Tao
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 02/14] Add SIZE type to qdev properties Hu Tao
@ 2013-06-26  9:13 ` Hu Tao
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 04/14] Implement dimm device abstraction Hu Tao
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vasilis Liaskovitis

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 include/qemu/option.h | 2 ++
 util/qemu-option.c    | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/qemu/option.h b/include/qemu/option.h
index 83c1e84..d7a0bb9 100644
--- a/include/qemu/option.h
+++ b/include/qemu/option.h
@@ -155,5 +155,7 @@ int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func, void *opaque,
                       int abort_on_failure);
 void parse_option_size(const char *name, const char *value,
                        uint64_t *ret, Error **errp);
+void parse_option_number(const char *name, const char *value,
+                         uint64_t *ret, Error **errp);
 
 #endif
diff --git a/util/qemu-option.c b/util/qemu-option.c
index 8c67857..dddabcc 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -155,8 +155,8 @@ static void parse_option_bool(const char *name, const char *value, bool *ret,
     }
 }
 
-static void parse_option_number(const char *name, const char *value,
-                                uint64_t *ret, Error **errp)
+void parse_option_number(const char *name, const char *value,
+                         uint64_t *ret, Error **errp)
 {
     char *postfix;
     uint64_t number;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 04/14] Implement dimm device abstraction
  2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
                   ` (2 preceding siblings ...)
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 03/14] qemu-option: export parse_option_number Hu Tao
@ 2013-06-26  9:13 ` Hu Tao
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm" Hu Tao
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vasilis Liaskovitis

Each hotplug-able memory slot is a DimmDevice. All DimmDevices are attached
to a new bus called DimmBus. This bus is introduced so that we no longer
depend on hotplug-capability of main system bus (the main bus does not allow
hotplugging). The DimmBus should be attached to a chipset Device (i440fx in case
of the pc)

A hot-add operation for a particular dimm:
- creates a new DimmDevice and attaches it to the DimmBus
- creates a new MemoryRegion of the given physical address offset, size and
node proximity, and attaches it to main system memory as a sub_region.

Hotplug operations are done through normal device_add commands.
Also add properties to DimmDevice.

TODO/FIXME: Dimm configurations are specified at qemu startup at the command
line. Dimms with "populated=off" are actually not created, but their config
is saved in the corresponding bus. If user does not specify the correct bus
for a dimm device during device_add and there are multiple memory buses in the
machine (e.g. right now membus.0 and membus.pv are created in this pachset) the
first bus found in the device tree that matches the device type is picked. This
bus may be the wrong one, and the dimm config is not retrieved, thus creating a
useless dimm device. A solution would be to delete the new dimm if the dimm
config was not found (see commented qmp_device_del in dimm_init) but this does
not look clean. This design needs to be fixed somehow to avoid similar problems
if multiple memory buses are present in the same machine.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 default-configs/x86_64-softmmu.mak |   1 +
 hw/Makefile.objs                   |   1 +
 hw/mem-hotplug/Makefile.objs       |   1 +
 hw/mem-hotplug/dimm.c              | 241 +++++++++++++++++++++++++++++++++++++
 include/hw/mem-hotplug/dimm.h      |  77 ++++++++++++
 5 files changed, 321 insertions(+)
 create mode 100644 hw/mem-hotplug/Makefile.objs
 create mode 100644 hw/mem-hotplug/dimm.c
 create mode 100644 include/hw/mem-hotplug/dimm.h

diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index 599b630..5348800 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -46,3 +46,4 @@ CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_ICC_BUS=y
 CONFIG_PVPANIC=y
+CONFIG_MEM_HOTPLUG=y
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 0243d6a..6d3dc73 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -27,6 +27,7 @@ devices-dirs-$(CONFIG_SOFTMMU) += usb/
 devices-dirs-$(CONFIG_VIRTIO) += virtio/
 devices-dirs-$(CONFIG_SOFTMMU) += watchdog/
 devices-dirs-$(CONFIG_SOFTMMU) += xen/
+devices-dirs-$(CONFIG_MEM_HOTPLUG) += mem-hotplug/
 devices-dirs-y += core/
 common-obj-y += $(devices-dirs-y)
 obj-y += $(devices-dirs-y)
diff --git a/hw/mem-hotplug/Makefile.objs b/hw/mem-hotplug/Makefile.objs
new file mode 100644
index 0000000..7563ef5
--- /dev/null
+++ b/hw/mem-hotplug/Makefile.objs
@@ -0,0 +1 @@
+common-obj-$(CONFIG_MEM_HOTPLUG) += dimm.o
diff --git a/hw/mem-hotplug/dimm.c b/hw/mem-hotplug/dimm.c
new file mode 100644
index 0000000..09cfc4b
--- /dev/null
+++ b/hw/mem-hotplug/dimm.c
@@ -0,0 +1,241 @@
+/*
+ * Dimm device for Memory Hotplug
+ *
+ * Copyright ProfitBricks GmbH 2013
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "trace.h"
+#include "hw/qdev.h"
+#include "hw/mem-hotplug/dimm.h"
+#include <time.h>
+#include "qmp-commands.h"
+
+/* the following list is used to hold dimm config info before machine
+ * is initialized. After machine init, the list is not used anymore.*/
+static DimmConfiglist dimmconfig_list =
+       QTAILQ_HEAD_INITIALIZER(dimmconfig_list);
+
+/* the list of memory buses */
+static QLIST_HEAD(, DimmBus) memory_buses;
+
+static void dimmbus_dev_print(Monitor *mon, DeviceState *dev, int indent);
+static char *dimmbus_get_fw_dev_path(DeviceState *dev);
+
+static Property dimm_properties[] = {
+    DEFINE_PROP_UINT64("start", DimmDevice, start, 0),
+    DEFINE_PROP_SIZE("size", DimmDevice, size, DEFAULT_DIMMSIZE),
+    DEFINE_PROP_UINT32("node", DimmDevice, node, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void dimmbus_dev_print(Monitor *mon, DeviceState *dev, int indent)
+{
+}
+
+static char *dimmbus_get_fw_dev_path(DeviceState *dev)
+{
+    char path[40];
+
+    snprintf(path, sizeof(path), "%s", qdev_fw_name(dev));
+    return strdup(path);
+}
+
+static void dimm_bus_class_init(ObjectClass *klass, void *data)
+{
+    BusClass *k = BUS_CLASS(klass);
+
+    k->print_dev = dimmbus_dev_print;
+    k->get_fw_dev_path = dimmbus_get_fw_dev_path;
+}
+
+static void dimm_bus_initfn(Object *obj)
+{
+    DimmBus *bus = DIMM_BUS(obj);
+    QTAILQ_INIT(&bus->dimmconfig_list);
+    QTAILQ_INIT(&bus->dimmlist);
+}
+
+static const TypeInfo dimm_bus_info = {
+    .name = TYPE_DIMM_BUS,
+    .parent = TYPE_BUS,
+    .instance_size = sizeof(DimmBus),
+    .instance_init = dimm_bus_initfn,
+    .class_init = dimm_bus_class_init,
+};
+
+DimmBus *dimm_bus_create(Object *parent, const char *name, uint32_t max_dimms,
+                         dimm_calcoffset_fn pmc_set_offset)
+{
+    DimmBus *memory_bus;
+    DimmConfig *dimm_cfg, *next_cfg;
+    uint32_t num_dimms = 0;
+
+    memory_bus = g_malloc0(dimm_bus_info.instance_size);
+    memory_bus->qbus.name = name ? g_strdup(name) : "membus.0";
+    qbus_create_inplace(&memory_bus->qbus, TYPE_DIMM_BUS, DEVICE(parent),
+                        name);
+
+    QTAILQ_FOREACH_SAFE(dimm_cfg, &dimmconfig_list, nextdimmcfg, next_cfg) {
+        if (!strcmp(memory_bus->qbus.name, dimm_cfg->bus_name)) {
+            if (max_dimms && (num_dimms == max_dimms)) {
+                fprintf(stderr, "Bus %s can only accept %u number of DIMMs\n",
+                        name, max_dimms);
+            }
+            QTAILQ_REMOVE(&dimmconfig_list, dimm_cfg, nextdimmcfg);
+            QTAILQ_INSERT_TAIL(&memory_bus->dimmconfig_list, dimm_cfg,
+                               nextdimmcfg);
+
+            dimm_cfg->start = pmc_set_offset(DEVICE(parent), dimm_cfg->size);
+            num_dimms++;
+        }
+    }
+    QLIST_INSERT_HEAD(&memory_buses, memory_bus, next);
+    return memory_bus;
+}
+
+static void dimm_populate(DimmDevice *s)
+{
+    DeviceState *dev = (DeviceState *)s;
+    MemoryRegion *new = NULL;
+
+    new = g_malloc(sizeof(MemoryRegion));
+    memory_region_init_ram(new, dev->id, s->size);
+    vmstate_register_ram_global(new);
+    memory_region_add_subregion(get_system_memory(), s->start, new);
+    s->mr = new;
+}
+
+void dimm_config_create(char *id, uint64_t size, const char *bus, uint64_t node,
+                        uint32_t dimm_idx)
+{
+    DimmConfig *dimm_cfg;
+    dimm_cfg = (DimmConfig *) g_malloc0(sizeof(DimmConfig));
+    dimm_cfg->name = strdup(id);
+    dimm_cfg->bus_name = strdup(bus);
+    dimm_cfg->idx = dimm_idx;
+    dimm_cfg->start = 0;
+    dimm_cfg->size = size;
+    dimm_cfg->node = node;
+
+    QTAILQ_INSERT_TAIL(&dimmconfig_list, dimm_cfg, nextdimmcfg);
+}
+
+void dimm_bus_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev)
+{
+    DimmBus *bus;
+    QLIST_FOREACH(bus, &memory_buses, next) {
+        assert(bus);
+        bus->qbus.allow_hotplug = 1;
+        bus->dimm_hotplug_qdev = qdev;
+        bus->dimm_hotplug = hotplug;
+    }
+}
+
+static void dimm_plug_device(DimmDevice *slot)
+{
+    DimmBus *bus = DIMM_BUS(qdev_get_parent_bus(&slot->qdev));
+
+    dimm_populate(slot);
+    if (bus->dimm_hotplug) {
+        bus->dimm_hotplug(bus->dimm_hotplug_qdev, slot, 1);
+    }
+}
+
+static int dimm_unplug_device(DeviceState *qdev)
+{
+    return 1;
+}
+
+static DimmConfig *dimmcfg_find_from_name(DimmBus *bus, const char *name)
+{
+    DimmConfig *slot;
+
+    QTAILQ_FOREACH(slot, &bus->dimmconfig_list, nextdimmcfg) {
+        if (!strcmp(slot->name, name)) {
+            return slot;
+        }
+    }
+    return NULL;
+}
+
+void dimm_setup_fwcfg_layout(uint64_t *fw_cfg_slots)
+{
+    DimmConfig *slot;
+    DimmBus *bus;
+
+    QLIST_FOREACH(bus, &memory_buses, next) {
+        QTAILQ_FOREACH(slot, &bus->dimmconfig_list, nextdimmcfg) {
+            assert(slot->start);
+            fw_cfg_slots[3 * slot->idx] = cpu_to_le64(slot->start);
+            fw_cfg_slots[3 * slot->idx + 1] = cpu_to_le64(slot->size);
+            fw_cfg_slots[3 * slot->idx + 2] = cpu_to_le64(slot->node);
+        }
+    }
+}
+
+static void dimm_realize(DeviceState *s, Error **errp)
+{
+    DimmBus *bus = DIMM_BUS(qdev_get_parent_bus(s));
+    DimmDevice *slot;
+    DimmConfig *slotcfg;
+
+    slot = DIMM(s);
+    slot->mr = NULL;
+
+    slotcfg = dimmcfg_find_from_name(bus, s->id);
+
+    if (!slotcfg) {
+        error_setg(errp,
+                   "FIXME no config for slot %s found, dimm should be deleted\n",
+                   s->id);
+        /* qmp_device_del(s->id, NULL); */
+        return;
+    }
+
+    slot->idx = slotcfg->idx;
+    assert(slotcfg->start);
+    slot->start = slotcfg->start;
+    slot->size = slotcfg->size;
+    slot->node = slotcfg->node;
+
+    QTAILQ_INSERT_TAIL(&bus->dimmlist, slot, nextdimm);
+    dimm_plug_device(slot);
+}
+
+
+static void dimm_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->props = dimm_properties;
+    dc->unplug = dimm_unplug_device;
+    dc->realize = dimm_realize;
+    dc->bus_type = TYPE_DIMM_BUS;
+}
+
+static TypeInfo dimm_info = {
+    .name          = TYPE_DIMM,
+    .parent        = TYPE_DEVICE,
+    .instance_size = sizeof(DimmDevice),
+    .class_init    = dimm_class_init,
+};
+
+static void dimm_register_types(void)
+{
+    type_register_static(&dimm_bus_info);
+    type_register_static(&dimm_info);
+}
+
+type_init(dimm_register_types)
diff --git a/include/hw/mem-hotplug/dimm.h b/include/hw/mem-hotplug/dimm.h
new file mode 100644
index 0000000..8625ae6
--- /dev/null
+++ b/include/hw/mem-hotplug/dimm.h
@@ -0,0 +1,77 @@
+#ifndef QEMU_DIMM_H
+#define QEMU_DIMM_H
+
+#include "qemu-common.h"
+#include "exec/memory.h"
+#include "exec/address-spaces.h"
+#include "hw/sysbus.h"
+#include "qapi-types.h"
+#include "qemu-common.h"
+#define MAX_DIMMS 255
+#define DIMM_BITMAP_BYTES ((MAX_DIMMS + 7) / 8)
+#define DEFAULT_DIMMSIZE (1024*1024*1024)
+
+typedef enum {
+    DIMM_REMOVE_SUCCESS = 0,
+    DIMM_REMOVE_FAIL = 1,
+    DIMM_ADD_SUCCESS = 2,
+    DIMM_ADD_FAIL = 3
+} dimm_hp_result_code;
+
+#define TYPE_DIMM "dimm"
+#define DIMM(obj) \
+    OBJECT_CHECK(DimmDevice, (obj), TYPE_DIMM)
+
+typedef struct DimmDevice DimmDevice;
+typedef QTAILQ_HEAD(DimmConfiglist, DimmConfig) DimmConfiglist;
+
+struct DimmDevice {
+    DeviceState qdev;
+    uint32_t idx; /* index in memory hotplug register/bitmap */
+    ram_addr_t start; /* starting physical address */
+    ram_addr_t size;
+    uint32_t node; /* numa node proximity */
+    MemoryRegion *mr; /* MemoryRegion for this slot. !NULL only if populated */
+    QTAILQ_ENTRY(DimmDevice) nextdimm;
+};
+
+typedef struct DimmConfig {
+    const char *name;
+    uint32_t idx; /* index in linear memory hotplug bitmap */
+    const char *bus_name;
+    ram_addr_t start; /* starting physical address */
+    ram_addr_t size;
+    uint32_t node; /* numa node proximity */
+    QTAILQ_ENTRY(DimmConfig) nextdimmcfg;
+} DimmConfig;
+
+typedef int (*dimm_hotplug_fn)(DeviceState *qdev, DimmDevice *dev, int add);
+typedef hwaddr(*dimm_calcoffset_fn)(DeviceState *dev, uint64_t size);
+
+#define TYPE_DIMM_BUS "dimmbus"
+#define DIMM_BUS(obj) OBJECT_CHECK(DimmBus, (obj), TYPE_DIMM_BUS)
+
+typedef struct DimmBus {
+    BusState qbus;
+    DeviceState *dimm_hotplug_qdev;
+    dimm_hotplug_fn dimm_hotplug;
+    DimmConfiglist dimmconfig_list;
+    QTAILQ_HEAD(Dimmlist, DimmDevice) dimmlist;
+    QLIST_ENTRY(DimmBus) next;
+} DimmBus;
+
+struct dimm_hp_result {
+    const char *dimmname;
+    dimm_hp_result_code ret;
+    QTAILQ_ENTRY(dimm_hp_result) next;
+};
+
+void dimm_bus_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev);
+void dimm_setup_fwcfg_layout(uint64_t *fw_cfg_slots);
+int dimm_add(char *id);
+DimmBus *dimm_bus_create(Object *parent, const char *name, uint32_t max_dimms,
+    dimm_calcoffset_fn pmc_set_offset);
+void dimm_config_create(char *id, uint64_t size, const char *bus, uint64_t node,
+        uint32_t dimm_idx);
+
+#endif
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
                   ` (3 preceding siblings ...)
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 04/14] Implement dimm device abstraction Hu Tao
@ 2013-06-26  9:13 ` Hu Tao
  2013-06-26  9:46   ` Paolo Bonzini
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 06/14] acpi_piix4 : Implement memory device hotplug registers Hu Tao
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vasilis Liaskovitis

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 vl.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/vl.c b/vl.c
index 767e020..9d88a79 100644
--- a/vl.c
+++ b/vl.c
@@ -170,6 +170,7 @@ int main(int argc, char **argv)
 
 #include "ui/qemu-spice.h"
 #include "qapi/string-input-visitor.h"
+#include "hw/mem-hotplug/dimm.h"
 
 //#define DEBUG_NET
 //#define DEBUG_SLIRP
@@ -252,6 +253,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
 int nb_numa_nodes;
 uint64_t node_mem[MAX_NODES];
 unsigned long *node_cpumask[MAX_NODES];
+int nb_hp_dimms;
 
 uint8_t qemu_uuid[16];
 
@@ -2338,6 +2340,50 @@ static int chardev_init_func(QemuOpts *opts, void *opaque)
     return 0;
 }
 
+static int dimmcfg_init_func(QemuOpts *opts, void *opaque)
+{
+    const char *driver;
+    const char *id;
+    uint64_t node, size;
+    uint32_t populated;
+    const char *buf, *busbuf;
+
+    /* DimmDevice configuration needs to be known in order to initialize chipset
+     * with correct memory and pci ranges. But all devices are created after
+     * chipset / machine initialization. In * order to avoid this problem, we
+     * parse dimm information earlier into dimmcfg structs. */
+
+    driver = qemu_opt_get(opts, "driver");
+    if (!strcmp(driver, "dimm")) {
+
+        id = qemu_opts_id(opts);
+        buf = qemu_opt_get(opts, "size");
+        parse_option_size("size", buf, &size, NULL);
+        buf = qemu_opt_get(opts, "node");
+        parse_option_number("node", buf, &node, NULL);
+        busbuf = qemu_opt_get(opts, "bus");
+        buf = qemu_opt_get(opts, "populated");
+        if (!buf) {
+            populated = 0;
+        } else {
+            populated = strcmp(buf, "on") ? 0 : 1;
+        }
+
+        dimm_config_create((char *)id, size, busbuf ? busbuf : "membus.0",
+                           node, nb_hp_dimms);
+
+        /* if !populated, we just keep the config. The real device
+         * will be created in the future with a normal device_add
+         * command. */
+        if (!populated) {
+            qemu_opts_del(opts);
+        }
+        nb_hp_dimms++;
+    }
+
+    return 0;
+}
+
 #ifdef CONFIG_VIRTFS
 static int fsdev_init_func(QemuOpts *opts, void *opaque)
 {
@@ -4260,6 +4306,11 @@ int main(int argc, char **argv, char **envp)
     }
     qemu_add_globals();
 
+    /* init generic devices */
+    if (qemu_opts_foreach(qemu_find_opts("device"),
+           dimmcfg_init_func, NULL, 1) != 0) {
+        exit(1);
+    }
     qdev_machine_init();
 
     QEMUMachineInitArgs args = { .ram_size = ram_size,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 06/14] acpi_piix4 : Implement memory device hotplug registers
  2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
                   ` (4 preceding siblings ...)
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm" Hu Tao
@ 2013-06-26  9:13 ` Hu Tao
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 07/14] acpi_ich9 " Hu Tao
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vasilis Liaskovitis

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

A 32-byte register is used to present up to 256 hotplug-able memory devices
to BIOS and OSPM. Hot-add and hot-remove functions trigger an ACPI hotplug
event through these. Only reads are allowed from these registers.

An ACPI hot-remove event but needs to wait for OSPM to eject the device.
We use a single-byte register to know when OSPM has called the _EJ function
for a particular dimm. A write to this byte will depopulate the respective dimm.
Only writes are allowed to this byte.

v1->v2:
mems_sts address moved from 0xaf20 to 0xaf80 (to accomodate more space for
cpu-hotplugging in the future).
_EJ array is reduced to a single byte.
Add documentation in docs/specs/acpi_hotplug.txt

v3->v4: Removed hot-remove functions, will be added separately. Updated for
memory API.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 docs/specs/acpi_mem_hotplug.txt | 14 ++++++++
 hw/acpi/piix4.c                 | 72 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 85 insertions(+), 1 deletion(-)
 create mode 100644 docs/specs/acpi_mem_hotplug.txt

diff --git a/docs/specs/acpi_mem_hotplug.txt b/docs/specs/acpi_mem_hotplug.txt
new file mode 100644
index 0000000..8391713
--- /dev/null
+++ b/docs/specs/acpi_mem_hotplug.txt
@@ -0,0 +1,14 @@
+QEMU<->ACPI BIOS hotplug interface
+--------------------------------------
+This document describes the interface between QEMU and the ACPI BIOS for non-PCI
+space. For the PCI interface please look at docs/specs/acpi_pci_hotplug.txt
+
+QEMU<->ACPI BIOS memory hotplug interface
+--------------------------------------
+
+Memory Dimm status array (IO port 0xaf80-0xaf9f, 1-byte access):
+---------------------------------------------------------------
+Dimm hot-plug notification pending. One bit per slot.
+
+Read by ACPI BIOS GPE.3 handler to notify OS of memory hot-add or hot-remove
+events.  Read-only.
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 756df3b..2795ab0 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -29,6 +29,7 @@
 #include "exec/ioport.h"
 #include "hw/nvram/fw_cfg.h"
 #include "exec/address-spaces.h"
+#include "hw/mem-hotplug/dimm.h"
 
 //#define DEBUG
 
@@ -50,9 +51,12 @@
 
 #define PIIX4_PROC_BASE 0xaf00
 #define PIIX4_PROC_LEN 32
+#define PIIX4_MEM_BASE 0xaf80
+#define PIIX4_MEM_LEN 32
 
 #define PIIX4_PCI_HOTPLUG_STATUS 2
 #define PIIX4_CPU_HOTPLUG_STATUS 4
+#define PIIX4_MEM_HOTPLUG_STATUS 8
 
 struct pci_status {
     uint32_t up; /* deprecated, maintained for migration compatibility */
@@ -63,6 +67,10 @@ typedef struct CPUStatus {
     uint8_t sts[PIIX4_PROC_LEN];
 } CPUStatus;
 
+typedef struct MemStatus {
+    uint8_t mems_sts[PIIX4_MEM_LEN];
+} MemStatus;
+
 typedef struct PIIX4PMState {
     PCIDevice dev;
 
@@ -70,6 +78,7 @@ typedef struct PIIX4PMState {
     MemoryRegion io_gpe;
     MemoryRegion io_pci;
     MemoryRegion io_cpu;
+    MemoryRegion io_mem;
     ACPIREGS ar;
 
     APMState apm;
@@ -94,6 +103,8 @@ typedef struct PIIX4PMState {
 
     CPUStatus gpe_cpu;
     Notifier cpu_added_notifier;
+
+    MemStatus  gpe_mem;
 } PIIX4PMState;
 
 static void piix4_acpi_system_hot_add_init(MemoryRegion *parent,
@@ -113,7 +124,8 @@ static void pm_update_sci(PIIX4PMState *s)
                    ACPI_BITMASK_GLOBAL_LOCK_ENABLE |
                    ACPI_BITMASK_TIMER_ENABLE)) != 0) ||
         (((s->ar.gpe.sts[0] & s->ar.gpe.en[0]) &
-          (PIIX4_PCI_HOTPLUG_STATUS | PIIX4_CPU_HOTPLUG_STATUS)) != 0);
+          (PIIX4_PCI_HOTPLUG_STATUS | PIIX4_CPU_HOTPLUG_STATUS |
+           PIIX4_MEM_HOTPLUG_STATUS)) != 0);
 
     qemu_set_irq(s->irq, sci_level);
     /* schedule a timer interruption if needed */
@@ -664,8 +676,40 @@ static void piix4_init_cpu_status(CPUState *cpu, void *data)
     g->sts[id / 8] |= (1 << (id % 8));
 }
 
+static uint32_t mem_status_readb(void *opaque, uint32_t addr)
+{
+    PIIX4PMState *s = opaque;
+    uint32_t val = 0;
+    MemStatus *g = &s->gpe_mem;
+    if (addr < PIIX4_MEM_LEN) {
+        val = (uint32_t) g->mems_sts[addr];
+    }
+    PIIX4_DPRINTF("memhp read %" PRIu32 " == %" PRIu32 "\n", addr, val);
+    return val;
+}
+
+static const MemoryRegionOps mem_hotplug_ops = {
+    .old_portio = (MemoryRegionPortio[]) {
+        {
+            .offset = 0,   .len = PIIX4_MEM_LEN, .size = 1,
+            .read = mem_status_readb,
+        },
+        PORTIO_END_OF_LIST()
+    },
+    .endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+static void piix4_init_mem_status(PIIX4PMState *s)
+{
+    int i;
+    for (i = 0; i < PIIX4_MEM_LEN; i++) {
+        s->gpe_mem.mems_sts[i] = 0;
+    }
+}
+
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
                                 PCIHotplugState state);
+static int piix4_mem_hotplug(DeviceState *qdev, DimmDevice *dev, int add);
 
 static void piix4_acpi_system_hot_add_init(MemoryRegion *parent,
                                            PCIBus *bus, PIIX4PMState *s)
@@ -686,6 +730,13 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion *parent,
     memory_region_add_subregion(parent, PIIX4_PROC_BASE, &s->io_cpu);
     s->cpu_added_notifier.notify = piix4_cpu_added_req;
     qemu_register_cpu_added_notifier(&s->cpu_added_notifier);
+
+    piix4_init_mem_status(s);
+    memory_region_init_io(&s->io_mem, &mem_hotplug_ops, s, "apci-mem-hotplug",
+                          PIIX4_MEM_LEN);
+    memory_region_add_subregion(parent, PIIX4_MEM_BASE, &s->io_mem);
+
+    dimm_bus_hotplug(piix4_mem_hotplug, &s->dev.qdev);
 }
 
 static void enable_device(PIIX4PMState *s, int slot)
@@ -725,3 +776,22 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
 
     return 0;
 }
+
+static void enable_mem_device(PIIX4PMState *s, int memdevice)
+{
+    MemStatus *g = &s->gpe_mem;
+    s->ar.gpe.sts[0] |= PIIX4_MEM_HOTPLUG_STATUS;
+    g->mems_sts[memdevice / 8] |= (1 << (memdevice % 8));
+}
+
+static int piix4_mem_hotplug(DeviceState *dev, DimmDevice *dimm, int add)
+{
+    PCIDevice *pci_dev = DO_UPCAST(PCIDevice, qdev, dev);
+    PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, pci_dev);
+
+    if (add) {
+        enable_mem_device(s, dimm->idx);
+    }
+    pm_update_sci(s);
+    return 0;
+}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 07/14] acpi_ich9 : Implement memory device hotplug registers
  2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
                   ` (5 preceding siblings ...)
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 06/14] acpi_piix4 : Implement memory device hotplug registers Hu Tao
@ 2013-06-26  9:13 ` Hu Tao
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 08/14] memory: record below_4g_mem_size, above_4g_mem_size Hu Tao
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vasilis Liaskovitis

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

This implements acpi dimm hot-add capability for q35 (ich9). The logic is the
same as for the pc machine (piix4).

TODO: Fix acpi irq delivery bug. Currently there is a flood of irqs when
delivering an acpi interrupt (should be just one). Guest complains as follows:
"irq 9: nobody cared
[...]
Disabling IRQ #9"

where #9 is the acpi irq

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 hw/acpi/ich9.c         | 56 ++++++++++++++++++++++++++++++++++++++++++++++++--
 include/hw/acpi/ich9.h | 10 +++++++++
 2 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 4a17f32..0034aa2 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -33,6 +33,7 @@
 #include "exec/address-spaces.h"
 
 #include "hw/i386/ich9.h"
+#include "hw/mem-hotplug/dimm.h"
 
 //#define DEBUG
 
@@ -49,11 +50,14 @@ static void pm_update_sci(ICH9LPCPMRegs *pm)
 
     pm1a_sts = acpi_pm1_evt_get_sts(&pm->acpi_regs);
 
-    sci_level = (((pm1a_sts & pm->acpi_regs.pm1.evt.en) &
+    sci_level = ((((pm1a_sts & pm->acpi_regs.pm1.evt.en) &
                   (ACPI_BITMASK_RT_CLOCK_ENABLE |
                    ACPI_BITMASK_POWER_BUTTON_ENABLE |
                    ACPI_BITMASK_GLOBAL_LOCK_ENABLE |
-                   ACPI_BITMASK_TIMER_ENABLE)) != 0);
+                   ACPI_BITMASK_TIMER_ENABLE)) != 0) ||
+        (((pm->acpi_regs.gpe.sts[0] & pm->acpi_regs.gpe.en[0]) &
+          (ICH9_MEM_HOTPLUG_STATUS)) != 0));
+
     qemu_set_irq(pm->irq, sci_level);
 
     /* schedule a timer interruption if needed */
@@ -202,6 +206,47 @@ static void pm_powerdown_req(Notifier *n, void *opaque)
     acpi_pm1_evt_power_down(&pm->acpi_regs);
 }
 
+static uint32_t mem_status_readb(void *opaque, uint32_t addr)
+{
+    ICH9LPCPMRegs *s = opaque;
+    uint32_t val = 0;
+    MemStatus *g = &s->gpe_mem;
+    if (addr < ICH9_MEM_LEN) {
+        val = (uint32_t) g->mems_sts[addr];
+    }
+    ICH9_DEBUG("memhp read %" PRIu32 " == %" PRIu32 "\n", addr, val);
+    return val;
+}
+
+static const MemoryRegionOps ich9_mem_hotplug_ops = {
+    .old_portio = (MemoryRegionPortio[]) {
+        {
+            .offset = 0,   .len = ICH9_MEM_LEN, .size = 1,
+            .read = mem_status_readb,
+        },
+        PORTIO_END_OF_LIST()
+    },
+    .endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+static void enable_mem_device(ICH9LPCState *s, int memdevice)
+{
+    MemStatus *g = &s->pm.gpe_mem;
+    s->pm.acpi_regs.gpe.sts[0] |= ICH9_MEM_HOTPLUG_STATUS;
+    g->mems_sts[memdevice / 8] |= (1 << (memdevice % 8));
+}
+
+static int ich9_mem_hotplug(DeviceState *dev, DimmDevice *dimm, int add)
+{
+    ICH9LPCState *s = ICH9_LPC_DEVICE(dev);
+
+    if (add) {
+        enable_mem_device(s, dimm->idx);
+    }
+    pm_update_sci(&s->pm);
+    return 0;
+}
+
 void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm,
                   qemu_irq sci_irq)
 {
@@ -227,4 +272,11 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm,
     qemu_register_reset(pm_reset, pm);
     pm->powerdown_notifier.notify = pm_powerdown_req;
     qemu_register_powerdown_notifier(&pm->powerdown_notifier);
+
+    memory_region_init_io(&pm->io_mem, &ich9_mem_hotplug_ops, pm,
+                          "acpi-memory-hotplug0", ICH9_MEM_BASE);
+    memory_region_add_subregion(get_system_io(), ICH9_MEM_BASE, &pm->io_mem);
+
+    dimm_bus_hotplug(ich9_mem_hotplug, &lpc_pci->qdev);
+
 }
diff --git a/include/hw/acpi/ich9.h b/include/hw/acpi/ich9.h
index b1fe71f..300e07f 100644
--- a/include/hw/acpi/ich9.h
+++ b/include/hw/acpi/ich9.h
@@ -23,6 +23,14 @@
 
 #include "hw/acpi/acpi.h"
 
+#define ICH9_MEM_BASE 0xaf80
+#define ICH9_MEM_LEN 32
+#define ICH9_MEM_HOTPLUG_STATUS 8
+
+typedef struct MemStatus {
+    uint8_t mems_sts[ICH9_MEM_LEN];
+} MemStatus;
+
 typedef struct ICH9LPCPMRegs {
     /*
      * In ich9 spec says that pm1_cnt register is 32bit width and
@@ -34,6 +42,7 @@ typedef struct ICH9LPCPMRegs {
     MemoryRegion io;
     MemoryRegion io_gpe;
     MemoryRegion io_smi;
+    MemoryRegion io_mem;
 
     uint32_t smi_en;
     uint32_t smi_sts;
@@ -42,6 +51,7 @@ typedef struct ICH9LPCPMRegs {
 
     uint32_t pm_io_base;
     Notifier powerdown_notifier;
+    MemStatus gpe_mem;
 } ICH9LPCPMRegs;
 
 void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 08/14] memory: record below_4g_mem_size, above_4g_mem_size
  2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
                   ` (6 preceding siblings ...)
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 07/14] acpi_ich9 " Hu Tao
@ 2013-06-26  9:13 ` Hu Tao
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 09/14] memory controller: initialize dram controller Hu Tao
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vasilis Liaskovitis

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 hw/i386/pc.c         | 17 ++++++++---------
 include/hw/i386/pc.h |  2 ++
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5bb4989..55056b1 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1288,7 +1288,6 @@ static int memory_controller_init(PCIDevice *dev)
     MemoryController *m = MEMORY_CONTROLLER(dev);
     MemoryControllerClass *c = MEMORY_CONTROLLER_GET_CLASS(dev);
     ram_addr_t ram_size;
-    hwaddr below_4g_mem_size, above_4g_mem_size;
     hwaddr pci_hole_start, pci_hole_size;
     hwaddr pci_hole64_start, pci_hole64_size;
     int i;
@@ -1296,11 +1295,11 @@ static int memory_controller_init(PCIDevice *dev)
     g_assert(m->system_memory != NULL);
 
     if(m->ram_size > c->pci_hole_start) {
-        below_4g_mem_size = c->pci_hole_start;
-        above_4g_mem_size = m->ram_size - c->pci_hole_start;
+        m->below_4g_mem_size = c->pci_hole_start;
+        m->above_4g_mem_size = m->ram_size - c->pci_hole_start;
     } else {
-        below_4g_mem_size = m->ram_size;
-        above_4g_mem_size = 0;
+        m->below_4g_mem_size = m->ram_size;
+        m->above_4g_mem_size = 0;
     }
 
     /* Allocate RAM.  We allocate it as a single memory region and use
@@ -1310,16 +1309,16 @@ static int memory_controller_init(PCIDevice *dev)
     memory_region_init_ram(&m->ram, "pc.ram", m->ram_size);
     vmstate_register_ram_global(&m->ram);
     memory_region_init_alias(&m->ram_below_4g, "ram-below-4g", &m->ram,
-                             0, below_4g_mem_size);
+                             0, m->below_4g_mem_size);
     memory_region_add_subregion(m->system_memory, 0, &m->ram_below_4g);
-    if (above_4g_mem_size > 0) {
+    if (m->above_4g_mem_size > 0) {
         memory_region_init_alias(&m->ram_above_4g, "ram-above-4g", &m->ram,
-                                 below_4g_mem_size, above_4g_mem_size);
+                                 m->below_4g_mem_size, m->above_4g_mem_size);
         memory_region_add_subregion(m->system_memory, c->pci_hole_end,
                                     &m->ram_above_4g);
     }
 
-    pci_hole_start = below_4g_mem_size;
+    pci_hole_start = m->below_4g_mem_size;
     pci_hole_size = c->pci_hole_end - pci_hole_start;
 
     pci_hole64_start = c->pci_hole_end + m->ram_size - pci_hole_start;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 5d36558..e2cbc1b 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -46,6 +46,8 @@ typedef struct MemoryController {
     MemoryRegion ram;
     MemoryRegion ram_below_4g;
     MemoryRegion ram_above_4g;
+    hwaddr below_4g_mem_size;
+    hwaddr above_4g_mem_size;
 } MemoryController;
 
 void mc_update_pam(MemoryController *d);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 09/14] memory controller: initialize dram controller.
  2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
                   ` (7 preceding siblings ...)
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 08/14] memory: record below_4g_mem_size, above_4g_mem_size Hu Tao
@ 2013-06-26  9:13 ` Hu Tao
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 10/14] pc: Add dimm paravirt SRAT info Hu Tao
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vasilis Liaskovitis

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 hw/i386/pc.c         | 27 +++++++++++++++++++++++++++
 include/hw/i386/pc.h |  5 +++++
 2 files changed, 32 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 55056b1..65838a6 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1283,6 +1283,27 @@ void mc_set_smm(int val, void *arg)
     memory_region_transaction_commit();
 }
 
+static hwaddr mc_dimm_offset(DeviceState *dev, uint64_t size)
+{
+    MemoryController *d = MEMORY_CONTROLLER(dev);
+    MemoryControllerClass *c = MEMORY_CONTROLLER_GET_CLASS(d);
+    hwaddr ret;
+
+    if (d->below_4g_mem_size + size <= c->pci_hole_start) {
+        /* if dimm fits before pci hole, append it normally */
+        ret = d->below_4g_mem_size;
+        d->below_4g_mem_size += size;
+    } else {
+        /* otherwise place it above 4GB */
+        ret = d->above_4g_mem_size + c->pci_hole_end;
+        d->above_4g_mem_size += size;
+    }
+
+    d->ram_size += size;
+
+    return ret;
+}
+
 static int memory_controller_init(PCIDevice *dev)
 {
     MemoryController *m = MEMORY_CONTROLLER(dev);
@@ -1353,6 +1374,11 @@ static int memory_controller_init(PCIDevice *dev)
                  PAM_EXPAN_SIZE);
     }
 
+    m->dram_channel0 = dimm_bus_create(OBJECT(m), "membus.0", 8,
+                                       c->dimm_offset);
+    m->pv_dram_channel = dimm_bus_create(OBJECT(m), "membus.pv", 0,
+                                         c->dimm_offset);
+
     ram_size = m->ram_size / 8 / 1024 / 1024;
     if (ram_size > 255) {
         ram_size = 255;
@@ -1388,6 +1414,7 @@ static void memory_controller_class_init(ObjectClass *klass, void *data)
     dc->no_user = 1;
     mc->set_smm = mc_set_smm;
     mc->update = mc_update;
+    mc->dimm_offset = mc_dimm_offset;
 }
 
 static const TypeInfo memory_controller_type_info = {
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index e2cbc1b..959b92b 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -10,6 +10,7 @@
 #include "hw/i386/ioapic.h"
 #include "hw/pci/pci.h"
 #include "hw/pci-host/pam.h"
+#include "hw/mem-hotplug/dimm.h"
 
 #define TYPE_MEMORY_CONTROLLER "memory controller"
 #define MEMORY_CONTROLLER(obj) OBJECT_CHECK(MemoryController, (obj), TYPE_DEVICE)
@@ -29,6 +30,7 @@ typedef struct MemoryControllerClass {
 
     void (*set_smm)(int val, void *arg);
     void (*update)(MemoryController *m);
+    hwaddr (*dimm_offset)(DeviceState *d, uint64_t size);
 } MemoryControllerClass;
 
 typedef struct MemoryController {
@@ -48,6 +50,9 @@ typedef struct MemoryController {
     MemoryRegion ram_above_4g;
     hwaddr below_4g_mem_size;
     hwaddr above_4g_mem_size;
+
+    DimmBus *dram_channel0;
+    DimmBus *pv_dram_channel;
 } MemoryController;
 
 void mc_update_pam(MemoryController *d);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 10/14] pc: Add dimm paravirt SRAT info
  2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
                   ` (8 preceding siblings ...)
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 09/14] memory controller: initialize dram controller Hu Tao
@ 2013-06-26  9:13 ` Hu Tao
  2013-07-10 10:10   ` Michael S. Tsirkin
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 11/14] Introduce paravirt interface QEMU_CFG_PCI_WINDOW Hu Tao
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vasilis Liaskovitis

The numa_fw_cfg paravirt interface is extended to include SRAT information for
all hotplug-able dimms. There are 3 words for each hotplug-able memory slot,
denoting start address, size and node proximity. The new info is appended after
existing numa info, so that the fw_cfg layout does not break.  This information
is used by Seabios to build hotplug memory device objects at runtime.
nb_numa_nodes is set to 1 by default (not 0), so that we always pass srat info
to SeaBIOS.

v3->v4: numa_fw_cfg needs to be initalized after memory controller sets up dimm
ranges.  Make changes for pc_piix and pc_q35 to set numa_fw_cfg after i440fx
initialization.

v2->v3: setting nb_numa_nodes to 1 is not needed

v1->v2:
Dimm SRAT info (#dimms) is appended at end of existing numa fw_cfg in order not
to break existing layout
Documentation of the new fwcfg layout is included in docs/specs/fwcfg.txt

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 docs/specs/fwcfg.txt    | 28 ++++++++++++++++++++++++++++
 hw/i386/pc.c            | 30 ++++++++++++++++++++++++------
 hw/i386/pc_piix.c       |  1 +
 hw/i386/pc_q35.c        |  7 +++++--
 include/hw/i386/pc.h    |  1 +
 include/sysemu/sysemu.h |  1 +
 6 files changed, 60 insertions(+), 8 deletions(-)
 create mode 100644 docs/specs/fwcfg.txt

diff --git a/docs/specs/fwcfg.txt b/docs/specs/fwcfg.txt
new file mode 100644
index 0000000..e6fcd8f
--- /dev/null
+++ b/docs/specs/fwcfg.txt
@@ -0,0 +1,28 @@
+QEMU<->BIOS Paravirt Documentation
+--------------------------------------
+
+This document describes paravirt data structures passed from QEMU to BIOS.
+
+fw_cfg SRAT paravirt info
+--------------------
+The SRAT info passed from QEMU to BIOS has the following layout:
+
+-----------------------------------------------------------------------------------------------
+#nodes | cpu0_pxm | cpu1_pxm | ... | cpulast_pxm | node0_mem | node1_mem | ... | nodelast_mem
+
+-----------------------------------------------------------------------------------------------
+#dimms | dimm0_start | dimm0_sz | dimm0_pxm | ... | dimmlast_start | dimmlast_sz | dimmlast_pxm
+
+Entry 0 contains the number of numa nodes (nb_numa_nodes).
+
+Entries 1..max_cpus: The next max_cpus entries describe node proximity for each
+one of the vCPUs in the system.
+
+Entries max_cpus+1..max_cpus+nb_numa_nodes+1:  The next nb_numa_nodes entries
+describe the memory size for each one of the NUMA nodes in the system.
+
+Entry max_cpus+nb_numa_nodes+1 contains the number of memory dimms (nb_hp_dimms)
+
+The last 3 * nb_hp_dimms entries are organized in triplets: Each triplet contains
+the physical address offset, size (in bytes), and node proximity for the
+respective dimm.
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 65838a6..b51d3b5 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -55,6 +55,7 @@
 #include "hw/acpi/acpi.h"
 #include "hw/cpu/icc_bus.h"
 #include "hw/boards.h"
+#include "hw/mem-hotplug/dimm.h"
 
 /* debug PC/ISA interrupts */
 //#define DEBUG_IRQ
@@ -606,8 +607,6 @@ static FWCfgState *bochs_bios_init(void)
     FWCfgState *fw_cfg;
     uint8_t *smbios_table;
     size_t smbios_len;
-    uint64_t *numa_fw_cfg;
-    int i, j;
     unsigned int apic_id_limit = pc_apic_id_limit(max_cpus);
 
     fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0);
@@ -640,11 +639,25 @@ static FWCfgState *bochs_bios_init(void)
                      &e820_table, sizeof(e820_table));
 
     fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, &hpet_cfg, sizeof(hpet_cfg));
+
+    return fw_cfg;
+}
+
+void bochs_meminfo_bios_init(void *fw_cfg)
+{
+    uint64_t *numa_fw_cfg;
+    uint64_t *hp_dimms_fw_cfg;
+    int i, j;
+    unsigned int apic_id_limit = pc_apic_id_limit(max_cpus);
+
     /* allocate memory for the NUMA channel: one (64bit) word for the number
      * of nodes, one word for each VCPU->node and one word for each node to
      * hold the amount of memory.
+     * Finally one word for the number of hotplug memory slots and three words
+     * for each hotplug memory slot (start address, size and node proximity).
      */
-    numa_fw_cfg = g_new0(uint64_t, 1 + apic_id_limit + nb_numa_nodes);
+    numa_fw_cfg = g_new0(uint64_t,
+                         2 + apic_id_limit + nb_numa_nodes  + 3 * nb_hp_dimms);
     numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
     for (i = 0; i < max_cpus; i++) {
         unsigned int apic_id = x86_cpu_apic_id_from_index(i);
@@ -659,11 +672,16 @@ static FWCfgState *bochs_bios_init(void)
     for (i = 0; i < nb_numa_nodes; i++) {
         numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]);
     }
+
+    numa_fw_cfg[1 + apic_id_limit + nb_numa_nodes] = cpu_to_le64(nb_hp_dimms);
+
+    hp_dimms_fw_cfg = numa_fw_cfg + 2 + apic_id_limit + nb_numa_nodes;
+    if (nb_hp_dimms) {
+        dimm_setup_fwcfg_layout(hp_dimms_fw_cfg);
+    }
     fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
-                     (1 + apic_id_limit + nb_numa_nodes) *
+                     (2 + apic_id_limit + nb_numa_nodes + 3 * nb_hp_dimms) *
                      sizeof(*numa_fw_cfg));
-
-    return fw_cfg;
 }
 
 static long get_file_size(FILE *f)
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index fb056df..6e18343 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -138,6 +138,7 @@ static void pc_init1(MemoryRegion *system_memory,
     if (!xen_enabled()) {
         fw_cfg = pc_memory_init(kernel_filename, kernel_cmdline, initrd_filename,
                                 below_4g_mem_size, above_4g_mem_size);
+        bochs_meminfo_bios_init(fw_cfg);
     }
 
     if (kvm_irqchip_in_kernel()) {
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 5fe14bb..2c14977 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -74,6 +74,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
     ICH9LPCState *ich9_lpc;
     PCIDevice *ahci;
     DeviceState *icc_bridge;
+    void *fw_cfg = NULL;
 
     icc_bridge = qdev_create(NULL, TYPE_ICC_BRIDGE);
     object_property_add_child(qdev_get_machine(), "icc-bridge",
@@ -97,8 +98,9 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
 
     /* allocate ram and load rom/bios */
     if (!xen_enabled()) {
-        pc_memory_init(kernel_filename, kernel_cmdline,
-                       initrd_filename, below_4g_mem_size, above_4g_mem_size);
+        fw_cfg = pc_memory_init(kernel_filename, kernel_cmdline,
+                                initrd_filename, below_4g_mem_size,
+                                above_4g_mem_size);
     }
 
     /* irq lines */
@@ -116,6 +118,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
     q35_host->mch.address_space_io = get_system_io();
     /* pci */
     qdev_init_nofail(DEVICE(q35_host));
+    bochs_meminfo_bios_init(fw_cfg);
     host_bus = q35_host->host.pci.bus;
     /* create ISA bus */
     lpc = pci_create_simple_multifunction(host_bus, PCI_DEVFN(ICH9_LPC_DEV,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 959b92b..4a29e6e 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -231,6 +231,7 @@ int pvpanic_init(ISABus *bus);
 #define E820_UNUSABLE   5
 
 int e820_add_entry(uint64_t, uint64_t, uint32_t);
+void bochs_meminfo_bios_init(void *fw_cfg);
 
 #define PC_COMPAT_1_5 \
         {\
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 2fb71af..2644faa 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -132,6 +132,7 @@ extern QEMUClock *rtc_clock;
 extern int nb_numa_nodes;
 extern uint64_t node_mem[MAX_NODES];
 extern unsigned long *node_cpumask[MAX_NODES];
+extern int nb_hp_dimms;
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 11/14] Introduce paravirt interface QEMU_CFG_PCI_WINDOW
  2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
                   ` (9 preceding siblings ...)
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 10/14] pc: Add dimm paravirt SRAT info Hu Tao
@ 2013-06-26  9:13 ` Hu Tao
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 12/14] Implement "info memory" and "query-memory" Hu Tao
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vasilis Liaskovitis

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

Qemu calculates the 32-bit and 64-bit PCI starting offsets based on
initial memory and hotplug-able dimms. This info needs to be passed to Seabios
for PCI initialization.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 hw/i386/pc_q35.c          | 10 ++++++++++
 hw/pci-host/piix.c        | 13 +++++++++++++
 include/hw/nvram/fw_cfg.h |  1 +
 3 files changed, 24 insertions(+)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 2c14977..f956c62 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -42,6 +42,7 @@
 #include "hw/ide/ahci.h"
 #include "hw/usb.h"
 #include "hw/cpu/icc_bus.h"
+#include "hw/nvram/fw_cfg.h"
 
 /* ICH9 AHCI has 6 ports */
 #define MAX_SATA_PORTS     6
@@ -75,6 +76,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
     PCIDevice *ahci;
     DeviceState *icc_bridge;
     void *fw_cfg = NULL;
+    uint64_t *pci_window_fw_cfg;
 
     icc_bridge = qdev_create(NULL, TYPE_ICC_BRIDGE);
     object_property_add_child(qdev_get_machine(), "icc-bridge",
@@ -119,6 +121,14 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
     /* pci */
     qdev_init_nofail(DEVICE(q35_host));
     bochs_meminfo_bios_init(fw_cfg);
+
+    pci_window_fw_cfg = g_new0(uint64_t, 2);
+    pci_window_fw_cfg[0] = cpu_to_le64(MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT);
+    pci_window_fw_cfg[1] = cpu_to_le64(0x100000000ULL +
+                                       q35_host->mch.dev.above_4g_mem_size);
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_PCI_WINDOW,
+                     (uint8_t *)pci_window_fw_cfg, 2 * 8);
+
     host_bus = q35_host->host.pci.bus;
     /* create ISA bus */
     lpc = pci_create_simple_multifunction(host_bus, PCI_DEVFN(ICH9_LPC_DEV,
diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 71a9b8b..ace8f2b 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -32,6 +32,7 @@
 #include "hw/xen/xen.h"
 #include "hw/pci-host/pam.h"
 #include "sysemu/sysemu.h"
+#include "hw/nvram/fw_cfg.h"
 
 /*
  * I440FX chipset data sheet.
@@ -229,6 +230,8 @@ static PCIBus *i440fx_common_init(const char *device_name,
     PIIX3State *piix3;
     I440FXPMCState *f;
     I440FXState *i440fx;
+    uint64_t *pci_window_fw_cfg;
+    void *fw_cfg;
 
     i440fx = I440FX_DEVICE(object_new(TYPE_I440FX_DEVICE));
     s = PCI_HOST_BRIDGE(i440fx);
@@ -266,6 +269,16 @@ static PCIBus *i440fx_common_init(const char *device_name,
     *piix3_devfn = piix3->dev.devfn;
     *pci_address_space = &i440fx->pci_address_space;
 
+    fw_cfg = fw_cfg_find();
+    if (fw_cfg) {
+        pci_window_fw_cfg = g_new0(uint64_t, 2);
+        pci_window_fw_cfg[0] = cpu_to_le64(f->dev.below_4g_mem_size);
+        pci_window_fw_cfg[1] = cpu_to_le64(0x100000000ULL +
+                                           f->dev.above_4g_mem_size);
+        fw_cfg_add_bytes(fw_cfg, FW_CFG_PCI_WINDOW,
+                         (uint8_t *)pci_window_fw_cfg, 2 * 8);
+    }
+
     return s->bus;
 }
 
diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
index f60dd67..41cfa32 100644
--- a/include/hw/nvram/fw_cfg.h
+++ b/include/hw/nvram/fw_cfg.h
@@ -35,6 +35,7 @@
 #define FW_CFG_SETUP_SIZE       0x17
 #define FW_CFG_SETUP_DATA       0x18
 #define FW_CFG_FILE_DIR         0x19
+#define FW_CFG_PCI_WINDOW       0x1a
 
 #define FW_CFG_FILE_FIRST       0x20
 #define FW_CFG_FILE_SLOTS       0x10
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 12/14] Implement "info memory" and "query-memory"
  2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
                   ` (10 preceding siblings ...)
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 11/14] Introduce paravirt interface QEMU_CFG_PCI_WINDOW Hu Tao
@ 2013-06-26  9:13 ` Hu Tao
  2013-06-28 20:27   ` Eric Blake
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 13/14] balloon: update with hotplugged memory Hu Tao
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vasilis Liaskovitis

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

Returns total physical memory available to guest in bytes, including hotplugged
memory. Note that the number reported here may be different from what the guest
sees e.g. if the guest has not logically onlined hotplugged memory.

This functionality is provided independently of a balloon device, since a
guest can be using ACPI memory hotplug without using a balloon device.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 hmp-commands.hx               |  2 ++
 hmp.c                         |  8 ++++++++
 hmp.h                         |  1 +
 hw/mem-hotplug/dimm.c         | 14 ++++++++++++++
 include/hw/mem-hotplug/dimm.h |  1 +
 monitor.c                     |  7 +++++++
 qapi-schema.json              | 13 +++++++++++++
 qmp-commands.hx               | 22 ++++++++++++++++++++++
 vl.c                          |  9 +++++++++
 9 files changed, 77 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 915b0d1..b2937c2 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1653,6 +1653,8 @@ show device tree
 show qdev device model list
 @item info roms
 show roms
+@item info memory-total
+show memory-total
 @item info tpm
 show the TPM device
 @end table
diff --git a/hmp.c b/hmp.c
index 494a9aa..0371da9 100644
--- a/hmp.c
+++ b/hmp.c
@@ -675,6 +675,14 @@ void hmp_info_tpm(Monitor *mon, const QDict *qdict)
     qapi_free_TPMInfoList(info_list);
 }
 
+void hmp_info_memory(Monitor *mon, const QDict *qdict)
+{
+    MemoryInfo *mem;
+    mem = qmp_query_memory(NULL);
+    monitor_printf(mon, "MemTotal: %" PRIu64 "\n", mem->total);
+    qapi_free_MemoryInfo(mem);
+}
+
 void hmp_quit(Monitor *mon, const QDict *qdict)
 {
     monitor_suspend(mon);
diff --git a/hmp.h b/hmp.h
index 56d2e92..b5a5afb 100644
--- a/hmp.h
+++ b/hmp.h
@@ -37,6 +37,7 @@ void hmp_info_balloon(Monitor *mon, const QDict *qdict);
 void hmp_info_pci(Monitor *mon, const QDict *qdict);
 void hmp_info_block_jobs(Monitor *mon, const QDict *qdict);
 void hmp_info_tpm(Monitor *mon, const QDict *qdict);
+void hmp_info_memory(Monitor *mon, const QDict *qdict);
 void hmp_quit(Monitor *mon, const QDict *qdict);
 void hmp_stop(Monitor *mon, const QDict *qdict);
 void hmp_system_reset(Monitor *mon, const QDict *qdict);
diff --git a/hw/mem-hotplug/dimm.c b/hw/mem-hotplug/dimm.c
index 09cfc4b..da31a18 100644
--- a/hw/mem-hotplug/dimm.c
+++ b/hw/mem-hotplug/dimm.c
@@ -185,6 +185,20 @@ void dimm_setup_fwcfg_layout(uint64_t *fw_cfg_slots)
     }
 }
 
+uint64_t get_hp_memory_total(void)
+{
+    DimmBus *bus;
+    DimmDevice *slot;
+    uint64_t info = 0;
+
+    QLIST_FOREACH(bus, &memory_buses, next) {
+        QTAILQ_FOREACH(slot, &bus->dimmlist, nextdimm) {
+            info += slot->size;
+        }
+    }
+    return info;
+}
+
 static void dimm_realize(DeviceState *s, Error **errp)
 {
     DimmBus *bus = DIMM_BUS(qdev_get_parent_bus(s));
diff --git a/include/hw/mem-hotplug/dimm.h b/include/hw/mem-hotplug/dimm.h
index 8625ae6..3947539 100644
--- a/include/hw/mem-hotplug/dimm.h
+++ b/include/hw/mem-hotplug/dimm.h
@@ -73,5 +73,6 @@ DimmBus *dimm_bus_create(Object *parent, const char *name, uint32_t max_dimms,
     dimm_calcoffset_fn pmc_set_offset);
 void dimm_config_create(char *id, uint64_t size, const char *bus, uint64_t node,
         uint32_t dimm_idx);
+uint64_t get_hp_memory_total(void);
 
 #endif
diff --git a/monitor.c b/monitor.c
index 70ae8f5..14a955e 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2760,6 +2760,13 @@ static mon_cmd_t info_cmds[] = {
         .mhandler.cmd = hmp_info_tpm,
     },
     {
+        .name       = "memory",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show memory informations including total memory size",
+        .mhandler.cmd = hmp_info_memory,
+    },
+    {
         .name       = NULL,
     },
 };
diff --git a/qapi-schema.json b/qapi-schema.json
index a80ee40..d2940dc 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3608,3 +3608,16 @@
             '*cpuid-input-ecx': 'int',
             'cpuid-register': 'X86CPURegister32',
             'features': 'int' } }
+
+##
+# @query-memory:
+#
+# Returns total memory in bytes, including hotplugged dimms
+#
+# Returns: int
+#
+# Since: 1.6
+##
+{ 'type': 'MemoryInfo',
+  'data': { 'total': 'int' } }
+{ 'command': 'query-memory', 'returns': 'MemoryInfo' }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 8cea5e5..58d7c7c 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2997,3 +2997,25 @@ Example:
 <- { "return": {} }
 
 EQMP
+
+    {
+        .name       = "query-memory",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_query_memory
+    },
+SQMP
+query-memory
+----------
+
+Return a json-object with the following information:
+
+- "total": total memory in bytes, including hotplugged dimms
+
+Example:
+
+-> { "execute": "query-memory" }
+<- {
+      "return": { "total": 1073741824 }
+   }
+
+EQMP
diff --git a/vl.c b/vl.c
index 9d88a79..8d2b129 100644
--- a/vl.c
+++ b/vl.c
@@ -671,6 +671,15 @@ StatusInfo *qmp_query_status(Error **errp)
     return info;
 }
 
+MemoryInfo *qmp_query_memory(Error **errp)
+{
+    MemoryInfo *info = g_malloc0(sizeof(*info));
+
+    info->total = ram_size + get_hp_memory_total();
+
+    return info;
+}
+
 /***********************************************************/
 /* real time host monotonic timer */
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 13/14] balloon: update with hotplugged memory
  2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
                   ` (11 preceding siblings ...)
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 12/14] Implement "info memory" and "query-memory" Hu Tao
@ 2013-06-26  9:13 ` Hu Tao
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 14/14] Implement dimm-info Hu Tao
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vasilis Liaskovitis

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

query-balloon and "info balloon" should report total memory available to the
guest.

balloon inflate/ deflate can also use all memory available to the guest (initial
+ hotplugged memory)

Ballon driver has been minimaly tested with the patch, please review and test.

Caveat: if the guest does not online hotplugged-memory, it's easy for a balloon
inflate command to OOM a guest.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hw/virtio/virtio-balloon.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index d669756..c866000 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -30,6 +30,7 @@
 #endif
 
 #include "hw/virtio/virtio-bus.h"
+#include "hw/mem-hotplug/dimm.h"
 
 static void balloon_page(void *addr, int deflate)
 {
@@ -271,10 +272,11 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
     struct virtio_balloon_config config;
     uint32_t oldactual = dev->actual;
+    uint64_t hotplugged_ram_size = get_hp_memory_total();
     memcpy(&config, config_data, 8);
     dev->actual = le32_to_cpu(config.actual);
     if (dev->actual != oldactual) {
-        qemu_balloon_changed(ram_size -
+        qemu_balloon_changed(ram_size + hotplugged_ram_size -
                        ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
     }
 }
@@ -290,18 +292,21 @@ static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
     VirtIOBalloon *dev = opaque;
     info->actual = ram_size - ((uint64_t) dev->actual <<
                                VIRTIO_BALLOON_PFN_SHIFT);
+    info->actual += get_hp_memory_total();
 }
 
 static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
 {
     VirtIOBalloon *dev = VIRTIO_BALLOON(opaque);
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    uint64_t hotplugged_ram_size = get_hp_memory_total();
 
-    if (target > ram_size) {
-        target = ram_size;
+    if (target > ram_size + hotplugged_ram_size) {
+        target = ram_size + hotplugged_ram_size;
     }
     if (target) {
-        dev->num_pages = (ram_size - target) >> VIRTIO_BALLOON_PFN_SHIFT;
+        dev->num_pages = (ram_size + hotplugged_ram_size - target) >>
+                                 VIRTIO_BALLOON_PFN_SHIFT;
         virtio_notify_config(vdev);
     }
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 14/14] Implement dimm-info
  2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
                   ` (12 preceding siblings ...)
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 13/14] balloon: update with hotplugged memory Hu Tao
@ 2013-06-26  9:13 ` Hu Tao
  2013-06-28 20:28   ` Eric Blake
  2013-06-26  9:14 ` [Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug Hu Tao
  2013-07-08  9:48 ` [Qemu-devel] [PATCH v5 00/14] " Andreas Färber
  15 siblings, 1 reply; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vasilis Liaskovitis

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

"query-dimm-info" and "info dimm" will give current state of all dimms in the
system e.g.

dimm0: on
dimm1: off
dimm2: off
dimm3: on
etc.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 hmp-commands.hx       |  2 ++
 hmp.c                 | 17 +++++++++++++++++
 hmp.h                 |  1 +
 hw/mem-hotplug/dimm.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 monitor.c             |  7 +++++++
 qapi-schema.json      | 27 +++++++++++++++++++++++++++
 6 files changed, 97 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index b2937c2..6cb1285 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1657,6 +1657,8 @@ show roms
 show memory-total
 @item info tpm
 show the TPM device
+@item info dimm
+show dimm
 @end table
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index 0371da9..04586b3 100644
--- a/hmp.c
+++ b/hmp.c
@@ -683,6 +683,23 @@ void hmp_info_memory(Monitor *mon, const QDict *qdict)
     qapi_free_MemoryInfo(mem);
 }
 
+void hmp_info_dimm(Monitor *mon, const QDict *qdict)
+{
+    DimmInfoList *info;
+    DimmInfoList *item;
+    DimmInfo *dimm;
+
+    info = qmp_query_dimm_info(NULL);
+    for (item = info; item; item = item->next) {
+        dimm = item->value;
+        monitor_printf(mon, "dimm %s : %s\n", dimm->dimm,
+                       dimm->state ? "on" : "off");
+        dimm->dimm = NULL;
+    }
+
+    qapi_free_DimmInfoList(info);
+}
+
 void hmp_quit(Monitor *mon, const QDict *qdict)
 {
     monitor_suspend(mon);
diff --git a/hmp.h b/hmp.h
index b5a5afb..06cea4e 100644
--- a/hmp.h
+++ b/hmp.h
@@ -38,6 +38,7 @@ void hmp_info_pci(Monitor *mon, const QDict *qdict);
 void hmp_info_block_jobs(Monitor *mon, const QDict *qdict);
 void hmp_info_tpm(Monitor *mon, const QDict *qdict);
 void hmp_info_memory(Monitor *mon, const QDict *qdict);
+void hmp_info_dimm(Monitor *mon, const QDict *qdict);
 void hmp_quit(Monitor *mon, const QDict *qdict);
 void hmp_stop(Monitor *mon, const QDict *qdict);
 void hmp_system_reset(Monitor *mon, const QDict *qdict);
diff --git a/hw/mem-hotplug/dimm.c b/hw/mem-hotplug/dimm.c
index da31a18..4b08706 100644
--- a/hw/mem-hotplug/dimm.c
+++ b/hw/mem-hotplug/dimm.c
@@ -170,6 +170,18 @@ static DimmConfig *dimmcfg_find_from_name(DimmBus *bus, const char *name)
     return NULL;
 }
 
+static DimmDevice *dimm_find_from_name(DimmBus *bus, const char *name)
+{
+    DimmDevice *slot;
+
+    QTAILQ_FOREACH(slot, &bus->dimmlist, nextdimm) {
+        if (!strcmp(slot->qdev.id, name)) {
+            return slot;
+        }
+    }
+    return NULL;
+}
+
 void dimm_setup_fwcfg_layout(uint64_t *fw_cfg_slots)
 {
     DimmConfig *slot;
@@ -185,6 +197,37 @@ void dimm_setup_fwcfg_layout(uint64_t *fw_cfg_slots)
     }
 }
 
+DimmInfoList *qmp_query_dimm_info(Error **errp)
+{
+    DimmBus *bus;
+    DimmConfig *slot;
+    DimmInfoList *head = NULL, *info, *cur_item = NULL;
+
+    QLIST_FOREACH(bus, &memory_buses, next) {
+        QTAILQ_FOREACH(slot, &bus->dimmconfig_list, nextdimmcfg) {
+
+            info = g_malloc0(sizeof(*info));
+            info->value = g_malloc0(sizeof(*info->value));
+            info->value->dimm = g_malloc0(sizeof(char) * 32);
+            strcpy(info->value->dimm, slot->name);
+            if (dimm_find_from_name(bus, slot->name)) {
+                info->value->state = 1;
+            } else {
+                info->value->state = 0;
+            }
+            /* XXX: waiting for the qapi to support GSList */
+            if (!cur_item) {
+                head = cur_item = info;
+            } else {
+                cur_item->next = info;
+                cur_item = info;
+            }
+        }
+    }
+
+    return head;
+}
+
 uint64_t get_hp_memory_total(void)
 {
     DimmBus *bus;
diff --git a/monitor.c b/monitor.c
index 14a955e..c2b046c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2753,6 +2753,13 @@ static mon_cmd_t info_cmds[] = {
         .mhandler.cmd = do_trace_print_events,
     },
     {
+        .name       = "dimm",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show active and non active dimms",
+        .mhandler.cmd = hmp_info_dimm,
+    },
+    {
         .name       = "tpm",
         .args_type  = "",
         .params     = "",
diff --git a/qapi-schema.json b/qapi-schema.json
index d2940dc..88bbfac 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3621,3 +3621,30 @@
 { 'type': 'MemoryInfo',
   'data': { 'total': 'int' } }
 { 'command': 'query-memory', 'returns': 'MemoryInfo' }
+
+##
+# @DimmInfo:
+#
+# Information about status of dimm device
+#
+# @dimm: the name of the dimm
+#
+# @state: 'true' means the dimm device is plugged in, 'false' means
+#         means the dimm device is plugged out.
+#
+# Since: 1.6
+#
+##
+{ 'type': 'DimmInfo',
+  'data': {'dimm': 'str', 'state': 'bool'} }
+
+##
+# @query-dimm-info:
+#
+# Returns the state of dimm devices
+#
+# Returns: list of @DimmInfo
+#
+# Since: 1.6
+##
+{ 'command': 'query-dimm-info', 'returns': ['DimmInfo'] }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug
  2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
                   ` (13 preceding siblings ...)
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 14/14] Implement dimm-info Hu Tao
@ 2013-06-26  9:14 ` Hu Tao
  2013-06-26  9:14   ` [Qemu-devel] [PATCH v5 1/7] Add ACPI_EXTRACT_DEVICE* macros Hu Tao
                     ` (7 more replies)
  2013-07-08  9:48 ` [Qemu-devel] [PATCH v5 00/14] " Andreas Färber
  15 siblings, 8 replies; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:14 UTC (permalink / raw)
  To: qemu-devel, seabios; +Cc: Vasilis Liaskovitis

This series adds memory devices and related methods to support
ACPI memory hotplug.

This series works with qemu counterpart. See qemu series for
a detailed description.

Hu Tao (2):
  set psize to 0 when romfile_loadfile failed
  acpi: generate hotplug memory devices

Vasilis Liaskovitis (5):
  Add ACPI_EXTRACT_DEVICE* macros
  Add SSDT memory device support
  acpi-dsdt: Implement functions for memory hotplug
  q35: Add memory hotplug handler
  pci: Use paravirt interface for pcimem_start and pcimem64_start

 Makefile                      |   2 +-
 src/acpi-dsdt-mem-hotplug.dsl |  57 ++++++++++++++++
 src/acpi-dsdt.dsl             |   5 +-
 src/acpi.c                    | 151 ++++++++++++++++++++++++++++++++++++++++--
 src/paravirt.c                |  15 +++++
 src/paravirt.h                |   1 +
 src/pciinit.c                 |   9 +++
 src/q35-acpi-dsdt.dsl         |   6 +-
 src/romfile.c                 |  13 ++--
 src/ssdt-mem.dsl              |  61 +++++++++++++++++
 tools/acpi_extract.py         |  28 ++++++++
 11 files changed, 333 insertions(+), 15 deletions(-)
 create mode 100644 src/acpi-dsdt-mem-hotplug.dsl
 create mode 100644 src/ssdt-mem.dsl

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 1/7] Add ACPI_EXTRACT_DEVICE* macros
  2013-06-26  9:14 ` [Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug Hu Tao
@ 2013-06-26  9:14   ` Hu Tao
  2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 2/7] Add SSDT memory device support Hu Tao
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:14 UTC (permalink / raw)
  To: qemu-devel, seabios; +Cc: Vasilis Liaskovitis

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

This allows to extract the beginning, end and name of a Device object.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 tools/acpi_extract.py | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/tools/acpi_extract.py b/tools/acpi_extract.py
index ab8ced6..8ad15d3 100755
--- a/tools/acpi_extract.py
+++ b/tools/acpi_extract.py
@@ -230,6 +230,28 @@ def aml_package_start(offset):
     offset += 1
     return offset + aml_pkglen_bytes(offset) + 1
 
+def aml_device_start(offset):
+    #0x5B 0x82 DeviceOp PkgLength NameString ProcID
+    if ((aml[offset] != 0x5B) or (aml[offset + 1] != 0x82)):
+        die( "Name offset 0x%x: expected 0x5B 0x83 actual 0x%x 0x%x" %
+             (offset, aml[offset], aml[offset + 1]));
+    return offset
+
+def aml_device_string(offset):
+    #0x5B 0x82 DeviceOp PkgLength NameString ProcID
+    start = aml_device_start(offset)
+    offset += 2
+    pkglenbytes = aml_pkglen_bytes(offset)
+    offset += pkglenbytes
+    return offset
+
+def aml_device_end(offset):
+    start = aml_device_start(offset)
+    offset += 2
+    pkglenbytes = aml_pkglen_bytes(offset)
+    pkglen = aml_pkglen(offset)
+    return offset + pkglen
+
 lineno = 0
 for line in fileinput.input():
     # Strip trailing newline
@@ -322,6 +344,12 @@ for i in range(len(asl)):
         offset = aml_processor_end(offset)
     elif (directive == "ACPI_EXTRACT_PKG_START"):
         offset = aml_package_start(offset)
+    elif (directive == "ACPI_EXTRACT_DEVICE_START"):
+        offset = aml_device_start(offset)
+    elif (directive == "ACPI_EXTRACT_DEVICE_STRING"):
+        offset = aml_device_string(offset)
+    elif (directive == "ACPI_EXTRACT_DEVICE_END"):
+        offset = aml_device_end(offset)
     else:
         die("Unsupported directive %s" % directive)
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 2/7] Add SSDT memory device support
  2013-06-26  9:14 ` [Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug Hu Tao
  2013-06-26  9:14   ` [Qemu-devel] [PATCH v5 1/7] Add ACPI_EXTRACT_DEVICE* macros Hu Tao
@ 2013-06-26  9:15   ` Hu Tao
  2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 3/7] acpi-dsdt: Implement functions for memory hotplug Hu Tao
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:15 UTC (permalink / raw)
  To: qemu-devel, seabios; +Cc: Vasilis Liaskovitis

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

Define SSDT hotplug-able memory devices in _SB namespace. The dynamically
generated SSDT includes per memory device hotplug methods. These methods
just call methods defined in the DSDT. Also dynamically generate a MTFY
method and a MEON array of the online/available memory devices.  ACPI
extraction macros are used to place the AML code in variables later used by
src/acpi. The design is taken from SSDT cpu generation.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 Makefile         |  2 +-
 src/ssdt-mem.dsl | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 1 deletion(-)
 create mode 100644 src/ssdt-mem.dsl

diff --git a/Makefile b/Makefile
index 759bbbb..b88fb90 100644
--- a/Makefile
+++ b/Makefile
@@ -223,7 +223,7 @@ $(OUT)%.hex: src/%.dsl ./tools/acpi_extract_preprocess.py ./tools/acpi_extract.p
 	$(Q)$(PYTHON) ./tools/acpi_extract.py $(OUT)$*.lst > $(OUT)$*.off
 	$(Q)cat $(OUT)$*.off > $@
 
-$(OUT)acpi.o: $(OUT)acpi-dsdt.hex $(OUT)ssdt-proc.hex $(OUT)ssdt-pcihp.hex $(OUT)ssdt-misc.hex $(OUT)q35-acpi-dsdt.hex
+$(OUT)acpi.o: $(OUT)acpi-dsdt.hex $(OUT)ssdt-proc.hex $(OUT)ssdt-pcihp.hex $(OUT)ssdt-misc.hex $(OUT)q35-acpi-dsdt.hex $(OUT)ssdt-mem.hex
 
 ################ Kconfig rules
 
diff --git a/src/ssdt-mem.dsl b/src/ssdt-mem.dsl
new file mode 100644
index 0000000..d0f6203
--- /dev/null
+++ b/src/ssdt-mem.dsl
@@ -0,0 +1,61 @@
+/* This file is the basis for the ssdt_mem[] variable in src/acpi.c.
+ * It is similar in design to the ssdt_proc variable.
+ * It defines the contents of the per-dimm QWordMemory() object.  At
+ * runtime, a dynamically generated SSDT will contain one copy of this
+ * AML snippet for every possible memory device in the system.  The
+ * objects will * be placed in the \_SB_ namespace.
+ *
+ * In addition to the aml code generated from this file, the
+ * src/acpi.c file creates a MTFY method with an entry for each memdevice:
+ *     Method(MTFY, 2) {
+ *         If (LEqual(Arg0, 0x00)) { Notify(MP00, Arg1) }
+ *         If (LEqual(Arg0, 0x01)) { Notify(MP01, Arg1) }
+ *         ...
+ *     }
+ * and a MEON array with the list of active and inactive memory devices:
+ *     Name(MEON, Package() { One, One, ..., Zero, Zero, ... })
+ */
+ACPI_EXTRACT_ALL_CODE ssdm_mem_aml
+
+DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1)
+/*  v------------------ DO NOT EDIT ------------------v */
+{
+    ACPI_EXTRACT_DEVICE_START ssdt_mem_start
+    ACPI_EXTRACT_DEVICE_END ssdt_mem_end
+    ACPI_EXTRACT_DEVICE_STRING ssdt_mem_name
+    Device(MPAA) {
+        ACPI_EXTRACT_NAME_BYTE_CONST ssdt_mem_id
+        Name(ID, 0xAA)
+/*  ^------------------ DO NOT EDIT ------------------^
+ *
+ * The src/acpi.c code requires the above layout so that it can update
+ * MPAA and 0xAA with the appropriate MEMDEVICE id (see
+ * SD_OFFSET_MEMHEX/MEMID1/MEMID2).  Don't change the above without
+ * also updating the C code.
+ */
+        Name(_HID, EISAID("PNP0C80"))
+        Name(_PXM, 0xAA)
+
+        External(CMST, MethodObj)
+        External(MPEJ, MethodObj)
+
+        Name(_CRS, ResourceTemplate() {
+            QwordMemory(
+               ResourceConsumer,
+               ,                     // _DEC
+               MinFixed,             // _MIF
+               MaxFixed,             // _MAF
+               Cacheable,            // _MEM
+               ReadWrite,            // _RW
+               0x0,                  // _GRA
+               0xDEADBEEF,           // _MIN
+               0xE6ADBEEE,           // _MAX
+               0x00000000,           // _TRA
+               0x08000000,           // _LEN
+               )
+        })
+        Method (_STA, 0) {
+            Return(CMST(ID))
+        }
+    }
+}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 3/7] acpi-dsdt: Implement functions for memory hotplug
  2013-06-26  9:14 ` [Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug Hu Tao
  2013-06-26  9:14   ` [Qemu-devel] [PATCH v5 1/7] Add ACPI_EXTRACT_DEVICE* macros Hu Tao
  2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 2/7] Add SSDT memory device support Hu Tao
@ 2013-06-26  9:15   ` Hu Tao
  2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 4/7] set psize to 0 when romfile_loadfile failed Hu Tao
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:15 UTC (permalink / raw)
  To: qemu-devel, seabios; +Cc: Vasilis Liaskovitis

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

Extend the DSDT to include methods for handling memory hot-add and hot-remove
notifications and memory device status requests. These functions are called
from the memory device SSDT methods.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 src/acpi-dsdt-mem-hotplug.dsl | 57 +++++++++++++++++++++++++++++++++++++++++++
 src/acpi-dsdt.dsl             |  5 +++-
 2 files changed, 61 insertions(+), 1 deletion(-)
 create mode 100644 src/acpi-dsdt-mem-hotplug.dsl

diff --git a/src/acpi-dsdt-mem-hotplug.dsl b/src/acpi-dsdt-mem-hotplug.dsl
new file mode 100644
index 0000000..1d82532
--- /dev/null
+++ b/src/acpi-dsdt-mem-hotplug.dsl
@@ -0,0 +1,57 @@
+/****************************************************************
+ * Memory hotplug
+ ****************************************************************/
+
+Scope(\_SB) {
+        /* Objects filled in by run-time generated SSDT */
+        External(MTFY, MethodObj)
+        External(MEON, PkgObj)
+
+        Method (CMST, 1, NotSerialized) {
+            // _STA method - return ON status of memdevice
+            // Local0 = MEON flag for this cpu
+            Store(DerefOf(Index(MEON, Arg0)), Local0)
+            If (Local0) { Return(0xF) } Else { Return(0x0) }
+        }
+
+        /* Memory hotplug notify array */
+        OperationRegion(MEST, SystemIO, 0xaf80, 32)
+        Field (MEST, ByteAcc, NoLock, Preserve)
+        {
+            MES, 256
+        }
+
+        Method(MESC, 0) {
+            // Local5 = active memdevice bitmap
+            Store (MES, Local5)
+            // Local2 = last read byte from bitmap
+            Store (Zero, Local2)
+            // Local0 = memory device iterator
+            Store (Zero, Local0)
+            While (LLess(Local0, SizeOf(MEON))) {
+                // Local1 = MEON flag for this memory device
+                Store(DerefOf(Index(MEON, Local0)), Local1)
+                If (And(Local0, 0x07)) {
+                    // Shift down previously read bitmap byte
+                    ShiftRight(Local2, 1, Local2)
+                } Else {
+                    // Read next byte from memdevice bitmap
+                    Store(DerefOf(Index(Local5, ShiftRight(Local0, 3))), Local2)
+                }
+                // Local3 = active state for this memory device
+                Store(And(Local2, 1), Local3)
+
+                If (LNotEqual(Local1, Local3)) {
+                    // State change - update MEON with new state
+                    Store(Local3, Index(MEON, Local0))
+                    // Do MEM notify
+                    If (LEqual(Local3, 1)) {
+                        MTFY(Local0, 1)
+                    }
+                }
+                Increment(Local0)
+            }
+            Return(One)
+        }
+
+}
diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 158f6b4..98c9413 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -294,6 +294,7 @@ DefinitionBlock (
     }
 
 #include "acpi-dsdt-cpu-hotplug.dsl"
+#include "acpi-dsdt-mem-hotplug.dsl"
 
 
 /****************************************************************
@@ -313,7 +314,9 @@ DefinitionBlock (
             // CPU hotplug event
             \_SB.PRSC()
         }
-        Method(_L03) {
+        Method(_E03) {
+            // Memory hotplug event
+            \_SB.MESC()
         }
         Method(_L04) {
         }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 4/7] set psize to 0 when romfile_loadfile failed
  2013-06-26  9:14 ` [Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug Hu Tao
                     ` (2 preceding siblings ...)
  2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 3/7] acpi-dsdt: Implement functions for memory hotplug Hu Tao
@ 2013-06-26  9:15   ` Hu Tao
  2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 5/7] acpi: generate hotplug memory devices Hu Tao
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:15 UTC (permalink / raw)
  To: qemu-devel, seabios; +Cc: Vasilis Liaskovitis

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 src/romfile.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/romfile.c b/src/romfile.c
index ea71d1f..b1b89bb 100644
--- a/src/romfile.c
+++ b/src/romfile.c
@@ -51,28 +51,33 @@ romfile_loadfile(const char *name, int *psize)
 {
     struct romfile_s *file = romfile_find(name);
     if (!file)
-        return NULL;
+        goto failed;
 
     int filesize = file->size;
     if (!filesize)
-        return NULL;
+        goto failed;
 
     char *data = malloc_tmphigh(filesize+1);
     if (!data) {
         warn_noalloc();
-        return NULL;
+        goto failed;
     }
 
     dprintf(5, "Copying romfile '%s' (len %d)\n", name, filesize);
     int ret = file->copy(file, data, filesize);
     if (ret < 0) {
         free(data);
-        return NULL;
+        goto failed;
     }
     if (psize)
         *psize = filesize;
     data[filesize] = '\0';
     return data;
+
+failed:
+    if (psize)
+        *psize = 0;
+    return NULL;
 }
 
 // Attempt to load an integer from the given file - return 'defval'
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 5/7] acpi: generate hotplug memory devices
  2013-06-26  9:14 ` [Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug Hu Tao
                     ` (3 preceding siblings ...)
  2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 4/7] set psize to 0 when romfile_loadfile failed Hu Tao
@ 2013-06-26  9:15   ` Hu Tao
  2013-07-12 10:07     ` Igor Mammedov
  2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 6/7] q35: Add memory hotplug handler Hu Tao
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:15 UTC (permalink / raw)
  To: qemu-devel, seabios; +Cc: Vasilis Liaskovitis

The memory device generation is guided by qemu paravirt info. Seabios
uses the info to setup SRAT entries for the hotplug-able memory slots,
and to generate appropriate memory device objects. One memory device
(and corresponding SRAT entry) is generated for each hotplug-able qemu
memslot. Currently no SSDT memory device is created for initial system
memory.

We only support up to 255 DIMMs for now (PackageOp used for the MEON
array can only describe an array of at most 255 elements. VarPackageOp
would be needed to support more than 255 devices)

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 src/acpi.c     | 151 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 src/paravirt.c |   8 +++
 2 files changed, 152 insertions(+), 7 deletions(-)

diff --git a/src/acpi.c b/src/acpi.c
index ce988e0..e9a0326 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -15,6 +15,8 @@
 #include "config.h" // CONFIG_*
 #include "paravirt.h" // RamSize
 #include "dev-q35.h"
+#include "memmap.h"
+#include "paravirt.h"
 
 #include "acpi-dsdt.hex"
 
@@ -250,11 +252,23 @@ encodeLen(u8 *ssdt_ptr, int length, int bytes)
 #define PCIHP_AML (ssdp_pcihp_aml + *ssdt_pcihp_start)
 #define PCI_SLOTS 32
 
+/* 0x5B 0x82 DeviceOp PkgLength NameString DimmID */
+#define MEM_BASE 0xaf80
+#define MEM_AML (ssdm_mem_aml + *ssdt_mem_start)
+#define MEM_SIZEOF (*ssdt_mem_end - *ssdt_mem_start)
+#define MEM_OFFSET_HEX (*ssdt_mem_name - *ssdt_mem_start + 2)
+#define MEM_OFFSET_ID (*ssdt_mem_id - *ssdt_mem_start)
+#define MEM_OFFSET_PXM 31
+#define MEM_OFFSET_START 55
+#define MEM_OFFSET_END   63
+#define MEM_OFFSET_SIZE  79
+
 #define SSDT_SIGNATURE 0x54445353 // SSDT
 #define SSDT_HEADER_LENGTH 36
 
 #include "ssdt-misc.hex"
 #include "ssdt-pcihp.hex"
+#include "ssdt-mem.hex"
 
 #define PCI_RMV_BASE 0xae0c
 
@@ -306,9 +320,100 @@ static void patch_pcihp(int slot, u8 *ssdt_ptr, u32 eject)
     }
 }
 
+static void build_memdev(u8 *ssdt_ptr, int i, u64 mem_base, u64 mem_len, u8 node)
+{
+    memcpy(ssdt_ptr, MEM_AML, MEM_SIZEOF);
+    ssdt_ptr[MEM_OFFSET_HEX] = getHex(i >> 4);
+    ssdt_ptr[MEM_OFFSET_HEX+1] = getHex(i);
+    ssdt_ptr[MEM_OFFSET_ID] = i;
+    ssdt_ptr[MEM_OFFSET_PXM] = node;
+    *(u64*)(ssdt_ptr + MEM_OFFSET_START) = cpu_to_le64(mem_base);
+    *(u64*)(ssdt_ptr + MEM_OFFSET_END) = cpu_to_le64(mem_base + mem_len);
+    *(u64*)(ssdt_ptr + MEM_OFFSET_SIZE) = cpu_to_le64(mem_len);
+}
+
+static u8 *build_memssdt(u8 *ssdt_ptr, int memssdt_len,
+                         u64 *numadimmsmap, int nb_memdevs)
+{
+    u64 mem_base, mem_len;
+    u64 *dimm = numadimmsmap;
+    int node;
+    int i;
+
+    // build Scope(_SB_) header
+    *(ssdt_ptr++) = 0x10; // ScopeOp
+    ssdt_ptr = encodeLen(ssdt_ptr, memssdt_len, 3);
+    *(ssdt_ptr++) = '_';
+    *(ssdt_ptr++) = 'S';
+    *(ssdt_ptr++) = 'B';
+    *(ssdt_ptr++) = '_';
+
+    for (i = 0; i < nb_memdevs; i++) {
+        mem_base = *dimm++;
+        mem_len = *dimm++;
+        node = *dimm++;
+        build_memdev(ssdt_ptr, i, mem_base, mem_len, node);
+        ssdt_ptr += MEM_SIZEOF;
+    }
+
+    // build "Method(MTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CM00, Arg1)} ...}"
+    *(ssdt_ptr++) = 0x14; // MethodOp
+    ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*nb_memdevs), 2);
+    *(ssdt_ptr++) = 'M';
+    *(ssdt_ptr++) = 'T';
+    *(ssdt_ptr++) = 'F';
+    *(ssdt_ptr++) = 'Y';
+    *(ssdt_ptr++) = 0x02;
+    for (i=0; i<nb_memdevs; i++) {
+        *(ssdt_ptr++) = 0xA0; // IfOp
+        ssdt_ptr = encodeLen(ssdt_ptr, 11, 1);
+        *(ssdt_ptr++) = 0x93; // LEqualOp
+        *(ssdt_ptr++) = 0x68; // Arg0Op
+        *(ssdt_ptr++) = 0x0A; // BytePrefix
+        *(ssdt_ptr++) = i;
+        *(ssdt_ptr++) = 0x86; // NotifyOp
+        *(ssdt_ptr++) = 'M';
+        *(ssdt_ptr++) = 'P';
+        *(ssdt_ptr++) = getHex(i >> 4);
+        *(ssdt_ptr++) = getHex(i);
+        *(ssdt_ptr++) = 0x69; // Arg1Op
+    }
+
+    // build "Name(MEON, Package() { One, One, ..., Zero, Zero, ... })"
+    *(ssdt_ptr++) = 0x08; // NameOp
+    *(ssdt_ptr++) = 'M';
+    *(ssdt_ptr++) = 'E';
+    *(ssdt_ptr++) = 'O';
+    *(ssdt_ptr++) = 'N';
+    *(ssdt_ptr++) = 0x12; // PackageOp
+    ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*nb_memdevs), 2);
+    *(ssdt_ptr++) = nb_memdevs;
+
+    dimm = numadimmsmap;
+    u8 memslot_status = 0, enabled;
+
+    for (i = 0; i < nb_memdevs; i++) {
+        enabled = 0;
+        if (i % 8 == 0)
+            memslot_status = inb(MEM_BASE + i/8);
+        enabled = memslot_status & 1;
+        mem_base = *dimm++;
+        mem_len = *dimm++;
+        dimm++;  // node
+        *(ssdt_ptr++) = enabled ? 0x01 : 0x00;
+        if (enabled)
+            add_e820(mem_base, mem_len, E820_RAM);
+        memslot_status = memslot_status >> 1;
+    }
+
+    return ssdt_ptr;
+}
+
 static void*
 build_ssdt(void)
 {
+    int nb_memdevs;
+    u64 *numadimmsmap;
     int acpi_cpus = MaxCountCPUs > 0xff ? 0xff : MaxCountCPUs;
     int length = (sizeof(ssdp_misc_aml)                     // _S3_ / _S4_ / _S5_
                   + (1+3+4)                                 // Scope(_SB_)
@@ -318,9 +423,20 @@ build_ssdt(void)
                   + (1+3+4)                                 // Scope(PCI0)
                   + ((PCI_SLOTS - 1) * PCIHP_SIZEOF)        // slots
                   + (1+2+5+(12*(PCI_SLOTS - 1))));          // PCNT
-    u8 *ssdt = malloc_high(length);
+
+    numadimmsmap = romfile_loadfile("etc/numa-dimm-map", &nb_memdevs);
+    nb_memdevs /= 3 * sizeof(u64);
+
+    // for build_memssdt
+    int memssdt_length = (1+3+4)
+                         + (nb_memdevs * MEM_SIZEOF)
+                         + (1+2+5+(12*nb_memdevs))
+                         + (6+2+1+(1*nb_memdevs));
+
+    u8 *ssdt = malloc_high(length + memssdt_length);
     if (! ssdt) {
         warn_noalloc();
+        free(numadimmsmap);
         return NULL;
     }
     u8 *ssdt_ptr = ssdt;
@@ -411,10 +527,13 @@ build_ssdt(void)
 
     ssdt_ptr = build_notify(ssdt_ptr, "PCNT", 1, PCI_SLOTS, "S00_", 1);
 
+    ssdt_ptr = build_memssdt(ssdt_ptr, memssdt_length, numadimmsmap, nb_memdevs);
+
     build_header((void*)ssdt, SSDT_SIGNATURE, ssdt_ptr - ssdt, 1);
 
     //hexdump(ssdt, ssdt_ptr - ssdt);
 
+    free(numadimmsmap);
     return ssdt;
 }
 
@@ -458,7 +577,7 @@ acpi_build_srat_memory(struct srat_memory_affinity *numamem,
     numamem->length = sizeof(*numamem);
     memset(numamem->proximity, 0, 4);
     numamem->proximity[0] = node;
-    numamem->flags = cpu_to_le32(!!enabled);
+    numamem->flags = cpu_to_le32(!!enabled) | cpu_to_le32(0x2);
     numamem->base_addr = cpu_to_le64(base);
     numamem->range_length = cpu_to_le64(len);
 }
@@ -466,18 +585,22 @@ acpi_build_srat_memory(struct srat_memory_affinity *numamem,
 static void *
 build_srat(void)
 {
-    int numadatasize, numacpusize;
+    int numadatasize, numacpusize, nb_numa_dimms;
+    u64 *numadimmsmap;
     u64 *numadata = romfile_loadfile("etc/numa-nodes", &numadatasize);
     u64 *numacpumap = romfile_loadfile("etc/numa-cpu-map", &numacpusize);
-    if (!numadata || !numacpumap)
-        goto fail;
+
     int max_cpu = numacpusize / sizeof(u64);
     int nb_numa_nodes = numadatasize / sizeof(u64);
 
+    numadimmsmap = romfile_loadfile("etc/numa-dimm-map", &nb_numa_dimms);
+
+    nb_numa_dimms /=  3 * sizeof(u64);
+
     struct system_resource_affinity_table *srat;
     int srat_size = sizeof(*srat) +
         sizeof(struct srat_processor_affinity) * max_cpu +
-        sizeof(struct srat_memory_affinity) * (nb_numa_nodes + 2);
+        sizeof(struct srat_memory_affinity) * (nb_numa_nodes + nb_numa_dimms + 2);
 
     srat = malloc_high(srat_size);
     if (!srat) {
@@ -512,6 +635,7 @@ build_srat(void)
      */
     struct srat_memory_affinity *numamem = (void*)core;
     int slots = 0;
+    int node;
     u64 mem_len, mem_base, next_base = 0;
 
     acpi_build_srat_memory(numamem, 0, 640*1024, 0, 1);
@@ -541,7 +665,18 @@ build_srat(void)
         numamem++;
         slots++;
     }
-    for (; slots < nb_numa_nodes + 2; slots++) {
+    if (nb_numa_dimms) {
+        for (i = 1; i < nb_numa_dimms + 1; ++i) {
+            mem_base = *numadimmsmap++;
+            mem_len = *numadimmsmap++;
+            node = *numadimmsmap++;
+            acpi_build_srat_memory(numamem, mem_base, mem_len, node, 1);
+            numamem++;
+            slots++;
+        }
+    }
+
+    for (; slots < nb_numa_nodes + nb_numa_dimms + 2; slots++) {
         acpi_build_srat_memory(numamem, 0, 0, 0, 0);
         numamem++;
     }
@@ -550,10 +685,12 @@ build_srat(void)
 
     free(numadata);
     free(numacpumap);
+    free(numadimmsmap);
     return srat;
 fail:
     free(numadata);
     free(numacpumap);
+    free(numadimmsmap);
     return NULL;
 }
 
diff --git a/src/paravirt.c b/src/paravirt.c
index d1a5d3e..5925c63 100644
--- a/src/paravirt.c
+++ b/src/paravirt.c
@@ -240,6 +240,14 @@ qemu_cfg_legacy(void)
                      , sizeof(numacount) + max_cpu*sizeof(u64)
                      , numacount*sizeof(u64));
 
+    u64 dimm_count;
+    qemu_cfg_select(QEMU_CFG_NUMA);
+    qemu_cfg_skip((1 + max_cpu + numacount) * sizeof(u64));
+    qemu_cfg_read(&dimm_count, sizeof(dimm_count));
+    qemu_romfile_add("etc/numa-dimm-map", QEMU_CFG_NUMA
+                     , (2 + max_cpu + numacount) * sizeof(u64),
+                     dimm_count * 3 * sizeof(u64));
+
     // e820 data
     u32 count32;
     qemu_cfg_read_entry(&count32, QEMU_CFG_E820_TABLE, sizeof(count32));
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 6/7] q35: Add memory hotplug handler
  2013-06-26  9:14 ` [Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug Hu Tao
                     ` (4 preceding siblings ...)
  2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 5/7] acpi: generate hotplug memory devices Hu Tao
@ 2013-06-26  9:15   ` Hu Tao
  2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 7/7] pci: Use paravirt interface for pcimem_start and pcimem64_start Hu Tao
  2013-07-07  8:36   ` [Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug Michael S. Tsirkin
  7 siblings, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:15 UTC (permalink / raw)
  To: qemu-devel, seabios; +Cc: Vasilis Liaskovitis

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 src/q35-acpi-dsdt.dsl | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/q35-acpi-dsdt.dsl b/src/q35-acpi-dsdt.dsl
index c031d83..5b28d72 100644
--- a/src/q35-acpi-dsdt.dsl
+++ b/src/q35-acpi-dsdt.dsl
@@ -403,7 +403,7 @@ DefinitionBlock (
     }
 
 #include "acpi-dsdt-cpu-hotplug.dsl"
-
+#include "acpi-dsdt-mem-hotplug.dsl"
 
 /****************************************************************
  * General purpose events
@@ -418,7 +418,9 @@ DefinitionBlock (
             // CPU hotplug event
             \_SB.PRSC()
         }
-        Method(_L02) {
+        Method(_E02) {
+            // Memory hotplug event
+            \_SB.MESC()
         }
         Method(_L03) {
         }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v5 7/7] pci: Use paravirt interface for pcimem_start and pcimem64_start
  2013-06-26  9:14 ` [Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug Hu Tao
                     ` (5 preceding siblings ...)
  2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 6/7] q35: Add memory hotplug handler Hu Tao
@ 2013-06-26  9:15   ` Hu Tao
  2013-07-15 20:11     ` Vasilis Liaskovitis
  2013-07-07  8:36   ` [Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug Michael S. Tsirkin
  7 siblings, 1 reply; 56+ messages in thread
From: Hu Tao @ 2013-06-26  9:15 UTC (permalink / raw)
  To: qemu-devel, seabios; +Cc: Vasilis Liaskovitis

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

Initialize the 32-bit and 64-bit pci starting offsets from values passed in by
the qemu paravirt interface QEMU_CFG_PCI_WINDOW. Qemu calculates the starting
offsets based on initial memory and hotplug-able dimms.
It's possible to avoid the new paravirt interface, and calculate pci ranges from
srat entries. But the code changes are ugly, see:
http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg03548.html

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 src/paravirt.c | 7 +++++++
 src/paravirt.h | 1 +
 src/pciinit.c  | 9 +++++++++
 3 files changed, 17 insertions(+)

diff --git a/src/paravirt.c b/src/paravirt.c
index 5925c63..9c1e511 100644
--- a/src/paravirt.c
+++ b/src/paravirt.c
@@ -134,6 +134,7 @@ qemu_platform_setup(void)
 #define QEMU_CFG_BOOT_MENU              0x0e
 #define QEMU_CFG_MAX_CPUS               0x0f
 #define QEMU_CFG_FILE_DIR               0x19
+#define QEMU_CFG_PCI_WINDOW             0x1a
 #define QEMU_CFG_ARCH_LOCAL             0x8000
 #define QEMU_CFG_ACPI_TABLES            (QEMU_CFG_ARCH_LOCAL + 0)
 #define QEMU_CFG_SMBIOS_ENTRIES         (QEMU_CFG_ARCH_LOCAL + 1)
@@ -339,3 +340,9 @@ void qemu_cfg_init(void)
                          , 0, be32_to_cpu(qfile.size));
     }
 }
+
+void qemu_cfg_get_pci_offsets(u64 *pcimem_start, u64 *pcimem64_start)
+{
+    qemu_cfg_read_entry(pcimem_start, QEMU_CFG_PCI_WINDOW, sizeof(u64));
+    qemu_cfg_read((u8*)(pcimem64_start), sizeof(u64));
+}
diff --git a/src/paravirt.h b/src/paravirt.h
index fce5af9..2c37d0d 100644
--- a/src/paravirt.h
+++ b/src/paravirt.h
@@ -27,5 +27,6 @@ static inline int runningOnKVM(void) {
 void qemu_preinit(void);
 void qemu_platform_setup(void);
 void qemu_cfg_init(void);
+void qemu_cfg_get_pci_offsets(u64 *pcimem_start, u64 *pcimem64_start);
 
 #endif
diff --git a/src/pciinit.c b/src/pciinit.c
index 8370b96..7e63c5e 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -805,6 +805,7 @@ static void pci_bios_map_devices(struct pci_bus *busses)
 void
 pci_setup(void)
 {
+    u64 pv_pcimem_start, pv_pcimem64_start;
     if (!CONFIG_QEMU)
         return;
 
@@ -837,6 +838,14 @@ pci_setup(void)
 
     pci_bios_init_devices();
 
+    /* if qemu gives us other pci window values, it means there are hotplug-able
+     * dimms. Adjust accordingly */
+    qemu_cfg_get_pci_offsets(&pv_pcimem_start, &pv_pcimem64_start);
+    if (pv_pcimem_start > pcimem_start)
+        pcimem_start = pv_pcimem_start;
+    if (pv_pcimem64_start > pcimem64_start)
+        pcimem64_start = pv_pcimem64_start;
+
     free(busses);
 
     pci_enable_default_vga();
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm" Hu Tao
@ 2013-06-26  9:46   ` Paolo Bonzini
  2013-06-27  5:08     ` Wanlong Gao
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2013-06-26  9:46 UTC (permalink / raw)
  To: Hu Tao
  Cc: Vasilis Liaskovitis, Bandan Das, qemu-devel, Wanlong Gao,
	Eduardo Habkost

Il 26/06/2013 11:13, Hu Tao ha scritto:
> From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> 
> Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> ---
>  vl.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/vl.c b/vl.c
> index 767e020..9d88a79 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -170,6 +170,7 @@ int main(int argc, char **argv)
>  
>  #include "ui/qemu-spice.h"
>  #include "qapi/string-input-visitor.h"
> +#include "hw/mem-hotplug/dimm.h"
>  
>  //#define DEBUG_NET
>  //#define DEBUG_SLIRP
> @@ -252,6 +253,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
>  int nb_numa_nodes;
>  uint64_t node_mem[MAX_NODES];
>  unsigned long *node_cpumask[MAX_NODES];
> +int nb_hp_dimms;
>  
>  uint8_t qemu_uuid[16];
>  
> @@ -2338,6 +2340,50 @@ static int chardev_init_func(QemuOpts *opts, void *opaque)
>      return 0;
>  }
>  
> +static int dimmcfg_init_func(QemuOpts *opts, void *opaque)
> +{
> +    const char *driver;
> +    const char *id;
> +    uint64_t node, size;
> +    uint32_t populated;
> +    const char *buf, *busbuf;
> +
> +    /* DimmDevice configuration needs to be known in order to initialize chipset
> +     * with correct memory and pci ranges. But all devices are created after
> +     * chipset / machine initialization. In * order to avoid this problem, we
> +     * parse dimm information earlier into dimmcfg structs. */
> +
> +    driver = qemu_opt_get(opts, "driver");
> +    if (!strcmp(driver, "dimm")) {
> +
> +        id = qemu_opts_id(opts);
> +        buf = qemu_opt_get(opts, "size");
> +        parse_option_size("size", buf, &size, NULL);
> +        buf = qemu_opt_get(opts, "node");
> +        parse_option_number("node", buf, &node, NULL);
> +        busbuf = qemu_opt_get(opts, "bus");
> +        buf = qemu_opt_get(opts, "populated");
> +        if (!buf) {
> +            populated = 0;
> +        } else {
> +            populated = strcmp(buf, "on") ? 0 : 1;
> +        }
> +
> +        dimm_config_create((char *)id, size, busbuf ? busbuf : "membus.0",
> +                           node, nb_hp_dimms);
> +
> +        /* if !populated, we just keep the config. The real device
> +         * will be created in the future with a normal device_add
> +         * command. */
> +        if (!populated) {
> +            qemu_opts_del(opts);
> +        }

I think you need another option than -device dimm.  For example it could
be declared together with the NUMA node.  This could declare two NUMA
nodes, each with 2G of populated and 2G of unpopulated RAM:

   -numa node,mem-range=0-2G,mem-range-hotplug=4G-6G \
   -numa node,mem-range=2G-4G,mem-range-hotplug=6G-8G

I'm not sure I like the names particularly though.  CCing Eduardo,
Bandan and Wanlong Gao.

Paolo

> +        nb_hp_dimms++;
> +    }
> +
> +    return 0;
> +}
> +
>  #ifdef CONFIG_VIRTFS
>  static int fsdev_init_func(QemuOpts *opts, void *opaque)
>  {
> @@ -4260,6 +4306,11 @@ int main(int argc, char **argv, char **envp)
>      }
>      qemu_add_globals();
>  
> +    /* init generic devices */
> +    if (qemu_opts_foreach(qemu_find_opts("device"),
> +           dimmcfg_init_func, NULL, 1) != 0) {
> +        exit(1);
> +    }
>      qdev_machine_init();
>  
>      QEMUMachineInitArgs args = { .ram_size = ram_size,
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-06-26  9:46   ` Paolo Bonzini
@ 2013-06-27  5:08     ` Wanlong Gao
  2013-06-27  6:55       ` Paolo Bonzini
  0 siblings, 1 reply; 56+ messages in thread
From: Wanlong Gao @ 2013-06-27  5:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Eduardo Habkost, Hu Tao, qemu-devel, Vasilis Liaskovitis,
	Bandan Das, Wanlong Gao

On 06/26/2013 05:46 PM, Paolo Bonzini wrote:
> Il 26/06/2013 11:13, Hu Tao ha scritto:
>> From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
>>
>> Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
>> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
>> ---
>>  vl.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 51 insertions(+)
>>
>> diff --git a/vl.c b/vl.c
>> index 767e020..9d88a79 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -170,6 +170,7 @@ int main(int argc, char **argv)
>>  
>>  #include "ui/qemu-spice.h"
>>  #include "qapi/string-input-visitor.h"
>> +#include "hw/mem-hotplug/dimm.h"
>>  
>>  //#define DEBUG_NET
>>  //#define DEBUG_SLIRP
>> @@ -252,6 +253,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
>>  int nb_numa_nodes;
>>  uint64_t node_mem[MAX_NODES];
>>  unsigned long *node_cpumask[MAX_NODES];
>> +int nb_hp_dimms;
>>  
>>  uint8_t qemu_uuid[16];
>>  
>> @@ -2338,6 +2340,50 @@ static int chardev_init_func(QemuOpts *opts, void *opaque)
>>      return 0;
>>  }
>>  
>> +static int dimmcfg_init_func(QemuOpts *opts, void *opaque)
>> +{
>> +    const char *driver;
>> +    const char *id;
>> +    uint64_t node, size;
>> +    uint32_t populated;
>> +    const char *buf, *busbuf;
>> +
>> +    /* DimmDevice configuration needs to be known in order to initialize chipset
>> +     * with correct memory and pci ranges. But all devices are created after
>> +     * chipset / machine initialization. In * order to avoid this problem, we
>> +     * parse dimm information earlier into dimmcfg structs. */
>> +
>> +    driver = qemu_opt_get(opts, "driver");
>> +    if (!strcmp(driver, "dimm")) {
>> +
>> +        id = qemu_opts_id(opts);
>> +        buf = qemu_opt_get(opts, "size");
>> +        parse_option_size("size", buf, &size, NULL);
>> +        buf = qemu_opt_get(opts, "node");
>> +        parse_option_number("node", buf, &node, NULL);
>> +        busbuf = qemu_opt_get(opts, "bus");
>> +        buf = qemu_opt_get(opts, "populated");
>> +        if (!buf) {
>> +            populated = 0;
>> +        } else {
>> +            populated = strcmp(buf, "on") ? 0 : 1;
>> +        }
>> +
>> +        dimm_config_create((char *)id, size, busbuf ? busbuf : "membus.0",
>> +                           node, nb_hp_dimms);
>> +
>> +        /* if !populated, we just keep the config. The real device
>> +         * will be created in the future with a normal device_add
>> +         * command. */
>> +        if (!populated) {
>> +            qemu_opts_del(opts);
>> +        }
> 
> I think you need another option than -device dimm.  For example it could
> be declared together with the NUMA node.  This could declare two NUMA
> nodes, each with 2G of populated and 2G of unpopulated RAM:
> 
>    -numa node,mem-range=0-2G,mem-range-hotplug=4G-6G \
>    -numa node,mem-range=2G-4G,mem-range-hotplug=6G-8G
> 
> I'm not sure I like the names particularly though.  CCing Eduardo,
> Bandan and Wanlong Gao.

Do we really need to specify the memory range? I suspect that we can
follow current design of normal memory in hot-plug memory. Currently,
we just specify the size of normal memory in each node, and the range
in normal memory is node by node. Then I think we can just specify
the memory size of hot-plug in each node, then the hot-plug memory
range is also node by node, and the whole hot-plug memory block is
just located after the normal memory block. If so, the option can
come like:
    -numa node,nodeid=0,mem=2G,cpus=0-1,mem-hotplug=2G,mem-policy=membind,mem-hostnode=0-1,mem-hotplug-policy=interleave,mem-hotplug-hostnode=1
    -numa node,nodeid=1,mem=2G,cpus=2-3,mem-hotplug=2G,mem-policy=preferred,mem-hostnode=1,mem-hotplug-policy=membind,mem-hotplug-hostnode=0-1

And each hot-plug memory device size can be assigned through "-device dimm,size=1G",
assume that we specify 4 hot-plug memory devices and each 1G, then first two devices
as we ranged belong to node0, and other two belong to node1. Then the hot-plug memory
will have no effect on current normal memory design.


Thanks,
Wanlong Gao

> 
> Paolo
> 
>> +        nb_hp_dimms++;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>  #ifdef CONFIG_VIRTFS
>>  static int fsdev_init_func(QemuOpts *opts, void *opaque)
>>  {
>> @@ -4260,6 +4306,11 @@ int main(int argc, char **argv, char **envp)
>>      }
>>      qemu_add_globals();
>>  
>> +    /* init generic devices */
>> +    if (qemu_opts_foreach(qemu_find_opts("device"),
>> +           dimmcfg_init_func, NULL, 1) != 0) {
>> +        exit(1);
>> +    }
>>      qdev_machine_init();
>>  
>>      QEMUMachineInitArgs args = { .ram_size = ram_size,
>>
> 
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-06-27  5:08     ` Wanlong Gao
@ 2013-06-27  6:55       ` Paolo Bonzini
  2013-07-09 16:53         ` Igor Mammedov
  2013-07-15 17:05         ` Vasilis Liaskovitis
  0 siblings, 2 replies; 56+ messages in thread
From: Paolo Bonzini @ 2013-06-27  6:55 UTC (permalink / raw)
  To: gaowanlong
  Cc: Vasilis Liaskovitis, Hu Tao, Bandan Das, Eduardo Habkost, qemu-devel

Il 27/06/2013 07:08, Wanlong Gao ha scritto:
> Do we really need to specify the memory range? I suspect that we can
> follow current design of normal memory in hot-plug memory.

I think we can do both.  I'm afraid that the configuration of the VM
will not be perfectly reproducible without specifying the range, more so
if you allow hotplug.

> Currently,
> we just specify the size of normal memory in each node, and the range
> in normal memory is node by node. Then I think we can just specify
> the memory size of hot-plug in each node, then the hot-plug memory
> range is also node by node, and the whole hot-plug memory block is
> just located after the normal memory block. If so, the option can
> come like:
>     -numa node,nodeid=0,mem=2G,cpus=0-1,mem-hotplug=2G,mem-policy=membind,mem-hostnode=0-1,mem-hotplug-policy=interleave,mem-hotplug-hostnode=1
>     -numa node,nodeid=1,mem=2G,cpus=2-3,mem-hotplug=2G,mem-policy=preferred,mem-hostnode=1,mem-hotplug-policy=membind,mem-hotplug-hostnode=0-1

I think specifying different policies and bindings for normal and
hotplug memory is too much fine-grained.  If you really want that, then
you would need something like

    -numa node,nodeid=0,cpus=0-1 \
    -numa mem,nodeid=0,size=2G,policy=membind,hostnode=0-1 \
    -numa mem,nodeid=0,size=2G,policy=interleave,hostnode=1,populated=no

Hmm... this actually doesn't look too bad, and it is much more
future-proof.  Eduardo, what do you think about it?  Should Wanlong redo
his patches to support this "-numa mem" syntax?  Parsing it should be
easy using the QemuOpts visitor, too.

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 12/14] Implement "info memory" and "query-memory"
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 12/14] Implement "info memory" and "query-memory" Hu Tao
@ 2013-06-28 20:27   ` Eric Blake
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Blake @ 2013-06-28 20:27 UTC (permalink / raw)
  To: Hu Tao; +Cc: Vasilis Liaskovitis, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1765 bytes --]

On 06/26/2013 03:13 AM, Hu Tao wrote:
> From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> 
> Returns total physical memory available to guest in bytes, including hotplugged
> memory. Note that the number reported here may be different from what the guest
> sees e.g. if the guest has not logically onlined hotplugged memory.
> 
> This functionality is provided independently of a balloon device, since a
> guest can be using ACPI memory hotplug without using a balloon device.
> 
> Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> ---

> +++ b/qapi-schema.json
> @@ -3608,3 +3608,16 @@
>              '*cpuid-input-ecx': 'int',
>              'cpuid-register': 'X86CPURegister32',
>              'features': 'int' } }
> +
> +##
> +# @query-memory:
> +#
> +# Returns total memory in bytes, including hotplugged dimms
> +#
> +# Returns: int

Not true - it returns a struct of useful memory statistics (of which,
your initial implementation of the MemoryInfo has only one statistic).
This struct is free to grow larger in the future, if we determine other
useful things to return.

> +#
> +# Since: 1.6
> +##
> +{ 'type': 'MemoryInfo',
> +  'data': { 'total': 'int' } }
> +{ 'command': 'query-memory', 'returns': 'MemoryInfo' }

You've created two items (the 'MemoryInfo' type and the 'query-memory'
command), so you should have two pieces of documentation (yes, I know
there are pre-existing cases of undocumented types that are declared
next to the sole command that uses them, but we should avoid that where
possible).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 14/14] Implement dimm-info
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 14/14] Implement dimm-info Hu Tao
@ 2013-06-28 20:28   ` Eric Blake
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Blake @ 2013-06-28 20:28 UTC (permalink / raw)
  To: Hu Tao; +Cc: Vasilis Liaskovitis, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1192 bytes --]

On 06/26/2013 03:13 AM, Hu Tao wrote:
> From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> 
> "query-dimm-info" and "info dimm" will give current state of all dimms in the
> system e.g.
> 

> +++ b/qapi-schema.json
> @@ -3621,3 +3621,30 @@
>  { 'type': 'MemoryInfo',
>    'data': { 'total': 'int' } }
>  { 'command': 'query-memory', 'returns': 'MemoryInfo' }
> +
> +##
> +# @DimmInfo:
> +#
> +# Information about status of dimm device
> +#
> +# @dimm: the name of the dimm
> +#
> +# @state: 'true' means the dimm device is plugged in, 'false' means
> +#         means the dimm device is plugged out.

s/plugged out/unplugged/

> +#
> +# Since: 1.6
> +#
> +##
> +{ 'type': 'DimmInfo',
> +  'data': {'dimm': 'str', 'state': 'bool'} }
> +
> +##
> +# @query-dimm-info:
> +#
> +# Returns the state of dimm devices
> +#
> +# Returns: list of @DimmInfo
> +#
> +# Since: 1.6
> +##
> +{ 'command': 'query-dimm-info', 'returns': ['DimmInfo'] }

Looks good from an interface perspective; but missing an example in
qmp-commands.hx.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug
  2013-06-26  9:14 ` [Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug Hu Tao
                     ` (6 preceding siblings ...)
  2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 7/7] pci: Use paravirt interface for pcimem_start and pcimem64_start Hu Tao
@ 2013-07-07  8:36   ` Michael S. Tsirkin
  7 siblings, 0 replies; 56+ messages in thread
From: Michael S. Tsirkin @ 2013-07-07  8:36 UTC (permalink / raw)
  To: Hu Tao; +Cc: Vasilis Liaskovitis, seabios, qemu-devel

On Wed, Jun 26, 2013 at 05:14:58PM +0800, Hu Tao wrote:
> This series adds memory devices and related methods to support
> ACPI memory hotplug.
> 
> This series works with qemu counterpart. See qemu series for
> a detailed description.
> 
> Hu Tao (2):
>   set psize to 0 when romfile_loadfile failed
>   acpi: generate hotplug memory devices
> 
> Vasilis Liaskovitis (5):
>   Add ACPI_EXTRACT_DEVICE* macros
>   Add SSDT memory device support
>   acpi-dsdt: Implement functions for memory hotplug
>   q35: Add memory hotplug handler
>   pci: Use paravirt interface for pcimem_start and pcimem64_start
> 
>  Makefile                      |   2 +-
>  src/acpi-dsdt-mem-hotplug.dsl |  57 ++++++++++++++++
>  src/acpi-dsdt.dsl             |   5 +-
>  src/acpi.c                    | 151 ++++++++++++++++++++++++++++++++++++++++--
>  src/paravirt.c                |  15 +++++
>  src/paravirt.h                |   1 +
>  src/pciinit.c                 |   9 +++
>  src/q35-acpi-dsdt.dsl         |   6 +-
>  src/romfile.c                 |  13 ++--
>  src/ssdt-mem.dsl              |  61 +++++++++++++++++
>  tools/acpi_extract.py         |  28 ++++++++
>  11 files changed, 333 insertions(+), 15 deletions(-)
>  create mode 100644 src/acpi-dsdt-mem-hotplug.dsl
>  create mode 100644 src/ssdt-mem.dsl


We were looking for a new feature to use the acpi tables in qemu.
Could you use that instead of adding code to seabios?

You can find the latest bits here:

git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git acpi




> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 02/14] Add SIZE type to qdev properties
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 02/14] Add SIZE type to qdev properties Hu Tao
@ 2013-07-08  9:37   ` Andreas Färber
  2013-07-12  1:27     ` Hu Tao
  0 siblings, 1 reply; 56+ messages in thread
From: Andreas Färber @ 2013-07-08  9:37 UTC (permalink / raw)
  To: Hu Tao; +Cc: Vasilis Liaskovitis, qemu-devel

Am 26.06.2013 11:13, schrieb Hu Tao:
> From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> 
> This patch adds a 'SIZE' type property to qdev.
> 
> It will make dimm description more convenient by allowing sizes to be specified
> with K,M,G,T prefixes instead of number of bytes e.g.:
> -device dimm,id=mem0,size=2G,bus=membus.0
> 
> Credits go to Ian Molton for original patch. See:
> http://patchwork.ozlabs.org/patch/38835/
> 
> Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>

If the original patch is not from Vasilis, why is this patch not
carrying Ian's Signed-off-by from that patch before Vasilis'? (Then we
could drop the Patchwork link.)

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug
  2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
                   ` (14 preceding siblings ...)
  2013-06-26  9:14 ` [Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug Hu Tao
@ 2013-07-08  9:48 ` Andreas Färber
  2013-07-12  1:30   ` Hu Tao
  2013-07-14 16:56   ` Paolo Bonzini
  15 siblings, 2 replies; 56+ messages in thread
From: Andreas Färber @ 2013-07-08  9:48 UTC (permalink / raw)
  To: Hu Tao; +Cc: Vasilis Liaskovitis, Paolo Bonzini, qemu-devel

Am 26.06.2013 11:13, schrieb Hu Tao:
> It's been quite a while since v4 and lots of changes happend
> in qemu and v4 just can't apply anymore. So this series is
> basically a rebase. Another purpose is to bring up discussions
> to make consensus on some questions since v4, see
> http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg01219.html
> and http://lists.gnu.org/archive/html/qemu-devel/2013-02/msg05332.html
[...]
> changes from v4:
> 
>   - rebased on a recent qemu-git
>   - based on another series which seperates i440fx refactor from v4.
>     http://lists.nongnu.org/archive/html/qemu-devel/2013-06/msg03500.html
>   - hot-unplug patches not included, as suggested by Vasilis, since
>     hot-unplug has some more complications with refcounting memory regions.

This should hopefully be addressed with yesterday's merge of Paolo's
MemoryRegion refactoring.

Andreas

>   - fix some copy-paste errors in qapi-schema.json.
> 
> v4: http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02693.html
[snip]

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-06-27  6:55       ` Paolo Bonzini
@ 2013-07-09 16:53         ` Igor Mammedov
  2013-07-12  2:39           ` Hu Tao
  2013-07-15 17:05         ` Vasilis Liaskovitis
  1 sibling, 1 reply; 56+ messages in thread
From: Igor Mammedov @ 2013-07-09 16:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Eduardo Habkost, mst, Hu Tao, qemu-devel, Vasilis Liaskovitis,
	Bandan Das, gaowanlong

On Thu, 27 Jun 2013 08:55:25 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 27/06/2013 07:08, Wanlong Gao ha scritto:
> > Do we really need to specify the memory range? I suspect that we can
> > follow current design of normal memory in hot-plug memory.
> 
> I think we can do both.  I'm afraid that the configuration of the VM
> will not be perfectly reproducible without specifying the range, more so
> if you allow hotplug.
> 
> > Currently,
> > we just specify the size of normal memory in each node, and the range
> > in normal memory is node by node. Then I think we can just specify
> > the memory size of hot-plug in each node, then the hot-plug memory
> > range is also node by node, and the whole hot-plug memory block is
> > just located after the normal memory block. If so, the option can
> > come like:
> >     -numa node,nodeid=0,mem=2G,cpus=0-1,mem-hotplug=2G,mem-policy=membind,mem-hostnode=0-1,mem-hotplug-policy=interleave,mem-hotplug-hostnode=1
> >     -numa node,nodeid=1,mem=2G,cpus=2-3,mem-hotplug=2G,mem-policy=preferred,mem-hostnode=1,mem-hotplug-policy=membind,mem-hotplug-hostnode=0-1
> 
> I think specifying different policies and bindings for normal and
> hotplug memory is too much fine-grained.  If you really want that, then
> you would need something like
> 
>     -numa node,nodeid=0,cpus=0-1 \
>     -numa mem,nodeid=0,size=2G,policy=membind,hostnode=0-1 \
>     -numa mem,nodeid=0,size=2G,policy=interleave,hostnode=1,populated=no
> 
> Hmm... this actually doesn't look too bad, and it is much more
> future-proof.  Eduardo, what do you think about it?  Should Wanlong redo
> his patches to support this "-numa mem" syntax?  Parsing it should be
> easy using the QemuOpts visitor, too.

Is hot-plugged memory and its bindings to numa nodes have to be specified
at startup?

How about extending -m option to support following syntax:

  -m initial_mem[, number_of_hotplug_dimms, max_mem]]

and dynamically forming MEMXX._CRS/_PXM resources on demand instead
of creating static resources in SSDT.

without necessity to specify to be hot-plugged DIMMs at startup, hot-plug
could become more flexible since arbitrarily sized DIMMs with required NUMA
mapping could be specified during hot-plug time, for example:

  device_add dimm,id=dimm1,bus=membus.0,size=3G,node=1,...

the only limit would be left is a number of increments(DIMM slots), due to
need to pre-generate ACPI memory devices that could supply _CRS/_PXM
resources later.


> Paolo
> 
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/14] pc: Add dimm paravirt SRAT info
  2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 10/14] pc: Add dimm paravirt SRAT info Hu Tao
@ 2013-07-10 10:10   ` Michael S. Tsirkin
  2013-07-11  5:13     ` Igor Mammedov
  0 siblings, 1 reply; 56+ messages in thread
From: Michael S. Tsirkin @ 2013-07-10 10:10 UTC (permalink / raw)
  To: Hu Tao; +Cc: Vasilis Liaskovitis, qemu-devel

On Wed, Jun 26, 2013 at 05:13:33PM +0800, Hu Tao wrote:
> The numa_fw_cfg paravirt interface is extended to include SRAT information for
> all hotplug-able dimms. There are 3 words for each hotplug-able memory slot,
> denoting start address, size and node proximity. The new info is appended after
> existing numa info, so that the fw_cfg layout does not break.  This information
> is used by Seabios to build hotplug memory device objects at runtime.
> nb_numa_nodes is set to 1 by default (not 0), so that we always pass srat info
> to SeaBIOS.
> 
> v3->v4: numa_fw_cfg needs to be initalized after memory controller sets up dimm
> ranges.  Make changes for pc_piix and pc_q35 to set numa_fw_cfg after i440fx
> initialization.
> 
> v2->v3: setting nb_numa_nodes to 1 is not needed
> 
> v1->v2:
> Dimm SRAT info (#dimms) is appended at end of existing numa fw_cfg in order not
> to break existing layout
> Documentation of the new fwcfg layout is included in docs/specs/fwcfg.txt
> 
> Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>

Please do not add any more fwcfg interfaces - generating
ACPI in qemu removes the need for it.

So please rebase on top of that work and generate the appropriate ACPI
tables directly.

You can find the latest code gnerating ACPI from qemu here:
git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git acpi

This code is work in progress, but once you base on
top of that, I can put it on that branch and keep updating if
interfaces change.

> ---
>  docs/specs/fwcfg.txt    | 28 ++++++++++++++++++++++++++++
>  hw/i386/pc.c            | 30 ++++++++++++++++++++++++------
>  hw/i386/pc_piix.c       |  1 +
>  hw/i386/pc_q35.c        |  7 +++++--
>  include/hw/i386/pc.h    |  1 +
>  include/sysemu/sysemu.h |  1 +
>  6 files changed, 60 insertions(+), 8 deletions(-)
>  create mode 100644 docs/specs/fwcfg.txt
> 
> diff --git a/docs/specs/fwcfg.txt b/docs/specs/fwcfg.txt
> new file mode 100644
> index 0000000..e6fcd8f
> --- /dev/null
> +++ b/docs/specs/fwcfg.txt
> @@ -0,0 +1,28 @@
> +QEMU<->BIOS Paravirt Documentation
> +--------------------------------------
> +
> +This document describes paravirt data structures passed from QEMU to BIOS.
> +
> +fw_cfg SRAT paravirt info
> +--------------------
> +The SRAT info passed from QEMU to BIOS has the following layout:
> +
> +-----------------------------------------------------------------------------------------------
> +#nodes | cpu0_pxm | cpu1_pxm | ... | cpulast_pxm | node0_mem | node1_mem | ... | nodelast_mem
> +
> +-----------------------------------------------------------------------------------------------
> +#dimms | dimm0_start | dimm0_sz | dimm0_pxm | ... | dimmlast_start | dimmlast_sz | dimmlast_pxm
> +
> +Entry 0 contains the number of numa nodes (nb_numa_nodes).
> +
> +Entries 1..max_cpus: The next max_cpus entries describe node proximity for each
> +one of the vCPUs in the system.
> +
> +Entries max_cpus+1..max_cpus+nb_numa_nodes+1:  The next nb_numa_nodes entries
> +describe the memory size for each one of the NUMA nodes in the system.
> +
> +Entry max_cpus+nb_numa_nodes+1 contains the number of memory dimms (nb_hp_dimms)
> +
> +The last 3 * nb_hp_dimms entries are organized in triplets: Each triplet contains
> +the physical address offset, size (in bytes), and node proximity for the
> +respective dimm.
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 65838a6..b51d3b5 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -55,6 +55,7 @@
>  #include "hw/acpi/acpi.h"
>  #include "hw/cpu/icc_bus.h"
>  #include "hw/boards.h"
> +#include "hw/mem-hotplug/dimm.h"
>  
>  /* debug PC/ISA interrupts */
>  //#define DEBUG_IRQ
> @@ -606,8 +607,6 @@ static FWCfgState *bochs_bios_init(void)
>      FWCfgState *fw_cfg;
>      uint8_t *smbios_table;
>      size_t smbios_len;
> -    uint64_t *numa_fw_cfg;
> -    int i, j;
>      unsigned int apic_id_limit = pc_apic_id_limit(max_cpus);
>  
>      fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0);
> @@ -640,11 +639,25 @@ static FWCfgState *bochs_bios_init(void)
>                       &e820_table, sizeof(e820_table));
>  
>      fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, &hpet_cfg, sizeof(hpet_cfg));
> +
> +    return fw_cfg;
> +}
> +
> +void bochs_meminfo_bios_init(void *fw_cfg)
> +{
> +    uint64_t *numa_fw_cfg;
> +    uint64_t *hp_dimms_fw_cfg;
> +    int i, j;
> +    unsigned int apic_id_limit = pc_apic_id_limit(max_cpus);
> +
>      /* allocate memory for the NUMA channel: one (64bit) word for the number
>       * of nodes, one word for each VCPU->node and one word for each node to
>       * hold the amount of memory.
> +     * Finally one word for the number of hotplug memory slots and three words
> +     * for each hotplug memory slot (start address, size and node proximity).
>       */
> -    numa_fw_cfg = g_new0(uint64_t, 1 + apic_id_limit + nb_numa_nodes);
> +    numa_fw_cfg = g_new0(uint64_t,
> +                         2 + apic_id_limit + nb_numa_nodes  + 3 * nb_hp_dimms);
>      numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
>      for (i = 0; i < max_cpus; i++) {
>          unsigned int apic_id = x86_cpu_apic_id_from_index(i);
> @@ -659,11 +672,16 @@ static FWCfgState *bochs_bios_init(void)
>      for (i = 0; i < nb_numa_nodes; i++) {
>          numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]);
>      }
> +
> +    numa_fw_cfg[1 + apic_id_limit + nb_numa_nodes] = cpu_to_le64(nb_hp_dimms);
> +
> +    hp_dimms_fw_cfg = numa_fw_cfg + 2 + apic_id_limit + nb_numa_nodes;
> +    if (nb_hp_dimms) {
> +        dimm_setup_fwcfg_layout(hp_dimms_fw_cfg);
> +    }
>      fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
> -                     (1 + apic_id_limit + nb_numa_nodes) *
> +                     (2 + apic_id_limit + nb_numa_nodes + 3 * nb_hp_dimms) *
>                       sizeof(*numa_fw_cfg));
> -
> -    return fw_cfg;
>  }
>  
>  static long get_file_size(FILE *f)
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index fb056df..6e18343 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -138,6 +138,7 @@ static void pc_init1(MemoryRegion *system_memory,
>      if (!xen_enabled()) {
>          fw_cfg = pc_memory_init(kernel_filename, kernel_cmdline, initrd_filename,
>                                  below_4g_mem_size, above_4g_mem_size);
> +        bochs_meminfo_bios_init(fw_cfg);
>      }
>  
>      if (kvm_irqchip_in_kernel()) {
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 5fe14bb..2c14977 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -74,6 +74,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
>      ICH9LPCState *ich9_lpc;
>      PCIDevice *ahci;
>      DeviceState *icc_bridge;
> +    void *fw_cfg = NULL;
>  
>      icc_bridge = qdev_create(NULL, TYPE_ICC_BRIDGE);
>      object_property_add_child(qdev_get_machine(), "icc-bridge",
> @@ -97,8 +98,9 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
>  
>      /* allocate ram and load rom/bios */
>      if (!xen_enabled()) {
> -        pc_memory_init(kernel_filename, kernel_cmdline,
> -                       initrd_filename, below_4g_mem_size, above_4g_mem_size);
> +        fw_cfg = pc_memory_init(kernel_filename, kernel_cmdline,
> +                                initrd_filename, below_4g_mem_size,
> +                                above_4g_mem_size);
>      }
>  
>      /* irq lines */
> @@ -116,6 +118,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
>      q35_host->mch.address_space_io = get_system_io();
>      /* pci */
>      qdev_init_nofail(DEVICE(q35_host));
> +    bochs_meminfo_bios_init(fw_cfg);
>      host_bus = q35_host->host.pci.bus;
>      /* create ISA bus */
>      lpc = pci_create_simple_multifunction(host_bus, PCI_DEVFN(ICH9_LPC_DEV,
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 959b92b..4a29e6e 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -231,6 +231,7 @@ int pvpanic_init(ISABus *bus);
>  #define E820_UNUSABLE   5
>  
>  int e820_add_entry(uint64_t, uint64_t, uint32_t);
> +void bochs_meminfo_bios_init(void *fw_cfg);
>  
>  #define PC_COMPAT_1_5 \
>          {\
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 2fb71af..2644faa 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -132,6 +132,7 @@ extern QEMUClock *rtc_clock;
>  extern int nb_numa_nodes;
>  extern uint64_t node_mem[MAX_NODES];
>  extern unsigned long *node_cpumask[MAX_NODES];
> +extern int nb_hp_dimms;
>  
>  #define MAX_OPTION_ROMS 16
>  typedef struct QEMUOptionRom {
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/14] pc: Add dimm paravirt SRAT info
  2013-07-10 10:10   ` Michael S. Tsirkin
@ 2013-07-11  5:13     ` Igor Mammedov
  2013-07-11  8:49       ` Michael S. Tsirkin
  0 siblings, 1 reply; 56+ messages in thread
From: Igor Mammedov @ 2013-07-11  5:13 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Vasilis Liaskovitis, Hu Tao, qemu-devel

On Wed, 10 Jul 2013 13:10:03 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Wed, Jun 26, 2013 at 05:13:33PM +0800, Hu Tao wrote:
> > The numa_fw_cfg paravirt interface is extended to include SRAT information for
> > all hotplug-able dimms. There are 3 words for each hotplug-able memory slot,
> > denoting start address, size and node proximity. The new info is appended after
> > existing numa info, so that the fw_cfg layout does not break.  This information
> > is used by Seabios to build hotplug memory device objects at runtime.
> > nb_numa_nodes is set to 1 by default (not 0), so that we always pass srat info
> > to SeaBIOS.
> > 
> > v3->v4: numa_fw_cfg needs to be initalized after memory controller sets up dimm
> > ranges.  Make changes for pc_piix and pc_q35 to set numa_fw_cfg after i440fx
> > initialization.
> > 
> > v2->v3: setting nb_numa_nodes to 1 is not needed
> > 
> > v1->v2:
> > Dimm SRAT info (#dimms) is appended at end of existing numa fw_cfg in order not
> > to break existing layout
> > Documentation of the new fwcfg layout is included in docs/specs/fwcfg.txt
> > 
> > Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> 
> Please do not add any more fwcfg interfaces - generating
> ACPI in qemu removes the need for it.
> 
> So please rebase on top of that work and generate the appropriate ACPI
> tables directly.
> 
> You can find the latest code gnerating ACPI from qemu here:
> git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git acpi

will it work with upstream Seabios or custom tree is required for it as well?

> 
> This code is work in progress, but once you base on
> top of that, I can put it on that branch and keep updating if
> interfaces change.
> 
> > ---
> >  docs/specs/fwcfg.txt    | 28 ++++++++++++++++++++++++++++
> >  hw/i386/pc.c            | 30 ++++++++++++++++++++++++------
> >  hw/i386/pc_piix.c       |  1 +
> >  hw/i386/pc_q35.c        |  7 +++++--
> >  include/hw/i386/pc.h    |  1 +
> >  include/sysemu/sysemu.h |  1 +
> >  6 files changed, 60 insertions(+), 8 deletions(-)
> >  create mode 100644 docs/specs/fwcfg.txt
> > 
> > diff --git a/docs/specs/fwcfg.txt b/docs/specs/fwcfg.txt
> > new file mode 100644
> > index 0000000..e6fcd8f
> > --- /dev/null
> > +++ b/docs/specs/fwcfg.txt
> > @@ -0,0 +1,28 @@
> > +QEMU<->BIOS Paravirt Documentation
> > +--------------------------------------
> > +
> > +This document describes paravirt data structures passed from QEMU to BIOS.
> > +
> > +fw_cfg SRAT paravirt info
> > +--------------------
> > +The SRAT info passed from QEMU to BIOS has the following layout:
> > +
> > +-----------------------------------------------------------------------------------------------
> > +#nodes | cpu0_pxm | cpu1_pxm | ... | cpulast_pxm | node0_mem | node1_mem | ... | nodelast_mem
> > +
> > +-----------------------------------------------------------------------------------------------
> > +#dimms | dimm0_start | dimm0_sz | dimm0_pxm | ... | dimmlast_start | dimmlast_sz | dimmlast_pxm
> > +
> > +Entry 0 contains the number of numa nodes (nb_numa_nodes).
> > +
> > +Entries 1..max_cpus: The next max_cpus entries describe node proximity for each
> > +one of the vCPUs in the system.
> > +
> > +Entries max_cpus+1..max_cpus+nb_numa_nodes+1:  The next nb_numa_nodes entries
> > +describe the memory size for each one of the NUMA nodes in the system.
> > +
> > +Entry max_cpus+nb_numa_nodes+1 contains the number of memory dimms (nb_hp_dimms)
> > +
> > +The last 3 * nb_hp_dimms entries are organized in triplets: Each triplet contains
> > +the physical address offset, size (in bytes), and node proximity for the
> > +respective dimm.
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index 65838a6..b51d3b5 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -55,6 +55,7 @@
> >  #include "hw/acpi/acpi.h"
> >  #include "hw/cpu/icc_bus.h"
> >  #include "hw/boards.h"
> > +#include "hw/mem-hotplug/dimm.h"
> >  
> >  /* debug PC/ISA interrupts */
> >  //#define DEBUG_IRQ
> > @@ -606,8 +607,6 @@ static FWCfgState *bochs_bios_init(void)
> >      FWCfgState *fw_cfg;
> >      uint8_t *smbios_table;
> >      size_t smbios_len;
> > -    uint64_t *numa_fw_cfg;
> > -    int i, j;
> >      unsigned int apic_id_limit = pc_apic_id_limit(max_cpus);
> >  
> >      fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0);
> > @@ -640,11 +639,25 @@ static FWCfgState *bochs_bios_init(void)
> >                       &e820_table, sizeof(e820_table));
> >  
> >      fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, &hpet_cfg, sizeof(hpet_cfg));
> > +
> > +    return fw_cfg;
> > +}
> > +
> > +void bochs_meminfo_bios_init(void *fw_cfg)
> > +{
> > +    uint64_t *numa_fw_cfg;
> > +    uint64_t *hp_dimms_fw_cfg;
> > +    int i, j;
> > +    unsigned int apic_id_limit = pc_apic_id_limit(max_cpus);
> > +
> >      /* allocate memory for the NUMA channel: one (64bit) word for the number
> >       * of nodes, one word for each VCPU->node and one word for each node to
> >       * hold the amount of memory.
> > +     * Finally one word for the number of hotplug memory slots and three words
> > +     * for each hotplug memory slot (start address, size and node proximity).
> >       */
> > -    numa_fw_cfg = g_new0(uint64_t, 1 + apic_id_limit + nb_numa_nodes);
> > +    numa_fw_cfg = g_new0(uint64_t,
> > +                         2 + apic_id_limit + nb_numa_nodes  + 3 * nb_hp_dimms);
> >      numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
> >      for (i = 0; i < max_cpus; i++) {
> >          unsigned int apic_id = x86_cpu_apic_id_from_index(i);
> > @@ -659,11 +672,16 @@ static FWCfgState *bochs_bios_init(void)
> >      for (i = 0; i < nb_numa_nodes; i++) {
> >          numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]);
> >      }
> > +
> > +    numa_fw_cfg[1 + apic_id_limit + nb_numa_nodes] = cpu_to_le64(nb_hp_dimms);
> > +
> > +    hp_dimms_fw_cfg = numa_fw_cfg + 2 + apic_id_limit + nb_numa_nodes;
> > +    if (nb_hp_dimms) {
> > +        dimm_setup_fwcfg_layout(hp_dimms_fw_cfg);
> > +    }
> >      fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
> > -                     (1 + apic_id_limit + nb_numa_nodes) *
> > +                     (2 + apic_id_limit + nb_numa_nodes + 3 * nb_hp_dimms) *
> >                       sizeof(*numa_fw_cfg));
> > -
> > -    return fw_cfg;
> >  }
> >  
> >  static long get_file_size(FILE *f)
> > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> > index fb056df..6e18343 100644
> > --- a/hw/i386/pc_piix.c
> > +++ b/hw/i386/pc_piix.c
> > @@ -138,6 +138,7 @@ static void pc_init1(MemoryRegion *system_memory,
> >      if (!xen_enabled()) {
> >          fw_cfg = pc_memory_init(kernel_filename, kernel_cmdline, initrd_filename,
> >                                  below_4g_mem_size, above_4g_mem_size);
> > +        bochs_meminfo_bios_init(fw_cfg);
> >      }
> >  
> >      if (kvm_irqchip_in_kernel()) {
> > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> > index 5fe14bb..2c14977 100644
> > --- a/hw/i386/pc_q35.c
> > +++ b/hw/i386/pc_q35.c
> > @@ -74,6 +74,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
> >      ICH9LPCState *ich9_lpc;
> >      PCIDevice *ahci;
> >      DeviceState *icc_bridge;
> > +    void *fw_cfg = NULL;
> >  
> >      icc_bridge = qdev_create(NULL, TYPE_ICC_BRIDGE);
> >      object_property_add_child(qdev_get_machine(), "icc-bridge",
> > @@ -97,8 +98,9 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
> >  
> >      /* allocate ram and load rom/bios */
> >      if (!xen_enabled()) {
> > -        pc_memory_init(kernel_filename, kernel_cmdline,
> > -                       initrd_filename, below_4g_mem_size, above_4g_mem_size);
> > +        fw_cfg = pc_memory_init(kernel_filename, kernel_cmdline,
> > +                                initrd_filename, below_4g_mem_size,
> > +                                above_4g_mem_size);
> >      }
> >  
> >      /* irq lines */
> > @@ -116,6 +118,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
> >      q35_host->mch.address_space_io = get_system_io();
> >      /* pci */
> >      qdev_init_nofail(DEVICE(q35_host));
> > +    bochs_meminfo_bios_init(fw_cfg);
> >      host_bus = q35_host->host.pci.bus;
> >      /* create ISA bus */
> >      lpc = pci_create_simple_multifunction(host_bus, PCI_DEVFN(ICH9_LPC_DEV,
> > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> > index 959b92b..4a29e6e 100644
> > --- a/include/hw/i386/pc.h
> > +++ b/include/hw/i386/pc.h
> > @@ -231,6 +231,7 @@ int pvpanic_init(ISABus *bus);
> >  #define E820_UNUSABLE   5
> >  
> >  int e820_add_entry(uint64_t, uint64_t, uint32_t);
> > +void bochs_meminfo_bios_init(void *fw_cfg);
> >  
> >  #define PC_COMPAT_1_5 \
> >          {\
> > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > index 2fb71af..2644faa 100644
> > --- a/include/sysemu/sysemu.h
> > +++ b/include/sysemu/sysemu.h
> > @@ -132,6 +132,7 @@ extern QEMUClock *rtc_clock;
> >  extern int nb_numa_nodes;
> >  extern uint64_t node_mem[MAX_NODES];
> >  extern unsigned long *node_cpumask[MAX_NODES];
> > +extern int nb_hp_dimms;
> >  
> >  #define MAX_OPTION_ROMS 16
> >  typedef struct QEMUOptionRom {
> > -- 
> > 1.8.3.1
> > 
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/14] pc: Add dimm paravirt SRAT info
  2013-07-11  5:13     ` Igor Mammedov
@ 2013-07-11  8:49       ` Michael S. Tsirkin
  2013-07-12  1:33         ` Hu Tao
  0 siblings, 1 reply; 56+ messages in thread
From: Michael S. Tsirkin @ 2013-07-11  8:49 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: Vasilis Liaskovitis, Hu Tao, qemu-devel

On Thu, Jul 11, 2013 at 07:13:39AM +0200, Igor Mammedov wrote:
> On Wed, 10 Jul 2013 13:10:03 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Wed, Jun 26, 2013 at 05:13:33PM +0800, Hu Tao wrote:
> > > The numa_fw_cfg paravirt interface is extended to include SRAT information for
> > > all hotplug-able dimms. There are 3 words for each hotplug-able memory slot,
> > > denoting start address, size and node proximity. The new info is appended after
> > > existing numa info, so that the fw_cfg layout does not break.  This information
> > > is used by Seabios to build hotplug memory device objects at runtime.
> > > nb_numa_nodes is set to 1 by default (not 0), so that we always pass srat info
> > > to SeaBIOS.
> > > 
> > > v3->v4: numa_fw_cfg needs to be initalized after memory controller sets up dimm
> > > ranges.  Make changes for pc_piix and pc_q35 to set numa_fw_cfg after i440fx
> > > initialization.
> > > 
> > > v2->v3: setting nb_numa_nodes to 1 is not needed
> > > 
> > > v1->v2:
> > > Dimm SRAT info (#dimms) is appended at end of existing numa fw_cfg in order not
> > > to break existing layout
> > > Documentation of the new fwcfg layout is included in docs/specs/fwcfg.txt
> > > 
> > > Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> > > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> > 
> > Please do not add any more fwcfg interfaces - generating
> > ACPI in qemu removes the need for it.
> > 
> > So please rebase on top of that work and generate the appropriate ACPI
> > tables directly.
> > 
> > You can find the latest code gnerating ACPI from qemu here:
> > git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git acpi
> 
> will it work with upstream Seabios or custom tree is required for it as well?

Yes.

git://git.kernel.org/pub/scm/virt/kvm/mst/seabios.git acpi


> > 
> > This code is work in progress, but once you base on
> > top of that, I can put it on that branch and keep updating if
> > interfaces change.
> > 
> > > ---
> > >  docs/specs/fwcfg.txt    | 28 ++++++++++++++++++++++++++++
> > >  hw/i386/pc.c            | 30 ++++++++++++++++++++++++------
> > >  hw/i386/pc_piix.c       |  1 +
> > >  hw/i386/pc_q35.c        |  7 +++++--
> > >  include/hw/i386/pc.h    |  1 +
> > >  include/sysemu/sysemu.h |  1 +
> > >  6 files changed, 60 insertions(+), 8 deletions(-)
> > >  create mode 100644 docs/specs/fwcfg.txt
> > > 
> > > diff --git a/docs/specs/fwcfg.txt b/docs/specs/fwcfg.txt
> > > new file mode 100644
> > > index 0000000..e6fcd8f
> > > --- /dev/null
> > > +++ b/docs/specs/fwcfg.txt
> > > @@ -0,0 +1,28 @@
> > > +QEMU<->BIOS Paravirt Documentation
> > > +--------------------------------------
> > > +
> > > +This document describes paravirt data structures passed from QEMU to BIOS.
> > > +
> > > +fw_cfg SRAT paravirt info
> > > +--------------------
> > > +The SRAT info passed from QEMU to BIOS has the following layout:
> > > +
> > > +-----------------------------------------------------------------------------------------------
> > > +#nodes | cpu0_pxm | cpu1_pxm | ... | cpulast_pxm | node0_mem | node1_mem | ... | nodelast_mem
> > > +
> > > +-----------------------------------------------------------------------------------------------
> > > +#dimms | dimm0_start | dimm0_sz | dimm0_pxm | ... | dimmlast_start | dimmlast_sz | dimmlast_pxm
> > > +
> > > +Entry 0 contains the number of numa nodes (nb_numa_nodes).
> > > +
> > > +Entries 1..max_cpus: The next max_cpus entries describe node proximity for each
> > > +one of the vCPUs in the system.
> > > +
> > > +Entries max_cpus+1..max_cpus+nb_numa_nodes+1:  The next nb_numa_nodes entries
> > > +describe the memory size for each one of the NUMA nodes in the system.
> > > +
> > > +Entry max_cpus+nb_numa_nodes+1 contains the number of memory dimms (nb_hp_dimms)
> > > +
> > > +The last 3 * nb_hp_dimms entries are organized in triplets: Each triplet contains
> > > +the physical address offset, size (in bytes), and node proximity for the
> > > +respective dimm.
> > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > index 65838a6..b51d3b5 100644
> > > --- a/hw/i386/pc.c
> > > +++ b/hw/i386/pc.c
> > > @@ -55,6 +55,7 @@
> > >  #include "hw/acpi/acpi.h"
> > >  #include "hw/cpu/icc_bus.h"
> > >  #include "hw/boards.h"
> > > +#include "hw/mem-hotplug/dimm.h"
> > >  
> > >  /* debug PC/ISA interrupts */
> > >  //#define DEBUG_IRQ
> > > @@ -606,8 +607,6 @@ static FWCfgState *bochs_bios_init(void)
> > >      FWCfgState *fw_cfg;
> > >      uint8_t *smbios_table;
> > >      size_t smbios_len;
> > > -    uint64_t *numa_fw_cfg;
> > > -    int i, j;
> > >      unsigned int apic_id_limit = pc_apic_id_limit(max_cpus);
> > >  
> > >      fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0);
> > > @@ -640,11 +639,25 @@ static FWCfgState *bochs_bios_init(void)
> > >                       &e820_table, sizeof(e820_table));
> > >  
> > >      fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, &hpet_cfg, sizeof(hpet_cfg));
> > > +
> > > +    return fw_cfg;
> > > +}
> > > +
> > > +void bochs_meminfo_bios_init(void *fw_cfg)
> > > +{
> > > +    uint64_t *numa_fw_cfg;
> > > +    uint64_t *hp_dimms_fw_cfg;
> > > +    int i, j;
> > > +    unsigned int apic_id_limit = pc_apic_id_limit(max_cpus);
> > > +
> > >      /* allocate memory for the NUMA channel: one (64bit) word for the number
> > >       * of nodes, one word for each VCPU->node and one word for each node to
> > >       * hold the amount of memory.
> > > +     * Finally one word for the number of hotplug memory slots and three words
> > > +     * for each hotplug memory slot (start address, size and node proximity).
> > >       */
> > > -    numa_fw_cfg = g_new0(uint64_t, 1 + apic_id_limit + nb_numa_nodes);
> > > +    numa_fw_cfg = g_new0(uint64_t,
> > > +                         2 + apic_id_limit + nb_numa_nodes  + 3 * nb_hp_dimms);
> > >      numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
> > >      for (i = 0; i < max_cpus; i++) {
> > >          unsigned int apic_id = x86_cpu_apic_id_from_index(i);
> > > @@ -659,11 +672,16 @@ static FWCfgState *bochs_bios_init(void)
> > >      for (i = 0; i < nb_numa_nodes; i++) {
> > >          numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]);
> > >      }
> > > +
> > > +    numa_fw_cfg[1 + apic_id_limit + nb_numa_nodes] = cpu_to_le64(nb_hp_dimms);
> > > +
> > > +    hp_dimms_fw_cfg = numa_fw_cfg + 2 + apic_id_limit + nb_numa_nodes;
> > > +    if (nb_hp_dimms) {
> > > +        dimm_setup_fwcfg_layout(hp_dimms_fw_cfg);
> > > +    }
> > >      fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
> > > -                     (1 + apic_id_limit + nb_numa_nodes) *
> > > +                     (2 + apic_id_limit + nb_numa_nodes + 3 * nb_hp_dimms) *
> > >                       sizeof(*numa_fw_cfg));
> > > -
> > > -    return fw_cfg;
> > >  }
> > >  
> > >  static long get_file_size(FILE *f)
> > > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> > > index fb056df..6e18343 100644
> > > --- a/hw/i386/pc_piix.c
> > > +++ b/hw/i386/pc_piix.c
> > > @@ -138,6 +138,7 @@ static void pc_init1(MemoryRegion *system_memory,
> > >      if (!xen_enabled()) {
> > >          fw_cfg = pc_memory_init(kernel_filename, kernel_cmdline, initrd_filename,
> > >                                  below_4g_mem_size, above_4g_mem_size);
> > > +        bochs_meminfo_bios_init(fw_cfg);
> > >      }
> > >  
> > >      if (kvm_irqchip_in_kernel()) {
> > > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> > > index 5fe14bb..2c14977 100644
> > > --- a/hw/i386/pc_q35.c
> > > +++ b/hw/i386/pc_q35.c
> > > @@ -74,6 +74,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
> > >      ICH9LPCState *ich9_lpc;
> > >      PCIDevice *ahci;
> > >      DeviceState *icc_bridge;
> > > +    void *fw_cfg = NULL;
> > >  
> > >      icc_bridge = qdev_create(NULL, TYPE_ICC_BRIDGE);
> > >      object_property_add_child(qdev_get_machine(), "icc-bridge",
> > > @@ -97,8 +98,9 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
> > >  
> > >      /* allocate ram and load rom/bios */
> > >      if (!xen_enabled()) {
> > > -        pc_memory_init(kernel_filename, kernel_cmdline,
> > > -                       initrd_filename, below_4g_mem_size, above_4g_mem_size);
> > > +        fw_cfg = pc_memory_init(kernel_filename, kernel_cmdline,
> > > +                                initrd_filename, below_4g_mem_size,
> > > +                                above_4g_mem_size);
> > >      }
> > >  
> > >      /* irq lines */
> > > @@ -116,6 +118,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
> > >      q35_host->mch.address_space_io = get_system_io();
> > >      /* pci */
> > >      qdev_init_nofail(DEVICE(q35_host));
> > > +    bochs_meminfo_bios_init(fw_cfg);
> > >      host_bus = q35_host->host.pci.bus;
> > >      /* create ISA bus */
> > >      lpc = pci_create_simple_multifunction(host_bus, PCI_DEVFN(ICH9_LPC_DEV,
> > > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> > > index 959b92b..4a29e6e 100644
> > > --- a/include/hw/i386/pc.h
> > > +++ b/include/hw/i386/pc.h
> > > @@ -231,6 +231,7 @@ int pvpanic_init(ISABus *bus);
> > >  #define E820_UNUSABLE   5
> > >  
> > >  int e820_add_entry(uint64_t, uint64_t, uint32_t);
> > > +void bochs_meminfo_bios_init(void *fw_cfg);
> > >  
> > >  #define PC_COMPAT_1_5 \
> > >          {\
> > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > > index 2fb71af..2644faa 100644
> > > --- a/include/sysemu/sysemu.h
> > > +++ b/include/sysemu/sysemu.h
> > > @@ -132,6 +132,7 @@ extern QEMUClock *rtc_clock;
> > >  extern int nb_numa_nodes;
> > >  extern uint64_t node_mem[MAX_NODES];
> > >  extern unsigned long *node_cpumask[MAX_NODES];
> > > +extern int nb_hp_dimms;
> > >  
> > >  #define MAX_OPTION_ROMS 16
> > >  typedef struct QEMUOptionRom {
> > > -- 
> > > 1.8.3.1
> > > 
> > 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 02/14] Add SIZE type to qdev properties
  2013-07-08  9:37   ` Andreas Färber
@ 2013-07-12  1:27     ` Hu Tao
  0 siblings, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-07-12  1:27 UTC (permalink / raw)
  To: Andreas Färber; +Cc: Vasilis Liaskovitis, Ian Molton, qemu-devel

Cced: Ian Molton <ian.molton@collabora.co.uk>

On Mon, Jul 08, 2013 at 11:37:01AM +0200, Andreas Färber wrote:
> Am 26.06.2013 11:13, schrieb Hu Tao:
> > From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> > 
> > This patch adds a 'SIZE' type property to qdev.
> > 
> > It will make dimm description more convenient by allowing sizes to be specified
> > with K,M,G,T prefixes instead of number of bytes e.g.:
> > -device dimm,id=mem0,size=2G,bus=membus.0
> > 
> > Credits go to Ian Molton for original patch. See:
> > http://patchwork.ozlabs.org/patch/38835/
> > 
> > Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> 
> If the original patch is not from Vasilis, why is this patch not
> carrying Ian's Signed-off-by from that patch before Vasilis'? (Then we
> could drop the Patchwork link.)


I'll update in the next version.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug
  2013-07-08  9:48 ` [Qemu-devel] [PATCH v5 00/14] " Andreas Färber
@ 2013-07-12  1:30   ` Hu Tao
  2013-07-14 16:56   ` Paolo Bonzini
  1 sibling, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-07-12  1:30 UTC (permalink / raw)
  To: Andreas Färber; +Cc: Vasilis Liaskovitis, Paolo Bonzini, qemu-devel

On Mon, Jul 08, 2013 at 11:48:13AM +0200, Andreas Färber wrote:
> Am 26.06.2013 11:13, schrieb Hu Tao:
> > It's been quite a while since v4 and lots of changes happend
> > in qemu and v4 just can't apply anymore. So this series is
> > basically a rebase. Another purpose is to bring up discussions
> > to make consensus on some questions since v4, see
> > http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg01219.html
> > and http://lists.gnu.org/archive/html/qemu-devel/2013-02/msg05332.html
> [...]
> > changes from v4:
> > 
> >   - rebased on a recent qemu-git
> >   - based on another series which seperates i440fx refactor from v4.
> >     http://lists.nongnu.org/archive/html/qemu-devel/2013-06/msg03500.html
> >   - hot-unplug patches not included, as suggested by Vasilis, since
> >     hot-unplug has some more complications with refcounting memory regions.
> 
> This should hopefully be addressed with yesterday's merge of Paolo's
> MemoryRegion refactoring.

Thanks for the tip. I'll consider adding hot-unplug patches in v6.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/14] pc: Add dimm paravirt SRAT info
  2013-07-11  8:49       ` Michael S. Tsirkin
@ 2013-07-12  1:33         ` Hu Tao
  2013-07-14  5:47           ` Michael S. Tsirkin
  0 siblings, 1 reply; 56+ messages in thread
From: Hu Tao @ 2013-07-12  1:33 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Vasilis Liaskovitis, Igor Mammedov, qemu-devel

On Thu, Jul 11, 2013 at 11:49:01AM +0300, Michael S. Tsirkin wrote:
> On Thu, Jul 11, 2013 at 07:13:39AM +0200, Igor Mammedov wrote:
> > On Wed, 10 Jul 2013 13:10:03 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > On Wed, Jun 26, 2013 at 05:13:33PM +0800, Hu Tao wrote:
> > > > The numa_fw_cfg paravirt interface is extended to include SRAT information for
> > > > all hotplug-able dimms. There are 3 words for each hotplug-able memory slot,
> > > > denoting start address, size and node proximity. The new info is appended after
> > > > existing numa info, so that the fw_cfg layout does not break.  This information
> > > > is used by Seabios to build hotplug memory device objects at runtime.
> > > > nb_numa_nodes is set to 1 by default (not 0), so that we always pass srat info
> > > > to SeaBIOS.
> > > > 
> > > > v3->v4: numa_fw_cfg needs to be initalized after memory controller sets up dimm
> > > > ranges.  Make changes for pc_piix and pc_q35 to set numa_fw_cfg after i440fx
> > > > initialization.
> > > > 
> > > > v2->v3: setting nb_numa_nodes to 1 is not needed
> > > > 
> > > > v1->v2:
> > > > Dimm SRAT info (#dimms) is appended at end of existing numa fw_cfg in order not
> > > > to break existing layout
> > > > Documentation of the new fwcfg layout is included in docs/specs/fwcfg.txt
> > > > 
> > > > Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> > > > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> > > 
> > > Please do not add any more fwcfg interfaces - generating
> > > ACPI in qemu removes the need for it.
> > > 
> > > So please rebase on top of that work and generate the appropriate ACPI
> > > tables directly.
> > > 
> > > You can find the latest code gnerating ACPI from qemu here:
> > > git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git acpi
> > 
> > will it work with upstream Seabios or custom tree is required for it as well?
> 
> Yes.
> 
> git://git.kernel.org/pub/scm/virt/kvm/mst/seabios.git acpi

This means I must rebase both qemu and seabios patches on top of your
acpi tree. I hope there won't be too much conflicts:)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-07-09 16:53         ` Igor Mammedov
@ 2013-07-12  2:39           ` Hu Tao
  2013-07-14 16:58             ` Paolo Bonzini
  0 siblings, 1 reply; 56+ messages in thread
From: Hu Tao @ 2013-07-12  2:39 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Eduardo Habkost, mst, qemu-devel, Vasilis Liaskovitis,
	Bandan Das, Paolo Bonzini, gaowanlong

On Tue, Jul 09, 2013 at 06:53:47PM +0200, Igor Mammedov wrote:
> On Thu, 27 Jun 2013 08:55:25 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> > Il 27/06/2013 07:08, Wanlong Gao ha scritto:
> > > Do we really need to specify the memory range? I suspect that we can
> > > follow current design of normal memory in hot-plug memory.
> > 
> > I think we can do both.  I'm afraid that the configuration of the VM
> > will not be perfectly reproducible without specifying the range, more so
> > if you allow hotplug.
> > 
> > > Currently,
> > > we just specify the size of normal memory in each node, and the range
> > > in normal memory is node by node. Then I think we can just specify
> > > the memory size of hot-plug in each node, then the hot-plug memory
> > > range is also node by node, and the whole hot-plug memory block is
> > > just located after the normal memory block. If so, the option can
> > > come like:
> > >     -numa node,nodeid=0,mem=2G,cpus=0-1,mem-hotplug=2G,mem-policy=membind,mem-hostnode=0-1,mem-hotplug-policy=interleave,mem-hotplug-hostnode=1
> > >     -numa node,nodeid=1,mem=2G,cpus=2-3,mem-hotplug=2G,mem-policy=preferred,mem-hostnode=1,mem-hotplug-policy=membind,mem-hotplug-hostnode=0-1
> > 
> > I think specifying different policies and bindings for normal and
> > hotplug memory is too much fine-grained.  If you really want that, then
> > you would need something like
> > 
> >     -numa node,nodeid=0,cpus=0-1 \
> >     -numa mem,nodeid=0,size=2G,policy=membind,hostnode=0-1 \
> >     -numa mem,nodeid=0,size=2G,policy=interleave,hostnode=1,populated=no
> > 
> > Hmm... this actually doesn't look too bad, and it is much more
> > future-proof.  Eduardo, what do you think about it?  Should Wanlong redo
> > his patches to support this "-numa mem" syntax?  Parsing it should be
> > easy using the QemuOpts visitor, too.
> 
> Is hot-plugged memory and its bindings to numa nodes have to be specified
> at startup?
> 
> How about extending -m option to support following syntax:
> 
>   -m initial_mem[, number_of_hotplug_dimms, max_mem]]
> 
> and dynamically forming MEMXX._CRS/_PXM resources on demand instead
> of creating static resources in SSDT.
> 
> without necessity to specify to be hot-plugged DIMMs at startup, hot-plug
> could become more flexible since arbitrarily sized DIMMs with required NUMA
> mapping could be specified during hot-plug time, for example:
> 
>   device_add dimm,id=dimm1,bus=membus.0,size=3G,node=1,...

Seems impossible as ACPI memory devices' ranges must be specified at startup,
but I'll be glad if I'm wrong.

> 
> the only limit would be left is a number of increments(DIMM slots), due to
> need to pre-generate ACPI memory devices that could supply _CRS/_PXM
> resources later.
> 
> 
> > Paolo
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 5/7] acpi: generate hotplug memory devices
  2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 5/7] acpi: generate hotplug memory devices Hu Tao
@ 2013-07-12 10:07     ` Igor Mammedov
  0 siblings, 0 replies; 56+ messages in thread
From: Igor Mammedov @ 2013-07-12 10:07 UTC (permalink / raw)
  To: Hu Tao; +Cc: Vasilis Liaskovitis, seabios, qemu-devel

On Wed, 26 Jun 2013 17:15:03 +0800
Hu Tao <hutao@cn.fujitsu.com> wrote:

> The memory device generation is guided by qemu paravirt info. Seabios
> uses the info to setup SRAT entries for the hotplug-able memory slots,
> and to generate appropriate memory device objects. One memory device
> (and corresponding SRAT entry) is generated for each hotplug-able qemu
> memslot. Currently no SSDT memory device is created for initial system
> memory.
> 
> We only support up to 255 DIMMs for now (PackageOp used for the MEON
> array can only describe an array of at most 255 elements. VarPackageOp
> would be needed to support more than 255 devices)
> 
> Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> ---
>  src/acpi.c     | 151 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>  src/paravirt.c |   8 +++
>  2 files changed, 152 insertions(+), 7 deletions(-)
> 
> diff --git a/src/acpi.c b/src/acpi.c
> index ce988e0..e9a0326 100644
> --- a/src/acpi.c
> +++ b/src/acpi.c
> @@ -15,6 +15,8 @@
>  #include "config.h" // CONFIG_*
>  #include "paravirt.h" // RamSize
>  #include "dev-q35.h"
> +#include "memmap.h"
> +#include "paravirt.h"
>  
>  #include "acpi-dsdt.hex"
>  
> @@ -250,11 +252,23 @@ encodeLen(u8 *ssdt_ptr, int length, int bytes)
>  #define PCIHP_AML (ssdp_pcihp_aml + *ssdt_pcihp_start)
>  #define PCI_SLOTS 32
>  
> +/* 0x5B 0x82 DeviceOp PkgLength NameString DimmID */
> +#define MEM_BASE 0xaf80
> +#define MEM_AML (ssdm_mem_aml + *ssdt_mem_start)
> +#define MEM_SIZEOF (*ssdt_mem_end - *ssdt_mem_start)
> +#define MEM_OFFSET_HEX (*ssdt_mem_name - *ssdt_mem_start + 2)
> +#define MEM_OFFSET_ID (*ssdt_mem_id - *ssdt_mem_start)
> +#define MEM_OFFSET_PXM 31
> +#define MEM_OFFSET_START 55
> +#define MEM_OFFSET_END   63
> +#define MEM_OFFSET_SIZE  79
> +
>  #define SSDT_SIGNATURE 0x54445353 // SSDT
>  #define SSDT_HEADER_LENGTH 36
>  
>  #include "ssdt-misc.hex"
>  #include "ssdt-pcihp.hex"
> +#include "ssdt-mem.hex"
>  
>  #define PCI_RMV_BASE 0xae0c
>  
> @@ -306,9 +320,100 @@ static void patch_pcihp(int slot, u8 *ssdt_ptr, u32 eject)
>      }
>  }
>  
> +static void build_memdev(u8 *ssdt_ptr, int i, u64 mem_base, u64 mem_len, u8 node)
> +{
> +    memcpy(ssdt_ptr, MEM_AML, MEM_SIZEOF);
> +    ssdt_ptr[MEM_OFFSET_HEX] = getHex(i >> 4);
> +    ssdt_ptr[MEM_OFFSET_HEX+1] = getHex(i);
> +    ssdt_ptr[MEM_OFFSET_ID] = i;
> +    ssdt_ptr[MEM_OFFSET_PXM] = node;
> +    *(u64*)(ssdt_ptr + MEM_OFFSET_START) = cpu_to_le64(mem_base);
> +    *(u64*)(ssdt_ptr + MEM_OFFSET_END) = cpu_to_le64(mem_base + mem_len);
> +    *(u64*)(ssdt_ptr + MEM_OFFSET_SIZE) = cpu_to_le64(mem_len);
> +}
> +
> +static u8 *build_memssdt(u8 *ssdt_ptr, int memssdt_len,
> +                         u64 *numadimmsmap, int nb_memdevs)
> +{
> +    u64 mem_base, mem_len;
> +    u64 *dimm = numadimmsmap;
> +    int node;
> +    int i;
> +
> +    // build Scope(_SB_) header
> +    *(ssdt_ptr++) = 0x10; // ScopeOp
> +    ssdt_ptr = encodeLen(ssdt_ptr, memssdt_len, 3);
> +    *(ssdt_ptr++) = '_';
> +    *(ssdt_ptr++) = 'S';
> +    *(ssdt_ptr++) = 'B';
> +    *(ssdt_ptr++) = '_';
Windows doesn't like much 2 \_SB in one SSDT table, and BSODs.

Just drop it and add stuff to already existing scope or alternatively
create a second SSDT table.

> +    for (i = 0; i < nb_memdevs; i++) {
> +        mem_base = *dimm++;
> +        mem_len = *dimm++;
> +        node = *dimm++;
> +        build_memdev(ssdt_ptr, i, mem_base, mem_len, node);
> +        ssdt_ptr += MEM_SIZEOF;
> +    }
> +
> +    // build "Method(MTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CM00, Arg1)} ...}"
> +    *(ssdt_ptr++) = 0x14; // MethodOp
> +    ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*nb_memdevs), 2);
> +    *(ssdt_ptr++) = 'M';
> +    *(ssdt_ptr++) = 'T';
> +    *(ssdt_ptr++) = 'F';
> +    *(ssdt_ptr++) = 'Y';
> +    *(ssdt_ptr++) = 0x02;
> +    for (i=0; i<nb_memdevs; i++) {
> +        *(ssdt_ptr++) = 0xA0; // IfOp
> +        ssdt_ptr = encodeLen(ssdt_ptr, 11, 1);
> +        *(ssdt_ptr++) = 0x93; // LEqualOp
> +        *(ssdt_ptr++) = 0x68; // Arg0Op
> +        *(ssdt_ptr++) = 0x0A; // BytePrefix
> +        *(ssdt_ptr++) = i;
> +        *(ssdt_ptr++) = 0x86; // NotifyOp
> +        *(ssdt_ptr++) = 'M';
> +        *(ssdt_ptr++) = 'P';
> +        *(ssdt_ptr++) = getHex(i >> 4);
> +        *(ssdt_ptr++) = getHex(i);
> +        *(ssdt_ptr++) = 0x69; // Arg1Op
> +    }
> +
> +    // build "Name(MEON, Package() { One, One, ..., Zero, Zero, ... })"
> +    *(ssdt_ptr++) = 0x08; // NameOp
> +    *(ssdt_ptr++) = 'M';
> +    *(ssdt_ptr++) = 'E';
> +    *(ssdt_ptr++) = 'O';
> +    *(ssdt_ptr++) = 'N';
> +    *(ssdt_ptr++) = 0x12; // PackageOp
> +    ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*nb_memdevs), 2);
> +    *(ssdt_ptr++) = nb_memdevs;
> +
> +    dimm = numadimmsmap;
> +    u8 memslot_status = 0, enabled;
> +
> +    for (i = 0; i < nb_memdevs; i++) {
> +        enabled = 0;
> +        if (i % 8 == 0)
> +            memslot_status = inb(MEM_BASE + i/8);
> +        enabled = memslot_status & 1;
> +        mem_base = *dimm++;
> +        mem_len = *dimm++;
> +        dimm++;  // node
> +        *(ssdt_ptr++) = enabled ? 0x01 : 0x00;
> +        if (enabled)
> +            add_e820(mem_base, mem_len, E820_RAM);
> +        memslot_status = memslot_status >> 1;
> +    }
> +
> +    return ssdt_ptr;
> +}
> +
>  static void*
>  build_ssdt(void)
>  {
> +    int nb_memdevs;
> +    u64 *numadimmsmap;
>      int acpi_cpus = MaxCountCPUs > 0xff ? 0xff : MaxCountCPUs;
>      int length = (sizeof(ssdp_misc_aml)                     // _S3_ / _S4_ / _S5_
>                    + (1+3+4)                                 // Scope(_SB_)
> @@ -318,9 +423,20 @@ build_ssdt(void)
>                    + (1+3+4)                                 // Scope(PCI0)
>                    + ((PCI_SLOTS - 1) * PCIHP_SIZEOF)        // slots
>                    + (1+2+5+(12*(PCI_SLOTS - 1))));          // PCNT
> -    u8 *ssdt = malloc_high(length);
> +
> +    numadimmsmap = romfile_loadfile("etc/numa-dimm-map", &nb_memdevs);
> +    nb_memdevs /= 3 * sizeof(u64);
> +
> +    // for build_memssdt
> +    int memssdt_length = (1+3+4)
> +                         + (nb_memdevs * MEM_SIZEOF)
> +                         + (1+2+5+(12*nb_memdevs))
> +                         + (6+2+1+(1*nb_memdevs));
> +
> +    u8 *ssdt = malloc_high(length + memssdt_length);
>      if (! ssdt) {
>          warn_noalloc();
> +        free(numadimmsmap);
>          return NULL;
>      }
>      u8 *ssdt_ptr = ssdt;
> @@ -411,10 +527,13 @@ build_ssdt(void)
>  
>      ssdt_ptr = build_notify(ssdt_ptr, "PCNT", 1, PCI_SLOTS, "S00_", 1);
>  
> +    ssdt_ptr = build_memssdt(ssdt_ptr, memssdt_length, numadimmsmap, nb_memdevs);
                        ^^^^^^^ name is misleading, since the result of call is not SSDT

> +
>      build_header((void*)ssdt, SSDT_SIGNATURE, ssdt_ptr - ssdt, 1);
>  
>      //hexdump(ssdt, ssdt_ptr - ssdt);
>  
> +    free(numadimmsmap);
>      return ssdt;
>  }
>  
> @@ -458,7 +577,7 @@ acpi_build_srat_memory(struct srat_memory_affinity *numamem,
>      numamem->length = sizeof(*numamem);
>      memset(numamem->proximity, 0, 4);
>      numamem->proximity[0] = node;
> -    numamem->flags = cpu_to_le32(!!enabled);
> +    numamem->flags = cpu_to_le32(!!enabled) | cpu_to_le32(0x2);
>      numamem->base_addr = cpu_to_le64(base);
>      numamem->range_length = cpu_to_le64(len);
>  }
> @@ -466,18 +585,22 @@ acpi_build_srat_memory(struct srat_memory_affinity *numamem,
>  static void *
>  build_srat(void)
>  {
> -    int numadatasize, numacpusize;
> +    int numadatasize, numacpusize, nb_numa_dimms;
> +    u64 *numadimmsmap;
>      u64 *numadata = romfile_loadfile("etc/numa-nodes", &numadatasize);
>      u64 *numacpumap = romfile_loadfile("etc/numa-cpu-map", &numacpusize);
> -    if (!numadata || !numacpumap)
> -        goto fail;
> +
>      int max_cpu = numacpusize / sizeof(u64);
>      int nb_numa_nodes = numadatasize / sizeof(u64);
>  
> +    numadimmsmap = romfile_loadfile("etc/numa-dimm-map", &nb_numa_dimms);
> +
> +    nb_numa_dimms /=  3 * sizeof(u64);
> +
>      struct system_resource_affinity_table *srat;
>      int srat_size = sizeof(*srat) +
>          sizeof(struct srat_processor_affinity) * max_cpu +
> -        sizeof(struct srat_memory_affinity) * (nb_numa_nodes + 2);
> +        sizeof(struct srat_memory_affinity) * (nb_numa_nodes + nb_numa_dimms + 2);
>  
>      srat = malloc_high(srat_size);
>      if (!srat) {
> @@ -512,6 +635,7 @@ build_srat(void)
>       */
>      struct srat_memory_affinity *numamem = (void*)core;
>      int slots = 0;
> +    int node;
>      u64 mem_len, mem_base, next_base = 0;
>  
>      acpi_build_srat_memory(numamem, 0, 640*1024, 0, 1);
> @@ -541,7 +665,18 @@ build_srat(void)
>          numamem++;
>          slots++;
>      }
> -    for (; slots < nb_numa_nodes + 2; slots++) {
> +    if (nb_numa_dimms) {
> +        for (i = 1; i < nb_numa_dimms + 1; ++i) {
> +            mem_base = *numadimmsmap++;
> +            mem_len = *numadimmsmap++;
> +            node = *numadimmsmap++;
> +            acpi_build_srat_memory(numamem, mem_base, mem_len, node, 1);
> +            numamem++;
> +            slots++;
> +        }
> +    }
> +
> +    for (; slots < nb_numa_nodes + nb_numa_dimms + 2; slots++) {
>          acpi_build_srat_memory(numamem, 0, 0, 0, 0);
>          numamem++;
>      }
> @@ -550,10 +685,12 @@ build_srat(void)
>  
>      free(numadata);
>      free(numacpumap);
> +    free(numadimmsmap);
>      return srat;
>  fail:
>      free(numadata);
>      free(numacpumap);
> +    free(numadimmsmap);
>      return NULL;
>  }
>  
> diff --git a/src/paravirt.c b/src/paravirt.c
> index d1a5d3e..5925c63 100644
> --- a/src/paravirt.c
> +++ b/src/paravirt.c
> @@ -240,6 +240,14 @@ qemu_cfg_legacy(void)
>                       , sizeof(numacount) + max_cpu*sizeof(u64)
>                       , numacount*sizeof(u64));
>  
> +    u64 dimm_count;
> +    qemu_cfg_select(QEMU_CFG_NUMA);
> +    qemu_cfg_skip((1 + max_cpu + numacount) * sizeof(u64));
> +    qemu_cfg_read(&dimm_count, sizeof(dimm_count));
> +    qemu_romfile_add("etc/numa-dimm-map", QEMU_CFG_NUMA
> +                     , (2 + max_cpu + numacount) * sizeof(u64),
> +                     dimm_count * 3 * sizeof(u64));
> +
>      // e820 data
>      u32 count32;
>      qemu_cfg_read_entry(&count32, QEMU_CFG_E820_TABLE, sizeof(count32));

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/14] pc: Add dimm paravirt SRAT info
  2013-07-12  1:33         ` Hu Tao
@ 2013-07-14  5:47           ` Michael S. Tsirkin
  0 siblings, 0 replies; 56+ messages in thread
From: Michael S. Tsirkin @ 2013-07-14  5:47 UTC (permalink / raw)
  To: Hu Tao; +Cc: Vasilis Liaskovitis, Igor Mammedov, qemu-devel

On Fri, Jul 12, 2013 at 09:33:28AM +0800, Hu Tao wrote:
> On Thu, Jul 11, 2013 at 11:49:01AM +0300, Michael S. Tsirkin wrote:
> > On Thu, Jul 11, 2013 at 07:13:39AM +0200, Igor Mammedov wrote:
> > > On Wed, 10 Jul 2013 13:10:03 +0300
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > 
> > > > On Wed, Jun 26, 2013 at 05:13:33PM +0800, Hu Tao wrote:
> > > > > The numa_fw_cfg paravirt interface is extended to include SRAT information for
> > > > > all hotplug-able dimms. There are 3 words for each hotplug-able memory slot,
> > > > > denoting start address, size and node proximity. The new info is appended after
> > > > > existing numa info, so that the fw_cfg layout does not break.  This information
> > > > > is used by Seabios to build hotplug memory device objects at runtime.
> > > > > nb_numa_nodes is set to 1 by default (not 0), so that we always pass srat info
> > > > > to SeaBIOS.
> > > > > 
> > > > > v3->v4: numa_fw_cfg needs to be initalized after memory controller sets up dimm
> > > > > ranges.  Make changes for pc_piix and pc_q35 to set numa_fw_cfg after i440fx
> > > > > initialization.
> > > > > 
> > > > > v2->v3: setting nb_numa_nodes to 1 is not needed
> > > > > 
> > > > > v1->v2:
> > > > > Dimm SRAT info (#dimms) is appended at end of existing numa fw_cfg in order not
> > > > > to break existing layout
> > > > > Documentation of the new fwcfg layout is included in docs/specs/fwcfg.txt
> > > > > 
> > > > > Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> > > > > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> > > > 
> > > > Please do not add any more fwcfg interfaces - generating
> > > > ACPI in qemu removes the need for it.
> > > > 
> > > > So please rebase on top of that work and generate the appropriate ACPI
> > > > tables directly.
> > > > 
> > > > You can find the latest code gnerating ACPI from qemu here:
> > > > git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git acpi
> > > 
> > > will it work with upstream Seabios or custom tree is required for it as well?
> > 
> > Yes.
> > 
> > git://git.kernel.org/pub/scm/virt/kvm/mst/seabios.git acpi
> 
> This means I must rebase both qemu and seabios patches on top of your
> acpi tree. I hope there won't be too much conflicts:)

Shouldn't be hard, if there's a problem pls tell me and I'll try to
help.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug
  2013-07-08  9:48 ` [Qemu-devel] [PATCH v5 00/14] " Andreas Färber
  2013-07-12  1:30   ` Hu Tao
@ 2013-07-14 16:56   ` Paolo Bonzini
  1 sibling, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2013-07-14 16:56 UTC (permalink / raw)
  To: Andreas Färber; +Cc: Vasilis Liaskovitis, Hu Tao, qemu-devel

Il 08/07/2013 11:48, Andreas Färber ha scritto:
>> >   - hot-unplug patches not included, as suggested by Vasilis, since
>> >     hot-unplug has some more complications with refcounting memory regions.
> This should hopefully be addressed with yesterday's merge of Paolo's
> MemoryRegion refactoring.

Not entirely because address_space_translate does not yet ref/unref the
region, but that's easy enough to add (it is in my rcu branch on
github).  But the ownership infrastructure is indeed in master now.

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-07-12  2:39           ` Hu Tao
@ 2013-07-14 16:58             ` Paolo Bonzini
  2013-07-16  1:26               ` Hu Tao
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2013-07-14 16:58 UTC (permalink / raw)
  To: Hu Tao
  Cc: Eduardo Habkost, mst, qemu-devel, Vasilis Liaskovitis,
	Bandan Das, Igor Mammedov, gaowanlong

Il 12/07/2013 04:39, Hu Tao ha scritto:
> > without necessity to specify to be hot-plugged DIMMs at startup, hot-plug
> > could become more flexible since arbitrarily sized DIMMs with required NUMA
> > mapping could be specified during hot-plug time, for example:
> > 
> >   device_add dimm,id=dimm1,bus=membus.0,size=3G,node=1,...
> 
> Seems impossible as ACPI memory devices' ranges must be specified at startup,
> but I'll be glad if I'm wrong.

Yeah, I also understood the problem to be the SRAT, not the DSDT...  Of
course you could reserve a single large part of the address range in
"-numa mem", and then add one or more DIMMs covering only part of the
address range.

Perhaps "populated" could be a size instead of a boolean, to do
something like what Igor suggested.

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-06-27  6:55       ` Paolo Bonzini
  2013-07-09 16:53         ` Igor Mammedov
@ 2013-07-15 17:05         ` Vasilis Liaskovitis
  2013-07-15 17:10           ` Paolo Bonzini
  1 sibling, 1 reply; 56+ messages in thread
From: Vasilis Liaskovitis @ 2013-07-15 17:05 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Hu Tao, Bandan Das, Eduardo Habkost, gaowanlong, qemu-devel

Hi,

On Thu, Jun 27, 2013 at 08:55:25AM +0200, Paolo Bonzini wrote:
> Il 27/06/2013 07:08, Wanlong Gao ha scritto:
> > Do we really need to specify the memory range? I suspect that we can
> > follow current design of normal memory in hot-plug memory.
> 
> I think we can do both.  I'm afraid that the configuration of the VM
> will not be perfectly reproducible without specifying the range, more so
> if you allow hotplug.
> 
> > Currently,
> > we just specify the size of normal memory in each node, and the range
> > in normal memory is node by node. Then I think we can just specify
> > the memory size of hot-plug in each node, then the hot-plug memory
> > range is also node by node, and the whole hot-plug memory block is
> > just located after the normal memory block. If so, the option can
> > come like:
> >     -numa node,nodeid=0,mem=2G,cpus=0-1,mem-hotplug=2G,mem-policy=membind,mem-hostnode=0-1,mem-hotplug-policy=interleave,mem-hotplug-hostnode=1
> >     -numa node,nodeid=1,mem=2G,cpus=2-3,mem-hotplug=2G,mem-policy=preferred,mem-hostnode=1,mem-hotplug-policy=membind,mem-hotplug-hostnode=0-1
> 
> I think specifying different policies and bindings for normal and
> hotplug memory is too much fine-grained.  If you really want that, then
> you would need something like
> 
>     -numa node,nodeid=0,cpus=0-1 \
>     -numa mem,nodeid=0,size=2G,policy=membind,hostnode=0-1 \
>     -numa mem,nodeid=0,size=2G,policy=interleave,hostnode=1,populated=no
> 
> Hmm... this actually doesn't look too bad, and it is much more
> future-proof.  Eduardo, what do you think about it?  Should Wanlong redo
> his patches to support this "-numa mem" syntax?  Parsing it should be
> easy using the QemuOpts visitor, too.

from what i understand, we are currently favoring this numa option? (I saw it
mentioned in Gao's numa patchset series as well)

There is still the question of "how many hotpluggable dimm devices does this
memory range describe?". With the dimm device that was clearly defined, but not
so with this option. Do we choose a default granularity e.g. 1 GB?

Also, as you mentioned, without specifying the memory range, the VM
configuration may be ambiguous. Currently, the VM memory map depends on the
order of dimms defined on the command line. So:

"-device dimm,id=dimm0,size=1G,node=0 -device dimm,id=dimm1,size=2G,node=0"
and
"-device dimm,id=dimm1,size=2G,node=0 -device dimm,id=dimm1,size=1G,node=0"

assign different memory ranges to the dimms.

On the other hand, iirc memory ranges were discussed with previous maintainers
but was rejected: The user/management library may not want to know or simply
does not know architectural details of the guest hardware. What happens if
the user specifies memory on the PCI-hole? Do we bail out or adjust their
arguments? Adjusting ranges might open another can of worms.

In any case, it would be good to get a final consensus on this.

thanks,

- Vasilis

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-07-15 17:05         ` Vasilis Liaskovitis
@ 2013-07-15 17:10           ` Paolo Bonzini
  2013-07-15 17:20             ` Vasilis Liaskovitis
  2013-07-16  1:27             ` Hu Tao
  0 siblings, 2 replies; 56+ messages in thread
From: Paolo Bonzini @ 2013-07-15 17:10 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: Hu Tao, Bandan Das, Eduardo Habkost, gaowanlong, qemu-devel

Il 15/07/2013 19:05, Vasilis Liaskovitis ha scritto:
> from what i understand, we are currently favoring this numa option? (I saw it
> mentioned in Gao's numa patchset series as well)

The two patchsets have some overlap, so it's good to find a design that
fits both.

> There is still the question of "how many hotpluggable dimm devices does this
> memory range describe?". With the dimm device that was clearly defined, but not
> so with this option. Do we choose a default granularity e.g. 1 GB?

I think it's the same.  One "-numa mem" option = one "-device dimm"
option; both define one range.  Unused memory ranges may remain if you
stumble upon a unusable range such as the PCI window.  For example two
"-numa mem,size=2G" options would allocate memory from 0 to 2G and from
4 to 6G.

I'm snipping the rest of the email because I hope this covers all your
doubts.

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-07-15 17:10           ` Paolo Bonzini
@ 2013-07-15 17:20             ` Vasilis Liaskovitis
  2013-07-16  1:27             ` Hu Tao
  1 sibling, 0 replies; 56+ messages in thread
From: Vasilis Liaskovitis @ 2013-07-15 17:20 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Hu Tao, Bandan Das, Eduardo Habkost, gaowanlong, qemu-devel

On Mon, Jul 15, 2013 at 07:10:30PM +0200, Paolo Bonzini wrote:
> Il 15/07/2013 19:05, Vasilis Liaskovitis ha scritto:
> > from what i understand, we are currently favoring this numa option? (I saw it
> > mentioned in Gao's numa patchset series as well)
> 
> The two patchsets have some overlap, so it's good to find a design that
> fits both.
> 
> > There is still the question of "how many hotpluggable dimm devices does this
> > memory range describe?". With the dimm device that was clearly defined, but not
> > so with this option. Do we choose a default granularity e.g. 1 GB?
> 
> I think it's the same.  One "-numa mem" option = one "-device dimm"
> option; both define one range.  Unused memory ranges may remain if you

ah ok, I get the proposal now.

> stumble upon a unusable range such as the PCI window.  For example two
> "-numa mem,size=2G" options would allocate memory from 0 to 2G and from
> 4 to 6G.

ok, that's done already.

thanks,

- Vasilis

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 7/7] pci: Use paravirt interface for pcimem_start and pcimem64_start
  2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 7/7] pci: Use paravirt interface for pcimem_start and pcimem64_start Hu Tao
@ 2013-07-15 20:11     ` Vasilis Liaskovitis
  0 siblings, 0 replies; 56+ messages in thread
From: Vasilis Liaskovitis @ 2013-07-15 20:11 UTC (permalink / raw)
  To: Hu Tao; +Cc: seabios, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3158 bytes --]

Hi,

2013/6/26 Hu Tao <hutao@cn.fujitsu.com>

> From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
>
> Initialize the 32-bit and 64-bit pci starting offsets from values passed
> in by
> the qemu paravirt interface QEMU_CFG_PCI_WINDOW. Qemu calculates the
> starting
> offsets based on initial memory and hotplug-able dimms.
>

We should drop this patch and the corresponding seabios patch since Michael
Tsirkin's pci-window patches are merged or will be soon.
See
"pc: pass PCI hole ranges to Guests" - already in qemu master
"pci: load memory window setup from host" (should go into seabios)

thanks,

- Vasilis



> It's possible to avoid the new paravirt interface, and calculate pci
> ranges from
> srat entries. But the code changes are ugly, see:
> http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg03548.html
>
> Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> ---
>  src/paravirt.c | 7 +++++++
>  src/paravirt.h | 1 +
>  src/pciinit.c  | 9 +++++++++
>  3 files changed, 17 insertions(+)
>
> diff --git a/src/paravirt.c b/src/paravirt.c
> index 5925c63..9c1e511 100644
> --- a/src/paravirt.c
> +++ b/src/paravirt.c
> @@ -134,6 +134,7 @@ qemu_platform_setup(void)
>  #define QEMU_CFG_BOOT_MENU              0x0e
>  #define QEMU_CFG_MAX_CPUS               0x0f
>  #define QEMU_CFG_FILE_DIR               0x19
> +#define QEMU_CFG_PCI_WINDOW             0x1a
>  #define QEMU_CFG_ARCH_LOCAL             0x8000
>  #define QEMU_CFG_ACPI_TABLES            (QEMU_CFG_ARCH_LOCAL + 0)
>  #define QEMU_CFG_SMBIOS_ENTRIES         (QEMU_CFG_ARCH_LOCAL + 1)
> @@ -339,3 +340,9 @@ void qemu_cfg_init(void)
>                           , 0, be32_to_cpu(qfile.size));
>      }
>  }
> +
> +void qemu_cfg_get_pci_offsets(u64 *pcimem_start, u64 *pcimem64_start)
> +{
> +    qemu_cfg_read_entry(pcimem_start, QEMU_CFG_PCI_WINDOW, sizeof(u64));
> +    qemu_cfg_read((u8*)(pcimem64_start), sizeof(u64));
> +}
> diff --git a/src/paravirt.h b/src/paravirt.h
> index fce5af9..2c37d0d 100644
> --- a/src/paravirt.h
> +++ b/src/paravirt.h
> @@ -27,5 +27,6 @@ static inline int runningOnKVM(void) {
>  void qemu_preinit(void);
>  void qemu_platform_setup(void);
>  void qemu_cfg_init(void);
> +void qemu_cfg_get_pci_offsets(u64 *pcimem_start, u64 *pcimem64_start);
>
>  #endif
> diff --git a/src/pciinit.c b/src/pciinit.c
> index 8370b96..7e63c5e 100644
> --- a/src/pciinit.c
> +++ b/src/pciinit.c
> @@ -805,6 +805,7 @@ static void pci_bios_map_devices(struct pci_bus
> *busses)
>  void
>  pci_setup(void)
>  {
> +    u64 pv_pcimem_start, pv_pcimem64_start;
>      if (!CONFIG_QEMU)
>          return;
>
> @@ -837,6 +838,14 @@ pci_setup(void)
>
>      pci_bios_init_devices();
>
> +    /* if qemu gives us other pci window values, it means there are
> hotplug-able
> +     * dimms. Adjust accordingly */
> +    qemu_cfg_get_pci_offsets(&pv_pcimem_start, &pv_pcimem64_start);
> +    if (pv_pcimem_start > pcimem_start)
> +        pcimem_start = pv_pcimem_start;
> +    if (pv_pcimem64_start > pcimem64_start)
> +        pcimem64_start = pv_pcimem64_start;
> +
>      free(busses);
>
>      pci_enable_default_vga();
> --
> 1.8.3.1
>
>

[-- Attachment #2: Type: text/html, Size: 4340 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-07-14 16:58             ` Paolo Bonzini
@ 2013-07-16  1:26               ` Hu Tao
  0 siblings, 0 replies; 56+ messages in thread
From: Hu Tao @ 2013-07-16  1:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Eduardo Habkost, mst, qemu-devel, Vasilis Liaskovitis,
	Bandan Das, Igor Mammedov, gaowanlong

On Sun, Jul 14, 2013 at 06:58:23PM +0200, Paolo Bonzini wrote:
> Il 12/07/2013 04:39, Hu Tao ha scritto:
> > > without necessity to specify to be hot-plugged DIMMs at startup, hot-plug
> > > could become more flexible since arbitrarily sized DIMMs with required NUMA
> > > mapping could be specified during hot-plug time, for example:
> > > 
> > >   device_add dimm,id=dimm1,bus=membus.0,size=3G,node=1,...
> > 
> > Seems impossible as ACPI memory devices' ranges must be specified at startup,
> > but I'll be glad if I'm wrong.
> 
> Yeah, I also understood the problem to be the SRAT, not the DSDT...  Of
> course you could reserve a single large part of the address range in
> "-numa mem", and then add one or more DIMMs covering only part of the
> address range.

still needs to specify -dimm at startup, right?

> 
> Perhaps "populated" could be a size instead of a boolean, to do
> something like what Igor suggested.

seems odd that a memory device is partially populated. How can guest OS
and firmware cope with that?

> 
> Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-07-15 17:10           ` Paolo Bonzini
  2013-07-15 17:20             ` Vasilis Liaskovitis
@ 2013-07-16  1:27             ` Hu Tao
  2013-07-16  6:19               ` Paolo Bonzini
  1 sibling, 1 reply; 56+ messages in thread
From: Hu Tao @ 2013-07-16  1:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vasilis Liaskovitis, Bandan Das, Eduardo Habkost, gaowanlong, qemu-devel

On Mon, Jul 15, 2013 at 07:10:30PM +0200, Paolo Bonzini wrote:
> Il 15/07/2013 19:05, Vasilis Liaskovitis ha scritto:
> > from what i understand, we are currently favoring this numa option? (I saw it
> > mentioned in Gao's numa patchset series as well)
> 
> The two patchsets have some overlap, so it's good to find a design that
> fits both.
> 
> > There is still the question of "how many hotpluggable dimm devices does this
> > memory range describe?". With the dimm device that was clearly defined, but not
> > so with this option. Do we choose a default granularity e.g. 1 GB?
> 
> I think it's the same.  One "-numa mem" option = one "-device dimm"
> option; both define one range.  Unused memory ranges may remain if you
> stumble upon a unusable range such as the PCI window.  For example two
> "-numa mem,size=2G" options would allocate memory from 0 to 2G and from
> 4 to 6G.

So we can drop -dimm if we agree on -numa mem?

> 
> I'm snipping the rest of the email because I hope this covers all your
> doubts.
> 
> Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-07-16  1:27             ` Hu Tao
@ 2013-07-16  6:19               ` Paolo Bonzini
  2013-07-16  7:27                 ` Hu Tao
  2013-07-16 10:19                 ` Igor Mammedov
  0 siblings, 2 replies; 56+ messages in thread
From: Paolo Bonzini @ 2013-07-16  6:19 UTC (permalink / raw)
  To: Hu Tao
  Cc: Vasilis Liaskovitis, Bandan Das, Eduardo Habkost, gaowanlong, qemu-devel

Il 16/07/2013 03:27, Hu Tao ha scritto:
> > I think it's the same.  One "-numa mem" option = one "-device dimm"
> > option; both define one range.  Unused memory ranges may remain if you
> > stumble upon a unusable range such as the PCI window.  For example two
> > "-numa mem,size=2G" options would allocate memory from 0 to 2G and from
> > 4 to 6G.
> 
> So we can drop -dimm if we agree on -numa mem?

Yes, the point of the "-numa mem" proposal was to avoid the concept of a
"partially initialized device" that you had for DIMMs.

BTW, how do you specify which module you are plugging in?  I.e., what if
I have three 1G ranges at 0, 1G and 2G, and I want to plug the first and
the third?

This is especially important with hot-unplug, because then you can have
this kind of hole in the address space.  If you migrate the VM, you have
to reproduce the situation in the destination command line.

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-07-16  6:19               ` Paolo Bonzini
@ 2013-07-16  7:27                 ` Hu Tao
  2013-07-16 10:22                   ` Igor Mammedov
  2013-07-16 10:19                 ` Igor Mammedov
  1 sibling, 1 reply; 56+ messages in thread
From: Hu Tao @ 2013-07-16  7:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vasilis Liaskovitis, Bandan Das, Eduardo Habkost, gaowanlong, qemu-devel

On Tue, Jul 16, 2013 at 08:19:48AM +0200, Paolo Bonzini wrote:
> Il 16/07/2013 03:27, Hu Tao ha scritto:
> > > I think it's the same.  One "-numa mem" option = one "-device dimm"
> > > option; both define one range.  Unused memory ranges may remain if you
> > > stumble upon a unusable range such as the PCI window.  For example two
> > > "-numa mem,size=2G" options would allocate memory from 0 to 2G and from
> > > 4 to 6G.
> > 
> > So we can drop -dimm if we agree on -numa mem?
> 
> Yes, the point of the "-numa mem" proposal was to avoid the concept of a
> "partially initialized device" that you had for DIMMs.
> 
> BTW, how do you specify which module you are plugging in?  I.e., what if
> I have three 1G ranges at 0, 1G and 2G, and I want to plug the first and
> the third?

I think an id is still needed to identify ranges, which can be shown to
user with `info numa' or similar command, along with the corresponding
ranges.

> 
> This is especially important with hot-unplug, because then you can have
> this kind of hole in the address space.  If you migrate the VM, you have
> to reproduce the situation in the destination command line.
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-07-16  6:19               ` Paolo Bonzini
  2013-07-16  7:27                 ` Hu Tao
@ 2013-07-16 10:19                 ` Igor Mammedov
  2013-07-16 10:31                   ` Paolo Bonzini
  1 sibling, 1 reply; 56+ messages in thread
From: Igor Mammedov @ 2013-07-16 10:19 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Eduardo Habkost, Hu Tao, qemu-devel, Vasilis Liaskovitis,
	Bandan Das, gaowanlong

On Tue, 16 Jul 2013 08:19:48 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 16/07/2013 03:27, Hu Tao ha scritto:
> > > I think it's the same.  One "-numa mem" option = one "-device dimm"
> > > option; both define one range.  Unused memory ranges may remain if you
> > > stumble upon a unusable range such as the PCI window.  For example two
> > > "-numa mem,size=2G" options would allocate memory from 0 to 2G and from
> > > 4 to 6G.
> > 
> > So we can drop -dimm if we agree on -numa mem?
> 
> Yes, the point of the "-numa mem" proposal was to avoid the concept of a
> "partially initialized device" that you had for DIMMs.
I've though -numa mem was for mapping initial memory to numa nodes.
It seem wrong to use it for representing dimm device and also limiting
possible hotplugged regions to specified at startup ranges.

we can leave -numa for initial memory mapping and manage of the mapping
of hotpluggable regions with -device dimm,node=X,size=Y.

It that case command line -device dimm will provide a fully initialized
dimm device usable at startup (but hot-unplugable) and
  (monitor) device_add dimm,,node=X,size=Y
would serve hot-plug case.

That way arbitrary sized dimm could be hot-pluged without specifying them
at startup, like it's done on bare-metal.

In addition command line -device would be used in migration case to describe
already hot-plugged dimms on target.

> BTW, how do you specify which module you are plugging in?  I.e., what if
> I have three 1G ranges at 0, 1G and 2G, and I want to plug the first and
> the third?
> This is especially important with hot-unplug, because then you can have
> this kind of hole in the address space.  If you migrate the VM, you have
> to reproduce the situation in the destination command line.
Could we add optional advising 'base-addr' property to replicate exact
source configuration on target?  some thing like:
 -device dimm,node=X,size=Y,base-addr=Z
where mgmt could get actual X,Y,Z inspecting dim device via qom-get.
and if base-addr is not set parent of dimm would auto-allocate address
where dimm is mapped.

> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-07-16  7:27                 ` Hu Tao
@ 2013-07-16 10:22                   ` Igor Mammedov
  0 siblings, 0 replies; 56+ messages in thread
From: Igor Mammedov @ 2013-07-16 10:22 UTC (permalink / raw)
  To: Hu Tao
  Cc: Eduardo Habkost, qemu-devel, Vasilis Liaskovitis, Bandan Das,
	Paolo Bonzini, gaowanlong

On Tue, 16 Jul 2013 15:27:19 +0800
Hu Tao <hutao@cn.fujitsu.com> wrote:

> On Tue, Jul 16, 2013 at 08:19:48AM +0200, Paolo Bonzini wrote:
> > Il 16/07/2013 03:27, Hu Tao ha scritto:
> > > > I think it's the same.  One "-numa mem" option = one "-device dimm"
> > > > option; both define one range.  Unused memory ranges may remain if you
> > > > stumble upon a unusable range such as the PCI window.  For example two
> > > > "-numa mem,size=2G" options would allocate memory from 0 to 2G and from
> > > > 4 to 6G.
> > > 
> > > So we can drop -dimm if we agree on -numa mem?
> > 
> > Yes, the point of the "-numa mem" proposal was to avoid the concept of a
> > "partially initialized device" that you had for DIMMs.
> > 
> > BTW, how do you specify which module you are plugging in?  I.e., what if
> > I have three 1G ranges at 0, 1G and 2G, and I want to plug the first and
> > the third?
> 
> I think an id is still needed to identify ranges, which can be shown to
> user with `info numa' or similar command, along with the corresponding
> ranges.
"info numa" could get actual ranges enumerating present dimms if we would use
plain -device for hotplug memory instead of -numa mem hacking.

> 
> > 
> > This is especially important with hot-unplug, because then you can have
> > this kind of hole in the address space.  If you migrate the VM, you have
> > to reproduce the situation in the destination command line.
> > 
> > Paolo
> > 
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-07-16 10:19                 ` Igor Mammedov
@ 2013-07-16 10:31                   ` Paolo Bonzini
  2013-07-16 12:00                     ` Igor Mammedov
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2013-07-16 10:31 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Eduardo Habkost, Hu Tao, qemu-devel, Vasilis Liaskovitis,
	Bandan Das, gaowanlong

Il 16/07/2013 12:19, Igor Mammedov ha scritto:
> On Tue, 16 Jul 2013 08:19:48 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> Il 16/07/2013 03:27, Hu Tao ha scritto:
>>>> I think it's the same.  One "-numa mem" option = one "-device dimm"
>>>> option; both define one range.  Unused memory ranges may remain if you
>>>> stumble upon a unusable range such as the PCI window.  For example two
>>>> "-numa mem,size=2G" options would allocate memory from 0 to 2G and from
>>>> 4 to 6G.
>>>
>>> So we can drop -dimm if we agree on -numa mem?
>>
>> Yes, the point of the "-numa mem" proposal was to avoid the concept of a
>> "partially initialized device" that you had for DIMMs.
> I've though -numa mem was for mapping initial memory to numa nodes.
> It seem wrong to use it for representing dimm device and also limiting
> possible hotplugged regions to specified at startup ranges.

It's not for DIMM devices, it is for reserving areas of the address
space for hot-plugged RAM.  DIMM hotplug is done with "device_add dimm"
(and you can also use "-numa mem,populated=no,... -device dimm,..." to
start a VM with hot-unpluggable memory).

> we can leave -numa for initial memory mapping and manage of the mapping
> of hotpluggable regions with -device dimm,node=X,size=Y.
> 
> It that case command line -device dimm will provide a fully initialized
> dimm device usable at startup (but hot-unplugable) and
>   (monitor) device_add dimm,,node=X,size=Y
> would serve hot-plug case.
> 
> That way arbitrary sized dimm could be hot-pluged without specifying them
> at startup, like it's done on bare-metal.

But the memory ranges need to be specified at startup in the ACPI
tables, and that's what "-numa mem" is for.

> In addition command line -device would be used in migration case to describe
> already hot-plugged dimms on target.

Yep.

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-07-16 10:31                   ` Paolo Bonzini
@ 2013-07-16 12:00                     ` Igor Mammedov
  2013-07-16 12:17                       ` Paolo Bonzini
  0 siblings, 1 reply; 56+ messages in thread
From: Igor Mammedov @ 2013-07-16 12:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Eduardo Habkost, Hu Tao, qemu-devel, Vasilis Liaskovitis,
	Bandan Das, gaowanlong

On Tue, 16 Jul 2013 12:31:46 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 16/07/2013 12:19, Igor Mammedov ha scritto:
> > On Tue, 16 Jul 2013 08:19:48 +0200
> > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > 
> >> Il 16/07/2013 03:27, Hu Tao ha scritto:
> >>>> I think it's the same.  One "-numa mem" option = one "-device dimm"
> >>>> option; both define one range.  Unused memory ranges may remain if you
> >>>> stumble upon a unusable range such as the PCI window.  For example two
> >>>> "-numa mem,size=2G" options would allocate memory from 0 to 2G and from
> >>>> 4 to 6G.
> >>>
> >>> So we can drop -dimm if we agree on -numa mem?
> >>
> >> Yes, the point of the "-numa mem" proposal was to avoid the concept of a
> >> "partially initialized device" that you had for DIMMs.
> > I've though -numa mem was for mapping initial memory to numa nodes.
> > It seem wrong to use it for representing dimm device and also limiting
> > possible hotplugged regions to specified at startup ranges.
> 
> It's not for DIMM devices, it is for reserving areas of the address
> space for hot-plugged RAM.  DIMM hotplug is done with "device_add dimm"
> (and you can also use "-numa mem,populated=no,... -device dimm,..." to
> start a VM with hot-unpluggable memory).
There isn't a real need to reserve from ACPI pov, memory device in ACPI could
provide _PXM() method to return mapping to numa node.
And from my testing linux and windows guest are using it, even if is there is
unnecessary mapping in SRAT table overriding SRAT mammping with dynamic one.

It would be better not to use "populated" concept at all. If there is
-device dim on cmd line, then it populated and for hotplugged dimm
all necessary information could be generated dynamically.

> > we can leave -numa for initial memory mapping and manage of the mapping
> > of hotpluggable regions with -device dimm,node=X,size=Y.
> > 
> > It that case command line -device dimm will provide a fully initialized
> > dimm device usable at startup (but hot-unplugable) and
> >   (monitor) device_add dimm,,node=X,size=Y
> > would serve hot-plug case.
> > 
> > That way arbitrary sized dimm could be hot-pluged without specifying them
> > at startup, like it's done on bare-metal.
> 
> But the memory ranges need to be specified at startup in the ACPI
> tables, and that's what "-numa mem" is for.
not really, there is caveat with windows, which needs a hotplugable SRAT entry
that tells it max possible limit (otherwise windows sees new dimm device but
refuses to use it saying "server is not configured for hotplug" or something
like this), but as far as such entry exists, windows is happily uses dynamic
_CRS() and _PXM() if they are below that limit (even if a new range is not in
any range defined by SRAT).

And ACPI spec doesn't say that SRAT MUST be populated with hotplug ranges.

It's kind of simplier for bare-metal, where they might do it due to limited
supported DIMM capacity by reserving static entries with max supported ranges
per DIMM and know in advance DIMM count for platform. But actual _CRS() anyway
dynamic since plugged in DIMM could have a smaller capacity then supported
max for slot.

To summarize ACPI + windows limitations:
 - ACPI needs to pre-allocate memory devices, i.e. number of possible increments
   OSPM could utilize. It might be possible to overcome limitation be using
   Load() or LoadTable() in runtime, but I haven't tried it.
 - Windows needs to know max supported limit, a fake entry in SRAT from RamSize
   to max_mem works nicely there (tested with ws2008r2DC and ws2012DC).

That's why I was proposing to extend "-m" option for "slots" number (i.e. nr of
memory devices) and 'max_mem' to make Windows happy and cap mgmt tools from
going over initially configured limit.

then -device dimm could be used for hotpluggable mem available at startup
and device_add fir adding more dimms with user defined sizes to desired nodes
at runtime.

Works nice without any need for 'populated=xxx' and predefined ranges.

PS:
I'll be able to post more or less usable RFC that does it on top of mst's ACPI
tables in QEMU by the end of this week.

> 
> > In addition command line -device would be used in migration case to describe
> > already hot-plugged dimms on target.
> 
> Yep.
> 
> Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
  2013-07-16 12:00                     ` Igor Mammedov
@ 2013-07-16 12:17                       ` Paolo Bonzini
  0 siblings, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2013-07-16 12:17 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Eduardo Habkost, Hu Tao, qemu-devel, Vasilis Liaskovitis,
	Bandan Das, gaowanlong

Il 16/07/2013 14:00, Igor Mammedov ha scritto:
>>> we can leave -numa for initial memory mapping and manage of the mapping
>>> of hotpluggable regions with -device dimm,node=X,size=Y.
>>>
>>> It that case command line -device dimm will provide a fully initialized
>>> dimm device usable at startup (but hot-unplugable) and
>>>   (monitor) device_add dimm,,node=X,size=Y
>>> would serve hot-plug case.
>>>
>>> That way arbitrary sized dimm could be hot-pluged without specifying them
>>> at startup, like it's done on bare-metal.
>>
>> But the memory ranges need to be specified at startup in the ACPI
>> tables, and that's what "-numa mem" is for.
> not really, there is caveat with windows, which needs a hotplugable SRAT entry
> that tells it max possible limit (otherwise windows sees new dimm device but
> refuses to use it saying "server is not configured for hotplug" or something
> like this), but as far as such entry exists, windows is happily uses dynamic
> _CRS() and _PXM() if they are below that limit (even if a new range is not in
> any range defined by SRAT).
> 
> And ACPI spec doesn't say that SRAT MUST be populated with hotplug ranges.

Right, what is required in ACPI is only pre-allocation of slots.

> It's kind of simplier for bare-metal, where they might do it due to limited
> supported DIMM capacity by reserving static entries with max supported ranges
> per DIMM and know in advance DIMM count for platform. But actual _CRS() anyway
> dynamic since plugged in DIMM could have a smaller capacity then supported
> max for slot.
> 
> To summarize ACPI + windows limitations:
>  - ACPI needs to pre-allocate memory devices, i.e. number of possible increments
>    OSPM could utilize. It might be possible to overcome limitation be using
>    Load() or LoadTable() in runtime, but I haven't tried it.
>  - Windows needs to know max supported limit, a fake entry in SRAT from RamSize
>    to max_mem works nicely there (tested with ws2008r2DC and ws2012DC).
> 
> That's why I was proposing to extend "-m" option for "slots" number (i.e. nr of
> memory devices) and 'max_mem' to make Windows happy and cap mgmt tools from
> going over initially configured limit.

As far as memory hot-plug is concerned, the "-numa mem" proposal is
exactly the same thing that you are proposing, minus the ability to
specify many slots in one go.  The same "-numa mem" can be used for host
node binding as well, but that's not really relevant for memory hot-plug.

In case you want multiple hotpluggable ranges, bonud to different host
nodes, It doesn't matter if the SRAT will have one or many fake entries.

> then -device dimm could be used for hotpluggable mem available at startup
> and device_add fir adding more dimms with user defined sizes to desired nodes
> at runtime.

Yes, no disagreement on this.

> Works nice without any need for 'populated=xxx' and predefined ranges.
> 
> PS:
> I'll be able to post more or less usable RFC that does it on top of mst's ACPI
> tables in QEMU by the end of this week.

Good!

Paolo

>>
>>> In addition command line -device would be used in migration case to describe
>>> already hot-plugged dimms on target.
>>
>> Yep.
>>
>> Paolo
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2013-07-16 12:17 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-26  9:13 [Qemu-devel] [PATCH v5 00/14] ACPI memory hotplug Hu Tao
2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 01/14] qapi: make visit_type_size fallback to v->type_int() Hu Tao
2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 02/14] Add SIZE type to qdev properties Hu Tao
2013-07-08  9:37   ` Andreas Färber
2013-07-12  1:27     ` Hu Tao
2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 03/14] qemu-option: export parse_option_number Hu Tao
2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 04/14] Implement dimm device abstraction Hu Tao
2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm" Hu Tao
2013-06-26  9:46   ` Paolo Bonzini
2013-06-27  5:08     ` Wanlong Gao
2013-06-27  6:55       ` Paolo Bonzini
2013-07-09 16:53         ` Igor Mammedov
2013-07-12  2:39           ` Hu Tao
2013-07-14 16:58             ` Paolo Bonzini
2013-07-16  1:26               ` Hu Tao
2013-07-15 17:05         ` Vasilis Liaskovitis
2013-07-15 17:10           ` Paolo Bonzini
2013-07-15 17:20             ` Vasilis Liaskovitis
2013-07-16  1:27             ` Hu Tao
2013-07-16  6:19               ` Paolo Bonzini
2013-07-16  7:27                 ` Hu Tao
2013-07-16 10:22                   ` Igor Mammedov
2013-07-16 10:19                 ` Igor Mammedov
2013-07-16 10:31                   ` Paolo Bonzini
2013-07-16 12:00                     ` Igor Mammedov
2013-07-16 12:17                       ` Paolo Bonzini
2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 06/14] acpi_piix4 : Implement memory device hotplug registers Hu Tao
2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 07/14] acpi_ich9 " Hu Tao
2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 08/14] memory: record below_4g_mem_size, above_4g_mem_size Hu Tao
2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 09/14] memory controller: initialize dram controller Hu Tao
2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 10/14] pc: Add dimm paravirt SRAT info Hu Tao
2013-07-10 10:10   ` Michael S. Tsirkin
2013-07-11  5:13     ` Igor Mammedov
2013-07-11  8:49       ` Michael S. Tsirkin
2013-07-12  1:33         ` Hu Tao
2013-07-14  5:47           ` Michael S. Tsirkin
2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 11/14] Introduce paravirt interface QEMU_CFG_PCI_WINDOW Hu Tao
2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 12/14] Implement "info memory" and "query-memory" Hu Tao
2013-06-28 20:27   ` Eric Blake
2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 13/14] balloon: update with hotplugged memory Hu Tao
2013-06-26  9:13 ` [Qemu-devel] [PATCH v5 14/14] Implement dimm-info Hu Tao
2013-06-28 20:28   ` Eric Blake
2013-06-26  9:14 ` [Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug Hu Tao
2013-06-26  9:14   ` [Qemu-devel] [PATCH v5 1/7] Add ACPI_EXTRACT_DEVICE* macros Hu Tao
2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 2/7] Add SSDT memory device support Hu Tao
2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 3/7] acpi-dsdt: Implement functions for memory hotplug Hu Tao
2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 4/7] set psize to 0 when romfile_loadfile failed Hu Tao
2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 5/7] acpi: generate hotplug memory devices Hu Tao
2013-07-12 10:07     ` Igor Mammedov
2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 6/7] q35: Add memory hotplug handler Hu Tao
2013-06-26  9:15   ` [Qemu-devel] [PATCH v5 7/7] pci: Use paravirt interface for pcimem_start and pcimem64_start Hu Tao
2013-07-15 20:11     ` Vasilis Liaskovitis
2013-07-07  8:36   ` [Qemu-devel] [PATCH v5 0/7] support for ACPI memory hotplug Michael S. Tsirkin
2013-07-08  9:48 ` [Qemu-devel] [PATCH v5 00/14] " Andreas Färber
2013-07-12  1:30   ` Hu Tao
2013-07-14 16:56   ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.