All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug
@ 2013-07-23 16:22 Igor Mammedov
  2013-07-23 16:22 ` [Qemu-devel] [PATCH 01/16] pc: use pci_hole64 info consistently Igor Mammedov
                   ` (18 more replies)
  0 siblings, 19 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

As opposed to previous approach,
This series allows to hotplug 'arbitrary' DIMM devices specifying size,
NUMA node mapping, slot and address where to map it, at runtime.

Due to ACPI limitation there is need to specify a number of possible
DIMM devices. For this task -m option was extended to support
following format:

  -m [mem=]RamSize[,slots=N,maxmem=M]

To allow memory hotplug user must specify a pair additional parameters:
    'slots' - number of possible increments
    'maxmem' - max possible total memory size QEMU is allowed to use,
               including RamSize.

minimal monitor command syntax to hotplug DIMM device:

  device_add dimm,id=dimmX

DIMM device provides following properties that could be used with
device_add / -device to alter default behavior:

  id    - unique string identifying device [mandatory]
  slot  - number in range [0-slots) [optional], if not specified
          the first free slot is used
  node  - NUMA node id [optional] (default: 0)
  size  - amount of memory to add [optional] (default: 1Gb)
  start - guest's physical address where to plug DIMM [optional],
          if not specified the first gap in hotplug memory region
          that fits DIMM is used

 -device option could be used for adding potentially hotunplugable DIMMs
and also for specifying hotplugged DIMMs in migration case (not tested).

Current implementation supports only x86-64 variant and places hotplug
memory region above 4Gb before 64-bit PCI hole.

Tested guests:
 - Fedora 19x64
 - Windows 2012DCx64
 - Windows 2008DCx64

Known limitations/bugs/TODOs:
 - only hot-add supported
 - q35 is not supported yet
 - max number of supported DIMM devices 255 (due to ACPI object name
   limit), could be increased creating several containers and putting
   DIMMs there. (exercise for future) 
 - failed hotplug action consumes 1 slot (device_add doesn't delete
   device if realize failed)
 - e820 table doesn't include DIMM devices added with -device /
   (or after reboot devices added with device_add)
 - Windows 2008 remembers DIMM configuration, so if DIMM with other
   start/size is added into the same slot, it refuses to use it insisting
   on old mapping.

Series is based on mst's acpi tables branch and requires corresponding
seabios tree:
  git://git.kernel.org/pub/scm/virt/kvm/mst/seabios.git acpi

Git tree for testing available at:
  https://github.com/imammedo/qemu/commits/memhp-v6

Example QEMU cmd line:
  qemu-system-x86_64 -enable-kvm -monitor unix:/tmp/mon,server,nowait \ 
     -m 4096,slots=4,maxmem=8G -L /custome_seabios guest.img

PS:
  Windows guest requires SRAT table for hotplug to work so add extra option:
   -numa node
  to QEMU command line.

Igor Mammedov (13):
  pc: use pci_hole64 info consistently
  vl: set default ram_size during variable initialization
  vl: convert -m to qemu_opts_parse()
  dimm: map DimmDevice into DimBus provided address space
  pc: piix: make hotplug memory gap in high memory
  pc: i440fx: add DimmBus to chipset and map it into hotplug memory
    region
  dimm: add busy slot check and slot auto-allocation
  dimm: add busy address check and address auto-allocation
  dimm: introduce memory added notifier
  acpi/piix4: introduce memory hot-plug interface QEMU<->ACPI BIOS
  pc: ACPI BIOS: implement memory hotplug interface
  pc: update acpi-dsdt.hex.generated and add ssdt-mem.hex.generated
  pc: ACPI BIOS: reserve SRAT entry for hotplug mem hole

Vasilis Liaskovitis (3):
  qapi: make visit_type_size fallback to type_int
  qdev: Add SIZE type to qdev properties
  dimm: implement dimm device abstraction

 default-configs/x86_64-softmmu.mak |    1 +
 docs/specs/acpi_mem_hotplug.txt    |   38 +++
 hw/Makefile.objs                   |    1 +
 hw/acpi/piix4.c                    |  170 ++++++++++-
 hw/core/qdev-properties.c          |   55 +++
 hw/i386/Makefile.objs              |    2 +-
 hw/i386/acpi-build.c               |   62 +++-
 hw/i386/acpi-dsdt-mem-hotplug.dsl  |  144 ++++++++
 hw/i386/acpi-dsdt.dsl              |    5 +-
 hw/i386/acpi-dsdt.hex.generated    |  633 +++++++++++++++++++++++++++++++++++-
 hw/i386/pc.c                       |   10 +-
 hw/i386/pc_piix.c                  |    5 +-
 hw/i386/ssdt-mem.dsl               |   76 +++++
 hw/i386/ssdt-mem.hex.generated     |  161 +++++++++
 hw/mem-hotplug/Makefile.objs       |    1 +
 hw/mem-hotplug/dimm.c              |  244 ++++++++++++++
 hw/pci-host/piix.c                 |   29 ++-
 hw/pci-host/q35.c                  |    7 +-
 include/hw/i386/pc.h               |    4 +-
 include/hw/mem-hotplug/dimm.h      |   91 +++++
 include/hw/qdev-properties.h       |    3 +
 include/qemu/option.h              |    2 +
 include/sysemu/sysemu.h            |    3 +
 qapi/qapi-visit-core.c             |   11 +-
 qemu-options.hx                    |    9 +-
 stubs/Makefile.objs                |    1 +
 stubs/memory_hotplug.c             |    6 +
 util/qemu-option.c                 |    2 +-
 vl.c                               |   60 +++-
 29 files changed, 1782 insertions(+), 54 deletions(-)
 create mode 100644 docs/specs/acpi_mem_hotplug.txt
 create mode 100644 hw/i386/acpi-dsdt-mem-hotplug.dsl
 create mode 100644 hw/i386/ssdt-mem.dsl
 create mode 100644 hw/i386/ssdt-mem.hex.generated
 create mode 100644 hw/mem-hotplug/Makefile.objs
 create mode 100644 hw/mem-hotplug/dimm.c
 create mode 100644 include/hw/mem-hotplug/dimm.h
 create mode 100644 stubs/memory_hotplug.c

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 01/16] pc: use pci_hole64 info consistently
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
@ 2013-07-23 16:22 ` Igor Mammedov
  2013-07-23 16:22 ` [Qemu-devel] [PATCH 02/16] vl: set default ram_size during variable initialization Igor Mammedov
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
patch causes false positive checkpatch error:

ERROR: need consistent spacing around '*' (ctx:WxV)
+                                  PcGuestInfo *guest_info,

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/i386/pc_piix.c    |    5 +----
 hw/pci-host/piix.c   |   17 ++++++++---------
 hw/pci-host/q35.c    |    7 ++++---
 include/hw/i386/pc.h |    3 +--
 4 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 77e4260..c8afe5e 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -168,10 +168,7 @@ static void pc_init1(MemoryRegion *system_memory,
                               system_memory, system_io, ram_size,
                               below_4g_mem_size,
                               0x100000000ULL - below_4g_mem_size,
-                              0x100000000ULL + above_4g_mem_size,
-                              (sizeof(hwaddr) == 4
-                               ? 0
-                               : ((uint64_t)1 << 62)),
+                              guest_info,
                               pci_memory, ram_memory);
     } else {
         pci_bus = NULL;
diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index c36e725..26bb844 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -233,8 +233,7 @@ static PCIBus *i440fx_common_init(const char *device_name,
                                   ram_addr_t ram_size,
                                   hwaddr pci_hole_start,
                                   hwaddr pci_hole_size,
-                                  hwaddr pci_hole64_start,
-                                  hwaddr pci_hole64_size,
+                                  PcGuestInfo *guest_info,
                                   MemoryRegion *pci_address_space,
                                   MemoryRegion *ram_memory)
 {
@@ -244,6 +243,7 @@ static PCIBus *i440fx_common_init(const char *device_name,
     PCIHostState *s;
     PIIX3State *piix3;
     PCII440FXState *f;
+    Range *pci_hole64 = &guest_info->pci_info.w64;
     unsigned i;
 
     dev = qdev_create(NULL, "i440FX-pcihost");
@@ -264,10 +264,10 @@ static PCIBus *i440fx_common_init(const char *device_name,
                              pci_hole_start, pci_hole_size);
     memory_region_add_subregion(f->system_memory, pci_hole_start, &f->pci_hole);
     memory_region_init_alias(&f->pci_hole_64bit, "pci-hole64",
-                             f->pci_address_space,
-                             pci_hole64_start, pci_hole64_size);
-    if (pci_hole64_size) {
-        memory_region_add_subregion(f->system_memory, pci_hole64_start,
+                             f->pci_address_space, pci_hole64->begin,
+                             pci_hole64->end - pci_hole64->begin);
+    if (pci_hole64->end > pci_hole64->begin) {
+        memory_region_add_subregion(f->system_memory, pci_hole64->begin,
                                     &f->pci_hole_64bit);
     }
     memory_region_init_alias(&f->smram_region, "smram-region",
@@ -321,8 +321,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix3_devfn,
                     ram_addr_t ram_size,
                     hwaddr pci_hole_start,
                     hwaddr pci_hole_size,
-                    hwaddr pci_hole64_start,
-                    hwaddr pci_hole64_size,
+                    PcGuestInfo *guest_info,
                     MemoryRegion *pci_memory, MemoryRegion *ram_memory)
 
 {
@@ -332,7 +331,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix3_devfn,
                            piix3_devfn, isa_bus, pic,
                            address_space_mem, address_space_io, ram_size,
                            pci_hole_start, pci_hole_size,
-                           pci_hole64_start, pci_hole64_size,
+                           guest_info,
                            pci_memory, ram_memory);
     return b;
 }
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 667bd20..237b8ab 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -252,6 +252,7 @@ static int mch_init(PCIDevice *d)
     int i;
     hwaddr pci_hole64_size;
     MCHPCIState *mch = MCH_PCI_DEVICE(d);
+    Range *pci_hole64 = &mch->guest_info->pci_info.w64;
 
     /* Leave enough space for the biggest MCFG BAR */
     /* TODO: this matches current bios behaviour, but
@@ -270,14 +271,14 @@ static int mch_init(PCIDevice *d)
     memory_region_add_subregion(mch->system_memory, mch->below_4g_mem_size,
                                 &mch->pci_hole);
     pci_hole64_size = (sizeof(hwaddr) == 4 ? 0 :
-                       ((uint64_t)1 << 62));
+                       pci_hole64->end - pci_hole64->begin);
     memory_region_init_alias(&mch->pci_hole_64bit, "pci-hole64",
                              mch->pci_address_space,
-                             0x100000000ULL + mch->above_4g_mem_size,
+                             pci_hole64->begin,
                              pci_hole64_size);
     if (pci_hole64_size) {
         memory_region_add_subregion(mch->system_memory,
-                                    0x100000000ULL + mch->above_4g_mem_size,
+                                    pci_hole64->begin,
                                     &mch->pci_hole_64bit);
     }
     /* smram */
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 2ac97d3..7574d92 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -180,8 +180,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix_devfn,
                     ram_addr_t ram_size,
                     hwaddr pci_hole_start,
                     hwaddr pci_hole_size,
-                    hwaddr pci_hole64_start,
-                    hwaddr pci_hole64_size,
+                    PcGuestInfo *guest_info,
                     MemoryRegion *pci_memory,
                     MemoryRegion *ram_memory);
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 02/16] vl: set default ram_size during variable initialization
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
  2013-07-23 16:22 ` [Qemu-devel] [PATCH 01/16] pc: use pci_hole64 info consistently Igor Mammedov
@ 2013-07-23 16:22 ` Igor Mammedov
  2013-08-02 20:33   ` Andreas Färber
  2013-07-23 16:22 ` [Qemu-devel] [PATCH 03/16] vl: convert -m to qemu_opts_parse() Igor Mammedov
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 vl.c |    7 +------
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/vl.c b/vl.c
index 8190504..bf0c658 100644
--- a/vl.c
+++ b/vl.c
@@ -2947,7 +2947,7 @@ int main(int argc, char **argv, char **envp)
     module_call_init(MODULE_INIT_MACHINE);
     machine = find_default_machine();
     cpu_model = NULL;
-    ram_size = 0;
+    ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
     snapshot = 0;
     cyls = heads = secs = 0;
     translation = BIOS_ATA_TRANSLATION_AUTO;
@@ -4064,11 +4064,6 @@ int main(int argc, char **argv, char **envp)
         exit(1);
     }
 
-    /* init the memory */
-    if (ram_size == 0) {
-        ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
-    }
-
     if (qemu_opts_foreach(qemu_find_opts("device"), device_help_func, NULL, 0)
         != 0) {
         exit(0);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 03/16] vl: convert -m to qemu_opts_parse()
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
  2013-07-23 16:22 ` [Qemu-devel] [PATCH 01/16] pc: use pci_hole64 info consistently Igor Mammedov
  2013-07-23 16:22 ` [Qemu-devel] [PATCH 02/16] vl: set default ram_size during variable initialization Igor Mammedov
@ 2013-07-23 16:22 ` Igor Mammedov
  2013-07-23 17:11   ` Paolo Bonzini
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 04/16] qapi: make visit_type_size fallback to type_int Igor Mammedov
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

Along with conversion extend -m option to support following parameters:
  "mem" - startup memory amount
  "slots" - total number of hotplug memory slots
  "maxmem" - maximum possible memory

"slots" and "maxmem" should go in pair and "maxmem" should be greater
than "mem" for memory hotplug to be usable.

v2:
  make sure maxmem is not less than ram size

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 qemu-options.hx |    9 +++++++--
 vl.c            |   53 +++++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 137a39b..f799b3d 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -210,8 +210,13 @@ use is discouraged as it may be removed from future versions.
 ETEXI
 
 DEF("m", HAS_ARG, QEMU_OPTION_m,
-    "-m megs         set virtual RAM size to megs MB [default="
-    stringify(DEFAULT_RAM_SIZE) "]\n", QEMU_ARCH_ALL)
+    "-m [mem=]megs[,slots=n,maxmem=size]\n"
+    "                set virtual RAM size to megs MB [default="
+    stringify(DEFAULT_RAM_SIZE) "]\n"
+    "                mem=start-up memory amount\n"
+    "                slots=maximum number of hotplug slots\n"
+    "                maxmem=maximum total amount of memory\n",
+    QEMU_ARCH_ALL)
 STEXI
 @item -m @var{megs}
 @findex -m
diff --git a/vl.c b/vl.c
index bf0c658..16c6f1e 100644
--- a/vl.c
+++ b/vl.c
@@ -516,6 +516,27 @@ static QemuOptsList qemu_realtime_opts = {
     },
 };
 
+static QemuOptsList qemu_mem_opts = {
+    .name = "memory-opts",
+    .implied_opt_name = "mem",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_mem_opts.head),
+    .desc = {
+        {
+            .name = "mem",
+            .type = QEMU_OPT_SIZE,
+        },
+        {
+            .name = "slots",
+            .type = QEMU_OPT_NUMBER,
+        },
+        {
+            .name = "maxmem",
+            .type = QEMU_OPT_SIZE,
+        },
+        { /* end of list */ }
+    },
+};
+
 const char *qemu_get_vm_name(void)
 {
     return qemu_name;
@@ -2933,6 +2954,7 @@ int main(int argc, char **argv, char **envp)
     qemu_add_opts(&qemu_object_opts);
     qemu_add_opts(&qemu_tpmdev_opts);
     qemu_add_opts(&qemu_realtime_opts);
+    qemu_add_opts(&qemu_mem_opts);
 
     runstate_init();
 
@@ -3224,21 +3246,40 @@ int main(int argc, char **argv, char **envp)
                 exit(0);
                 break;
             case QEMU_OPTION_m: {
-                int64_t value;
                 uint64_t sz;
-                char *end;
+                const char *end;
+                char *s;
 
-                value = strtosz(optarg, &end);
-                if (value < 0 || *end) {
-                    fprintf(stderr, "qemu: invalid ram size: %s\n", optarg);
+                opts = qemu_opts_parse(qemu_find_opts("memory-opts"),
+                                       optarg, 1);
+                if (!opts) {
                     exit(1);
                 }
-                sz = QEMU_ALIGN_UP((uint64_t)value, 8192);
+
+                /* fixup legacy sugffix-less format */
+                end = qemu_opt_get(opts, "mem");
+                if (g_ascii_isdigit(end[strlen(end) - 1])) {
+                    s = g_strconcat(end, "M", NULL);
+                    qemu_opt_set(opts, "mem", s);
+                    g_free(s);
+                }
+
+                sz = QEMU_ALIGN_UP(qemu_opt_get_size(opts, "mem", 0), 8192);
                 ram_size = sz;
                 if (ram_size != sz) {
                     fprintf(stderr, "qemu: ram size too large\n");
                     exit(1);
                 }
+                /* store aligned value for future use */
+                s = g_strdup_printf("%" PRIu64, sz);
+                qemu_opt_set(opts, "mem", s);
+                g_free(s);
+
+                sz = qemu_opt_get_size(opts, "maxmem", ram_size);
+                if (sz < ram_size) {
+                    fprintf(stderr, "qemu: maxmem must be > initial memory\n");
+                    exit(1);
+                }
                 break;
             }
 #ifdef CONFIG_TPM
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 04/16] qapi: make visit_type_size fallback to type_int
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
                   ` (2 preceding siblings ...)
  2013-07-23 16:22 ` [Qemu-devel] [PATCH 03/16] vl: convert -m to qemu_opts_parse() Igor Mammedov
@ 2013-07-23 16:23 ` Igor Mammedov
  2013-07-25  6:41   ` Hu Tao
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 05/16] qdev: Add SIZE type to qdev properties Igor Mammedov
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

Currently visit_type_size checks if the visitor's type_size function pointer is
NULL. If not, it calls it, otherwise it calls v->type_uint64(). But neither of
these pointers are ever set. Fallback to calling v->type_int() in this third
(default) case.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 qapi/qapi-visit-core.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c
index 401ee6e..fcacaff 100644
--- a/qapi/qapi-visit-core.c
+++ b/qapi/qapi-visit-core.c
@@ -238,8 +238,17 @@ void visit_type_int64(Visitor *v, int64_t *obj, const char *name, Error **errp)
 
 void visit_type_size(Visitor *v, uint64_t *obj, const char *name, Error **errp)
 {
+    int64_t value;
     if (!error_is_set(errp)) {
-        (v->type_size ? v->type_size : v->type_uint64)(v, obj, name, errp);
+        if (v->type_size) {
+            v->type_size(v, obj, name, errp);
+        } else if (v->type_uint64) {
+            v->type_uint64(v, obj, name, errp);
+        } else {
+            value = *obj;
+            v->type_int(v, &value, name, errp);
+            *obj = value;
+        }
     }
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 05/16] qdev: Add SIZE type to qdev properties
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
                   ` (3 preceding siblings ...)
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 04/16] qapi: make visit_type_size fallback to type_int Igor Mammedov
@ 2013-07-23 16:23 ` Igor Mammedov
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 06/16] dimm: implement dimm device abstraction Igor Mammedov
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

This patch adds a 'SIZE' type property to qdev.

It will make dimm description more convenient by allowing sizes to be specified
with K,M,G,T prefixes instead of number of bytes e.g.:
-device dimm,id=mem0,size=2G,bus=membus.0

Signed-off-by: Ian Molton <ian.molton@collabora.co.uk>
Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/core/qdev-properties.c    |   55 ++++++++++++++++++++++++++++++++++++++++++
 include/hw/qdev-properties.h |    3 ++
 include/qemu/option.h        |    2 +
 util/qemu-option.c           |    2 +-
 4 files changed, 61 insertions(+), 1 deletions(-)

diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 3a324fb..f427baf 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -1134,3 +1134,58 @@ void qdev_prop_set_globals(DeviceState *dev, Error **errp)
         class = object_class_get_parent(class);
     } while (class);
 }
+
+/* --- 64bit unsigned int 'size' type --- */
+
+static void get_size(Object *obj, Visitor *v, void *opaque,
+                       const char *name, Error **errp)
+{
+    DeviceState *dev = DEVICE(obj);
+    Property *prop = opaque;
+    uint64_t *ptr = qdev_get_prop_ptr(dev, prop);
+
+    visit_type_size(v, ptr, name, errp);
+}
+
+static void set_size(Object *obj, Visitor *v, void *opaque,
+                       const char *name, Error **errp)
+{
+    DeviceState *dev = DEVICE(obj);
+    Property *prop = opaque;
+    uint64_t *ptr = qdev_get_prop_ptr(dev, prop);
+
+    visit_type_size(v, ptr, name, errp);
+}
+
+static int parse_size(DeviceState *dev, Property *prop, const char *str)
+{
+    uint64_t *ptr = qdev_get_prop_ptr(dev, prop);
+    Error *errp = NULL;
+
+    if (str != NULL) {
+        parse_option_size(prop->name, str, ptr, &errp);
+    }
+    assert_no_error(errp);
+    return 0;
+}
+
+static int print_size(DeviceState *dev, Property *prop, char *dest, size_t len)
+{
+    uint64_t *ptr = qdev_get_prop_ptr(dev, prop);
+    char suffixes[] = {'T', 'G', 'M', 'K', 'B'};
+    int i = 0;
+    uint64_t div;
+
+    for (div = (long int)1 << 40; !(*ptr / div) ; div >>= 10) {
+        i++;
+    }
+    return snprintf(dest, len, "%0.03f%c", (double)*ptr/div, suffixes[i]);
+}
+
+PropertyInfo qdev_prop_size = {
+    .name  = "size",
+    .parse = parse_size,
+    .print = print_size,
+    .get = get_size,
+    .set = set_size,
+};
diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 39448b7..692f82e 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -15,6 +15,7 @@ extern PropertyInfo qdev_prop_uint64;
 extern PropertyInfo qdev_prop_hex8;
 extern PropertyInfo qdev_prop_hex32;
 extern PropertyInfo qdev_prop_hex64;
+extern PropertyInfo qdev_prop_size;
 extern PropertyInfo qdev_prop_string;
 extern PropertyInfo qdev_prop_chr;
 extern PropertyInfo qdev_prop_ptr;
@@ -116,6 +117,8 @@ extern PropertyInfo qdev_prop_arraylen;
     DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_hex32, uint32_t)
 #define DEFINE_PROP_HEX64(_n, _s, _f, _d)                       \
     DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_hex64, uint64_t)
+#define DEFINE_PROP_SIZE(_n, _s, _f, _d)                       \
+    DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_size, uint64_t)
 #define DEFINE_PROP_PCI_DEVFN(_n, _s, _f, _d)                   \
     DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_pci_devfn, int32_t)
 
diff --git a/include/qemu/option.h b/include/qemu/option.h
index a83c700..479ae5f 100644
--- a/include/qemu/option.h
+++ b/include/qemu/option.h
@@ -73,6 +73,8 @@ QEMUOptionParameter *append_option_parameters(QEMUOptionParameter *dest,
     QEMUOptionParameter *list);
 QEMUOptionParameter *parse_option_parameters(const char *param,
     QEMUOptionParameter *list, QEMUOptionParameter *dest);
+void parse_option_size(const char *name, const char *value,
+                              uint64_t *ret, Error **errp);
 void free_option_parameters(QEMUOptionParameter *list);
 void print_option_parameters(QEMUOptionParameter *list);
 void print_option_help(QEMUOptionParameter *list);
diff --git a/util/qemu-option.c b/util/qemu-option.c
index 412c425..5f10ab5 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -173,7 +173,7 @@ static void parse_option_number(const char *name, const char *value,
     }
 }
 
-static void parse_option_size(const char *name, const char *value,
+void parse_option_size(const char *name, const char *value,
                               uint64_t *ret, Error **errp)
 {
     char *postfix;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 06/16] dimm: implement dimm device abstraction
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
                   ` (4 preceding siblings ...)
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 05/16] qdev: Add SIZE type to qdev properties Igor Mammedov
@ 2013-07-23 16:23 ` Igor Mammedov
  2013-07-25  6:52   ` Hu Tao
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 07/16] dimm: map DimmDevice into DimBus provided address space Igor Mammedov
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

Each hotplug-able memory slot is a DimmDevice. All DimmDevices are
attached to a new bus called DimmBus.

A hot-add operation for a DIMM:
- creates a new DimmDevice and attaches it to the DimmBus

Hotplug operations are done through normal device_add commands.
For migration case, all hotplugged DIMMs on source should be specified
on target's command line using '-device' option with properties set to
the same values as on target.

To simplify review, patch introduces only DimmDevice and DimmBus basic
QOM skeleton that will be extended by following patches to implement
actual memory hotplug and related functions.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 default-configs/x86_64-softmmu.mak |    1 +
 hw/Makefile.objs                   |    1 +
 hw/mem-hotplug/Makefile.objs       |    1 +
 hw/mem-hotplug/dimm.c              |   97 ++++++++++++++++++++++++++++++++++++
 include/hw/mem-hotplug/dimm.h      |   66 ++++++++++++++++++++++++
 5 files changed, 166 insertions(+), 0 deletions(-)
 create mode 100644 hw/mem-hotplug/Makefile.objs
 create mode 100644 hw/mem-hotplug/dimm.c
 create mode 100644 include/hw/mem-hotplug/dimm.h

diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index 10bb0c6..59229ea 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -45,3 +45,4 @@ CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_ICC_BUS=y
 CONFIG_PVPANIC=y
+CONFIG_MEM_HOTPLUG=y
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 0243d6a..6d3dc73 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -27,6 +27,7 @@ devices-dirs-$(CONFIG_SOFTMMU) += usb/
 devices-dirs-$(CONFIG_VIRTIO) += virtio/
 devices-dirs-$(CONFIG_SOFTMMU) += watchdog/
 devices-dirs-$(CONFIG_SOFTMMU) += xen/
+devices-dirs-$(CONFIG_MEM_HOTPLUG) += mem-hotplug/
 devices-dirs-y += core/
 common-obj-y += $(devices-dirs-y)
 obj-y += $(devices-dirs-y)
diff --git a/hw/mem-hotplug/Makefile.objs b/hw/mem-hotplug/Makefile.objs
new file mode 100644
index 0000000..7563ef5
--- /dev/null
+++ b/hw/mem-hotplug/Makefile.objs
@@ -0,0 +1 @@
+common-obj-$(CONFIG_MEM_HOTPLUG) += dimm.o
diff --git a/hw/mem-hotplug/dimm.c b/hw/mem-hotplug/dimm.c
new file mode 100644
index 0000000..0a337a5
--- /dev/null
+++ b/hw/mem-hotplug/dimm.c
@@ -0,0 +1,97 @@
+/*
+ * Dimm device for Memory Hotplug
+ *
+ * Copyright ProfitBricks GmbH 2012
+ * Copyright (C) 2013 Red Hat Inc
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "hw/mem-hotplug/dimm.h"
+#include "qemu/config-file.h"
+
+static void dimm_bus_initfn(Object *obj)
+{
+    BusState *b = BUS(obj);
+
+    b->allow_hotplug = true;
+}
+static void dimm_bus_class_init(ObjectClass *klass, void *data)
+{
+    BusClass *bc = BUS_CLASS(klass);
+    QemuOpts *opts = qemu_opts_find(qemu_find_opts("memory-opts"), NULL);
+
+    if (opts) {
+        bc->max_dev = qemu_opt_get_number(opts, "slots", 0);
+    }
+}
+
+static const TypeInfo dimm_bus_info = {
+    .name = TYPE_DIMM_BUS,
+    .parent = TYPE_BUS,
+    .instance_init = dimm_bus_initfn,
+    .instance_size = sizeof(DimmBus),
+    .class_init = dimm_bus_class_init,
+};
+
+static Property dimm_properties[] = {
+    DEFINE_PROP_UINT64("start", DimmDevice, start, 0),
+    DEFINE_PROP_SIZE("size", DimmDevice, size, DEFAULT_DIMMSIZE),
+    DEFINE_PROP_UINT32("node", DimmDevice, node, 0),
+    DEFINE_PROP_INT32("slot", DimmDevice, slot, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void dimm_realize(DeviceState *dev, Error **errp)
+{
+    DimmDevice *dimm = DIMM(dev);
+    DimmBus *bus = DIMM_BUS(qdev_get_parent_bus(dev));
+    BusClass *bc = BUS_GET_CLASS(bus);
+
+    if (!dev->id) {
+        error_setg(errp, "missing 'id' property");
+        return;
+    }
+
+    if (dimm->slot >= bc->max_dev) {
+        error_setg(errp, "maximum allowed slot is: %d", bc->max_dev - 1);
+        return;
+    }
+
+    memory_region_init_ram(&dimm->mr, dev->id, dimm->size);
+}
+
+static void dimm_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->realize = dimm_realize;
+    dc->props = dimm_properties;
+    dc->bus_type = TYPE_DIMM_BUS;
+}
+
+static TypeInfo dimm_info = {
+    .name          = TYPE_DIMM,
+    .parent        = TYPE_DEVICE,
+    .instance_size = sizeof(DimmDevice),
+    .class_init    = dimm_class_init,
+};
+
+static void dimm_register_types(void)
+{
+    type_register_static(&dimm_bus_info);
+    type_register_static(&dimm_info);
+}
+
+type_init(dimm_register_types)
diff --git a/include/hw/mem-hotplug/dimm.h b/include/hw/mem-hotplug/dimm.h
new file mode 100644
index 0000000..c83ad26
--- /dev/null
+++ b/include/hw/mem-hotplug/dimm.h
@@ -0,0 +1,66 @@
+/*
+ * DIMM device
+ *
+ * Copyright ProfitBricks GmbH 2012
+ * Copyright (C) 2013 Red Hat Inc
+ *
+ * Authors:
+ *  Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
+ *  Igor Mammedov <imammedo@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_DIMM_H
+#define QEMU_DIMM_H
+
+#include "exec/memory.h"
+#include "hw/qdev.h"
+
+#define DEFAULT_DIMMSIZE (1024*1024*1024)
+
+#define TYPE_DIMM "dimm"
+#define DIMM(obj) \
+    OBJECT_CHECK(DimmDevice, (obj), TYPE_DIMM)
+#define DIMM_CLASS(klass) \
+    OBJECT_CLASS_CHECK(DimmDeviceClass, (klass), TYPE_DIMM)
+#define DIMM_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(DimmDeviceClass, (obj), TYPE_DIMM)
+
+/**
+ * DimmBus:
+ * @start: starting physical address, where @DimmDevice is mapped.
+ * @size: amount of memory mapped at @start.
+ * @node: numa node to which @DimmDevice is attached.
+ * @slot: slot number into which @DimmDevice is plugged in.
+ */
+typedef struct DimmDevice {
+    DeviceState qdev;
+    ram_addr_t start;
+    ram_addr_t size;
+    uint32_t node;
+    int32_t slot;
+    MemoryRegion mr;
+} DimmDevice;
+
+typedef struct DimmDeviceClass {
+    DeviceClass parent_class;
+} DimmDeviceClass;
+
+#define TYPE_DIMM_BUS "dimmbus"
+#define DIMM_BUS(obj) OBJECT_CHECK(DimmBus, (obj), TYPE_DIMM_BUS)
+#define DIMM_BUS_CLASS(klass) \
+    OBJECT_CLASS_CHECK(DimmBusClass, (klass), TYPE_DIMM_BUS)
+#define DIMM_BUS_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(DimmBusClass, (obj), TYPE_DIMM_BUS)
+
+/**
+ * DimmBus:
+ */
+typedef struct DimmBus {
+    BusState qbus;
+} DimmBus;
+
+#endif
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 07/16] dimm: map DimmDevice into DimBus provided address space
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
                   ` (5 preceding siblings ...)
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 06/16] dimm: implement dimm device abstraction Igor Mammedov
@ 2013-07-23 16:23 ` Igor Mammedov
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 08/16] pc: piix: make hotplug memory gap in high memory Igor Mammedov
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/mem-hotplug/dimm.c         |   14 ++++++++++++++
 include/hw/mem-hotplug/dimm.h |   14 ++++++++++++++
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/hw/mem-hotplug/dimm.c b/hw/mem-hotplug/dimm.c
index 0a337a5..63c2c8e 100644
--- a/hw/mem-hotplug/dimm.c
+++ b/hw/mem-hotplug/dimm.c
@@ -27,14 +27,23 @@ static void dimm_bus_initfn(Object *obj)
 
     b->allow_hotplug = true;
 }
+static void dimm_bus_register_memory(DimmBus *bus, DimmDevice *dimm,
+                                     Error **errp)
+{
+    memory_region_add_subregion(&bus->as, dimm->start - bus->base, &dimm->mr);
+    vmstate_register_ram_global(&dimm->mr);
+}
+
 static void dimm_bus_class_init(ObjectClass *klass, void *data)
 {
     BusClass *bc = BUS_CLASS(klass);
+    DimmBusClass *dc = DIMM_BUS_CLASS(klass);
     QemuOpts *opts = qemu_opts_find(qemu_find_opts("memory-opts"), NULL);
 
     if (opts) {
         bc->max_dev = qemu_opt_get_number(opts, "slots", 0);
     }
+    dc->register_memory = dimm_bus_register_memory;
 }
 
 static const TypeInfo dimm_bus_info = {
@@ -43,6 +52,7 @@ static const TypeInfo dimm_bus_info = {
     .instance_init = dimm_bus_initfn,
     .instance_size = sizeof(DimmBus),
     .class_init = dimm_bus_class_init,
+    .class_size = sizeof(DimmBusClass),
 };
 
 static Property dimm_properties[] = {
@@ -58,6 +68,7 @@ static void dimm_realize(DeviceState *dev, Error **errp)
     DimmDevice *dimm = DIMM(dev);
     DimmBus *bus = DIMM_BUS(qdev_get_parent_bus(dev));
     BusClass *bc = BUS_GET_CLASS(bus);
+    DimmBusClass *dc = DIMM_BUS_GET_CLASS(bus);
 
     if (!dev->id) {
         error_setg(errp, "missing 'id' property");
@@ -70,6 +81,9 @@ static void dimm_realize(DeviceState *dev, Error **errp)
     }
 
     memory_region_init_ram(&dimm->mr, dev->id, dimm->size);
+
+    g_assert(dc->register_memory);
+    dc->register_memory(bus, dimm, errp);
 }
 
 static void dimm_class_init(ObjectClass *klass, void *data)
diff --git a/include/hw/mem-hotplug/dimm.h b/include/hw/mem-hotplug/dimm.h
index c83ad26..84d6ba6 100644
--- a/include/hw/mem-hotplug/dimm.h
+++ b/include/hw/mem-hotplug/dimm.h
@@ -58,9 +58,23 @@ typedef struct DimmDeviceClass {
 
 /**
  * DimmBus:
+ * @base: address from which to start mapping @DimmDevice
+ * @as: hot-plugabble memory area where @DimmDevice-s are attached
  */
 typedef struct DimmBus {
     BusState qbus;
+    hwaddr base;
+    MemoryRegion as;
 } DimmBus;
 
+/**
+ * DimmBusClass:
+ * @register_memory: map @DimmDevice into hot-plugable address space
+ */
+typedef struct DimmBusClass {
+    BusClass parent_class;
+
+    void (*register_memory)(DimmBus *bus, DimmDevice *dimm, Error **errp);
+} DimmBusClass;
+
 #endif
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 08/16] pc: piix: make hotplug memory gap in high memory
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
                   ` (6 preceding siblings ...)
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 07/16] dimm: map DimmDevice into DimBus provided address space Igor Mammedov
@ 2013-07-23 16:23 ` Igor Mammedov
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 09/16] pc: i440fx: add DimmBus to chipset and map it into hotplug memory region Igor Mammedov
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

reserves memory region above 4Gb ram region and upto maxmem
before 64-bit PCI hole.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/i386/pc.c         |   10 +++++++++-
 include/hw/i386/pc.h |    1 +
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 6e2c4d9..d3ccc79 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1054,8 +1054,12 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
 {
     PcGuestInfoState *guest_info_state = g_malloc0(sizeof *guest_info_state);
     PcGuestInfo *guest_info = &guest_info_state->info;
+    QemuOpts *opts = qemu_opts_find(qemu_find_opts("memory-opts"), NULL);
+    ram_addr_t maxmem = (opts ? qemu_opt_get_size(opts, "maxmem", ram_size) :
+                         ram_size);
 
     guest_info->ram_size = below_4g_mem_size + above_4g_mem_size;
+
     guest_info->apic_id_limit = pc_apic_id_limit(max_cpus);
     guest_info->apic_xrupt_override = kvm_allows_irq0_override();
     guest_info->numa_nodes = nb_numa_nodes;
@@ -1072,6 +1076,10 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
         guest_info->pci_info.w64.begin = 0;
         guest_info->pci_info.w64.end = 0;
     } else {
+        guest_info->hotplug_mem_win.begin = ROUND_UP((0x1ULL << 32) +
+            above_4g_mem_size, 0x1ULL << 30);
+        guest_info->hotplug_mem_win.end = guest_info->hotplug_mem_win.begin +
+            (maxmem - guest_info->ram_size);
         /*
          * BIOS does not set MTRR entries for the 64 bit window, so no need to
          * align address to power of two.  Align address at 1G, this makes sure
@@ -1079,7 +1087,7 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
          * pages.
          */
         guest_info->pci_info.w64.begin =
-            ROUND_UP((0x1ULL << 32) + above_4g_mem_size, 0x1ULL << 30);
+            ROUND_UP(guest_info->hotplug_mem_win.end, 0x1ULL << 30);
         guest_info->pci_info.w64.end = guest_info->pci_info.w64.begin +
             (0x1ULL << 31);
         assert(guest_info->pci_info.w64.begin <= guest_info->pci_info.w64.end);
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 7574d92..388d803 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -28,6 +28,7 @@ struct PcGuestInfo {
     PcPciInfo pci_info;
     bool has_pci_info;
     hwaddr ram_size;
+    Range hotplug_mem_win;
     unsigned apic_id_limit;
     bool apic_xrupt_override;
     bool has_hpet;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 09/16] pc: i440fx: add DimmBus to chipset and map it into hotplug memory region
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
                   ` (7 preceding siblings ...)
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 08/16] pc: piix: make hotplug memory gap in high memory Igor Mammedov
@ 2013-07-23 16:23 ` Igor Mammedov
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation Igor Mammedov
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

it makes hot-plugged DimmDevices visible in guest address space.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/pci-host/piix.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 26bb844..2ac7587 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -32,6 +32,7 @@
 #include "hw/xen/xen.h"
 #include "hw/pci-host/pam.h"
 #include "sysemu/sysemu.h"
+#include "hw/mem-hotplug/dimm.h"
 
 /*
  * I440FX chipset data sheet.
@@ -96,6 +97,7 @@ struct PCII440FXState {
     PAMMemoryRegion pam_regions[13];
     MemoryRegion smram_region;
     uint8_t smm_enabled;
+    DimmBus dimm_bus;
 };
 
 
@@ -221,6 +223,8 @@ static int i440fx_initfn(PCIDevice *dev)
     d->dev.config[I440FX_SMRAM] = 0x02;
 
     cpu_smm_register(&i440fx_set_smm, d);
+
+    qbus_create_inplace(&d->dimm_bus, TYPE_DIMM_BUS, DEVICE(d), "membus");
     return 0;
 }
 
@@ -244,6 +248,7 @@ static PCIBus *i440fx_common_init(const char *device_name,
     PIIX3State *piix3;
     PCII440FXState *f;
     Range *pci_hole64 = &guest_info->pci_info.w64;
+    Range *hotplug_mem = &guest_info->hotplug_mem_win;
     unsigned i;
 
     dev = qdev_create(NULL, "i440FX-pcihost");
@@ -260,6 +265,13 @@ static PCIBus *i440fx_common_init(const char *device_name,
     f->system_memory = address_space_mem;
     f->pci_address_space = pci_address_space;
     f->ram_memory = ram_memory;
+
+    f->dimm_bus.base = hotplug_mem->begin;
+    memory_region_init(&f->dimm_bus.as, "hotplug-memory",
+                       hotplug_mem->end - hotplug_mem->begin);
+    memory_region_add_subregion(f->system_memory, f->dimm_bus.base,
+                                &f->dimm_bus.as);
+
     memory_region_init_alias(&f->pci_hole, "pci-hole", f->pci_address_space,
                              pci_hole_start, pci_hole_size);
     memory_region_add_subregion(f->system_memory, pci_hole_start, &f->pci_hole);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
                   ` (8 preceding siblings ...)
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 09/16] pc: i440fx: add DimmBus to chipset and map it into hotplug memory region Igor Mammedov
@ 2013-07-23 16:23 ` Igor Mammedov
  2013-07-23 17:09   ` Paolo Bonzini
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 11/16] dimm: add busy address check and address auto-allocation Igor Mammedov
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

- if slot property is not specified on -device/device_add command,
treat default value as request for assigning DimmDevice to
the first free slot.

- if slot is provided with -device/device_add command, attempt to
use it or fail command if it's already occupied.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/mem-hotplug/dimm.c         |   51 ++++++++++++++++++++++++++++++++++++++++-
 include/hw/mem-hotplug/dimm.h |    5 ++++
 2 files changed, 55 insertions(+), 1 deletions(-)

diff --git a/hw/mem-hotplug/dimm.c b/hw/mem-hotplug/dimm.c
index 63c2c8e..c716906 100644
--- a/hw/mem-hotplug/dimm.c
+++ b/hw/mem-hotplug/dimm.c
@@ -20,6 +20,7 @@
 
 #include "hw/mem-hotplug/dimm.h"
 #include "qemu/config-file.h"
+#include "qemu/bitmap.h"
 
 static void dimm_bus_initfn(Object *obj)
 {
@@ -27,6 +28,45 @@ static void dimm_bus_initfn(Object *obj)
 
     b->allow_hotplug = true;
 }
+
+static int dimm_bus_slot2bitmap(DeviceState *dev, void *opaque)
+{
+    unsigned long *bitmap = opaque;
+    BusClass *bc = BUS_GET_CLASS(qdev_get_parent_bus(dev));
+    DimmDevice *d = DIMM(dev);
+
+    if (dev->realized) { /* count only realized DIMMs */
+        g_assert(d->slot < bc->max_dev);
+        set_bit(d->slot, bitmap);
+    }
+    return 0;
+}
+
+static int dimm_bus_get_free_slot(DimmBus *bus, const int *hint, Error **errp)
+{
+    BusClass *bc = BUS_GET_CLASS(bus);
+    unsigned long *bitmap = bitmap_new(bc->max_dev);
+    int slot = 0;
+
+    qbus_walk_children(BUS(bus), dimm_bus_slot2bitmap, NULL, bitmap);
+
+    /* check if requested slot is not occupied */
+    if (hint) {
+        if (!test_bit(*hint, bitmap)) {
+            slot = *hint;
+        } else {
+            error_setg(errp, "slot %d is busy", *hint);
+        }
+        goto out;
+    }
+
+    /* search for free slot */
+    slot = find_first_zero_bit(bitmap, bc->max_dev);
+out:
+    g_free(bitmap);
+    return slot;
+}
+
 static void dimm_bus_register_memory(DimmBus *bus, DimmDevice *dimm,
                                      Error **errp)
 {
@@ -44,6 +84,7 @@ static void dimm_bus_class_init(ObjectClass *klass, void *data)
         bc->max_dev = qemu_opt_get_number(opts, "slots", 0);
     }
     dc->register_memory = dimm_bus_register_memory;
+    dc->get_free_slot = dimm_bus_get_free_slot;
 }
 
 static const TypeInfo dimm_bus_info = {
@@ -59,7 +100,7 @@ static Property dimm_properties[] = {
     DEFINE_PROP_UINT64("start", DimmDevice, start, 0),
     DEFINE_PROP_SIZE("size", DimmDevice, size, DEFAULT_DIMMSIZE),
     DEFINE_PROP_UINT32("node", DimmDevice, node, 0),
-    DEFINE_PROP_INT32("slot", DimmDevice, slot, 0),
+    DEFINE_PROP_INT32("slot", DimmDevice, slot, -1),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -69,6 +110,7 @@ static void dimm_realize(DeviceState *dev, Error **errp)
     DimmBus *bus = DIMM_BUS(qdev_get_parent_bus(dev));
     BusClass *bc = BUS_GET_CLASS(bus);
     DimmBusClass *dc = DIMM_BUS_GET_CLASS(bus);
+    int *slot_hint;
 
     if (!dev->id) {
         error_setg(errp, "missing 'id' property");
@@ -79,6 +121,13 @@ static void dimm_realize(DeviceState *dev, Error **errp)
         error_setg(errp, "maximum allowed slot is: %d", bc->max_dev - 1);
         return;
     }
+    g_assert(dc->get_free_slot);
+    slot_hint = dimm->slot < 0 ? NULL : &dimm->slot;
+    dimm->slot = dc->get_free_slot(bus, slot_hint, errp);
+    if (error_is_set(errp)) {
+        return;
+    }
+
 
     memory_region_init_ram(&dimm->mr, dev->id, dimm->size);
 
diff --git a/include/hw/mem-hotplug/dimm.h b/include/hw/mem-hotplug/dimm.h
index 84d6ba6..d8d11a3 100644
--- a/include/hw/mem-hotplug/dimm.h
+++ b/include/hw/mem-hotplug/dimm.h
@@ -35,6 +35,7 @@
  * @size: amount of memory mapped at @start.
  * @node: numa node to which @DimmDevice is attached.
  * @slot: slot number into which @DimmDevice is plugged in.
+ * Default value: -1, means that slot is auto-allocated.
  */
 typedef struct DimmDevice {
     DeviceState qdev;
@@ -69,11 +70,15 @@ typedef struct DimmBus {
 
 /**
  * DimmBusClass:
+ * @get_free_slot: returns a not occupied slot number. If @hint is provided,
+ * it tries to return slot specified by @hint if it's not busy or returns
+ * error in @errp.
  * @register_memory: map @DimmDevice into hot-plugable address space
  */
 typedef struct DimmBusClass {
     BusClass parent_class;
 
+    int (*get_free_slot)(DimmBus *bus, const int *hint, Error **errp);
     void (*register_memory)(DimmBus *bus, DimmDevice *dimm, Error **errp);
 } DimmBusClass;
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 11/16] dimm: add busy address check and address auto-allocation
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
                   ` (9 preceding siblings ...)
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation Igor Mammedov
@ 2013-07-23 16:23 ` Igor Mammedov
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 12/16] dimm: introduce memory added notifier Igor Mammedov
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

- if 'start' property is not specified on -device/device_add command,
treat default value as request for assigning DimmDevice to
the first free memory region.

- if 'start' is provided with -device/device_add command, attempt to
use it or fail command if it's already occupied or falls inside
of an existing DimmDevice memory region.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/mem-hotplug/dimm.c         |   71 +++++++++++++++++++++++++++++++++++++++++
 include/hw/mem-hotplug/dimm.h |    6 +++
 2 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/hw/mem-hotplug/dimm.c b/hw/mem-hotplug/dimm.c
index c716906..7e3e383 100644
--- a/hw/mem-hotplug/dimm.c
+++ b/hw/mem-hotplug/dimm.c
@@ -21,6 +21,7 @@
 #include "hw/mem-hotplug/dimm.h"
 #include "qemu/config-file.h"
 #include "qemu/bitmap.h"
+#include "qemu/range.h"
 
 static void dimm_bus_initfn(Object *obj)
 {
@@ -67,6 +68,64 @@ out:
     return slot;
 }
 
+static gint dimm_bus_addr_sort(gconstpointer a, gconstpointer b)
+{
+    DimmDevice *x = DIMM(a);
+    DimmDevice *y = DIMM(b);
+
+    return x->start - y->start;
+}
+
+static int dimm_bus_built_dimm_list(DeviceState *dev, void *opaque)
+{
+    GSList **list = opaque;
+
+    if (dev->realized) { /* only realized DIMMs matter */
+        *list = g_slist_insert_sorted(*list, dev, dimm_bus_addr_sort);
+    }
+    return 0;
+}
+
+static hwaddr dimm_bus_get_free_addr(DimmBus *bus, const hwaddr *hint,
+                                     uint64_t size, Error **errp)
+{
+    GSList *list = NULL, *item;
+    hwaddr new_start, ret;
+    uint64_t as_size;
+
+    qbus_walk_children(BUS(bus), dimm_bus_built_dimm_list, NULL, &list);
+
+    if (hint) {
+        new_start = *hint;
+    } else {
+        new_start = bus->base;
+    }
+
+    /* find address range that will fit new DIMM */
+    for (item = list; item; item = g_slist_next(item)) {
+        DimmDevice *dimm = item->data;
+        if (ranges_overlap(dimm->start, dimm->size, new_start, size)) {
+            if (hint) {
+                DeviceState *d = DEVICE(dimm);
+                error_setg(errp, "address range conflicts with '%s'", d->id);
+                break;
+            }
+            new_start = dimm->start + dimm->size;
+        }
+    }
+    ret = new_start;
+
+    g_slist_free(list);
+
+    as_size = memory_region_size(&bus->as);
+    if ((new_start + size) > (bus->base + as_size)) {
+        error_setg(errp, "can't add memory beyond 0x%" PRIx64,
+                   bus->base + as_size);
+    }
+
+    return ret;
+}
+
 static void dimm_bus_register_memory(DimmBus *bus, DimmDevice *dimm,
                                      Error **errp)
 {
@@ -85,6 +144,7 @@ static void dimm_bus_class_init(ObjectClass *klass, void *data)
     }
     dc->register_memory = dimm_bus_register_memory;
     dc->get_free_slot = dimm_bus_get_free_slot;
+    dc->get_free_addr = dimm_bus_get_free_addr;
 }
 
 static const TypeInfo dimm_bus_info = {
@@ -111,6 +171,7 @@ static void dimm_realize(DeviceState *dev, Error **errp)
     BusClass *bc = BUS_GET_CLASS(bus);
     DimmBusClass *dc = DIMM_BUS_GET_CLASS(bus);
     int *slot_hint;
+    hwaddr *start_hint;
 
     if (!dev->id) {
         error_setg(errp, "missing 'id' property");
@@ -128,6 +189,16 @@ static void dimm_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    start_hint = !dimm->start ? NULL : &dimm->start;
+    if (start_hint && (dimm->start < bus->base)) {
+        error_setg(errp, "can't map DIMM below: 0x%" PRIx64, bus->base);
+        return;
+    }
+    g_assert(dc->get_free_addr);
+    dimm->start = dc->get_free_addr(bus, start_hint, dimm->size, errp);
+    if (error_is_set(errp)) {
+        return;
+    }
 
     memory_region_init_ram(&dimm->mr, dev->id, dimm->size);
 
diff --git a/include/hw/mem-hotplug/dimm.h b/include/hw/mem-hotplug/dimm.h
index d8d11a3..55b1571 100644
--- a/include/hw/mem-hotplug/dimm.h
+++ b/include/hw/mem-hotplug/dimm.h
@@ -32,6 +32,7 @@
 /**
  * DimmBus:
  * @start: starting physical address, where @DimmDevice is mapped.
+ * Default value: 0, means that address is auto-allocated.
  * @size: amount of memory mapped at @start.
  * @node: numa node to which @DimmDevice is attached.
  * @slot: slot number into which @DimmDevice is plugged in.
@@ -73,6 +74,9 @@ typedef struct DimmBus {
  * @get_free_slot: returns a not occupied slot number. If @hint is provided,
  * it tries to return slot specified by @hint if it's not busy or returns
  * error in @errp.
+ * @get_free_addr: returns address where @DimmDevice of specified size
+ * might be mapped. If @hint is specified it returns hinted address if
+ * region is available or error in @errp.
  * @register_memory: map @DimmDevice into hot-plugable address space
  */
 typedef struct DimmBusClass {
@@ -80,6 +84,8 @@ typedef struct DimmBusClass {
 
     int (*get_free_slot)(DimmBus *bus, const int *hint, Error **errp);
     void (*register_memory)(DimmBus *bus, DimmDevice *dimm, Error **errp);
+    hwaddr (*get_free_addr)(DimmBus *bus, const hwaddr *hint,
+                            uint64_t size, Error **errp);
 } DimmBusClass;
 
 #endif
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 12/16] dimm: introduce memory added notifier
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
                   ` (10 preceding siblings ...)
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 11/16] dimm: add busy address check and address auto-allocation Igor Mammedov
@ 2013-07-23 16:23 ` Igor Mammedov
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 13/16] acpi/piix4: introduce memory hot-plug interface QEMU<->ACPI BIOS Igor Mammedov
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

it will notify ACPI subsystem about added DimmDevice

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/mem-hotplug/dimm.c   |   13 +++++++++++++
 include/sysemu/sysemu.h |    3 +++
 stubs/Makefile.objs     |    1 +
 stubs/memory_hotplug.c  |    6 ++++++
 4 files changed, 23 insertions(+), 0 deletions(-)
 create mode 100644 stubs/memory_hotplug.c

diff --git a/hw/mem-hotplug/dimm.c b/hw/mem-hotplug/dimm.c
index 7e3e383..4ecc590 100644
--- a/hw/mem-hotplug/dimm.c
+++ b/hw/mem-hotplug/dimm.c
@@ -22,6 +22,17 @@
 #include "qemu/config-file.h"
 #include "qemu/bitmap.h"
 #include "qemu/range.h"
+#include "qemu/notify.h"
+#include "sysemu/sysemu.h"
+
+/* Memory hot-plug notifiers */
+static NotifierList mem_added_notifiers =
+    NOTIFIER_LIST_INITIALIZER(mem_added_notifiers);
+
+void qemu_register_mem_added_notifier(Notifier *notifier)
+{
+    notifier_list_add(&mem_added_notifiers, notifier);
+}
 
 static void dimm_bus_initfn(Object *obj)
 {
@@ -204,6 +215,8 @@ static void dimm_realize(DeviceState *dev, Error **errp)
 
     g_assert(dc->register_memory);
     dc->register_memory(bus, dimm, errp);
+
+    notifier_list_notify(&mem_added_notifiers, dev);
 }
 
 static void dimm_class_init(ObjectClass *klass, void *data)
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 2fb71af..9abbf71 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -156,6 +156,9 @@ void drive_hot_add(Monitor *mon, const QDict *qdict);
 /* CPU hotplug */
 void qemu_register_cpu_added_notifier(Notifier *notifier);
 
+/* Memory hotplug */
+void qemu_register_mem_added_notifier(Notifier *notifier);
+
 /* pcie aer error injection */
 void pcie_aer_inject_error_print(Monitor *mon, const QObject *data);
 int do_pcie_aer_inject_error(Monitor *mon,
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 9b701b4..2476c3d 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -25,3 +25,4 @@ stub-obj-y += vm-stop.o
 stub-obj-y += vmstate.o
 stub-obj-$(CONFIG_WIN32) += fd-register.o
 stub-obj-y += cpus.o
+stub-obj-y += memory_hotplug.o
diff --git a/stubs/memory_hotplug.c b/stubs/memory_hotplug.c
new file mode 100644
index 0000000..60844f1
--- /dev/null
+++ b/stubs/memory_hotplug.c
@@ -0,0 +1,6 @@
+#include "qemu/notify.h"
+#include "sysemu/sysemu.h"
+
+void qemu_register_mem_added_notifier(Notifier *notifier)
+{
+}
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 13/16] acpi/piix4: introduce memory hot-plug interface QEMU<->ACPI BIOS
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
                   ` (11 preceding siblings ...)
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 12/16] dimm: introduce memory added notifier Igor Mammedov
@ 2013-07-23 16:23 ` Igor Mammedov
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 14/16] pc: ACPI BIOS: implement memory hotplug interface Igor Mammedov
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

- implements QEMU part of memory hotplug protocol described at
  docs/specs/acpi_mem_hotplug.txt
- handles memory add notification event.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 docs/specs/acpi_mem_hotplug.txt |   38 +++++++++
 hw/acpi/piix4.c                 |  170 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 207 insertions(+), 1 deletions(-)
 create mode 100644 docs/specs/acpi_mem_hotplug.txt

diff --git a/docs/specs/acpi_mem_hotplug.txt b/docs/specs/acpi_mem_hotplug.txt
new file mode 100644
index 0000000..33b7aa0
--- /dev/null
+++ b/docs/specs/acpi_mem_hotplug.txt
@@ -0,0 +1,38 @@
+QEMU<->ACPI BIOS memory hotplug interface
+--------------------------------------
+
+ACPI BIOS GPE.3 handler is dedicated for notifying OS of memory hot-add
+or hot-remove events. Read-only.
+
+Memory Dimm hot-plug interface (IO port 0xaf80-0xaf97, 1-4 byte access):
+---------------------------------------------------------------
+0xaf80:
+  read access:
+      [0x0-0x3] Lo part of memory device phys address
+      [0x4-0x7] Hi part of memory device phys address
+      [0x8-0xb] Lo part of memory device size in bytes
+      [0xc-0xf] Hi part of memory device size in bytes
+      [0x14] Memory hot-plug interface version
+      [0x15] Memory device status fields
+          bits:
+              1: device is enabled, end may be used by guest code
+              2: device insert event, used by ACPI BIOS to distinguish
+                 device for which no device check event to OSPM was issued
+      [0x16-0x17] reserved
+
+  write access:
+      [0x0-0x3] Memory device slot selector, selects active memory device.
+                All following accesses to other registers in 0xaf80-0xaf97
+                region will read/store data from/to selected memory device.
+      [0x4-0x7] OST event code reported by OSPM
+      [0x8-0xb] OST status code reported by OSPM
+      [0x15] Memory device status fields
+          bits:
+              2: if set to 1 clears device insert event, set by ACPI BIOS
+                 after it has sent device check event to OSPM for
+                 seleted memory device
+
+Selecting memory device slot beyond present range has no effect on platform:
+   - following write access to memory hot-plug registers except of
+     [0x0-0x3] is ignored;
+   - following read access to memory hot-plug registers returns 0
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index c077a7a..f58a5bf 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -29,6 +29,8 @@
 #include "exec/ioport.h"
 #include "hw/nvram/fw_cfg.h"
 #include "exec/address-spaces.h"
+#include "hw/mem-hotplug/dimm.h"
+#include "qemu/config-file.h"
 
 //#define DEBUG
 
@@ -50,9 +52,12 @@
 
 #define PIIX4_PROC_BASE 0xaf00
 #define PIIX4_PROC_LEN 32
+#define PIIX4_MEM_BASE 0xaf80
+#define PIIX4_MEM_LEN 24
 
 #define PIIX4_PCI_HOTPLUG_STATUS 2
 #define PIIX4_CPU_HOTPLUG_STATUS 4
+#define PIIX4_MEM_HOTPLUG_STATUS 8
 
 struct pci_status {
     uint32_t up; /* deprecated, maintained for migration compatibility */
@@ -63,6 +68,20 @@ typedef struct CPUStatus {
     uint8_t sts[PIIX4_PROC_LEN];
 } CPUStatus;
 
+typedef struct MemStatus {
+    DimmDevice *dimm;
+    bool is_enabled;
+    bool is_inserting;
+    uint32_t ost_event;
+    uint32_t ost_status;
+} MemStatus;
+
+typedef struct mem_hotplug_state {
+    uint32_t selector;
+    uint32_t dev_count;
+    MemStatus *devs;
+} mem_hotplug_state;
+
 typedef struct PIIX4PMState {
     PCIDevice dev;
 
@@ -70,6 +89,7 @@ typedef struct PIIX4PMState {
     MemoryRegion io_gpe;
     MemoryRegion io_pci;
     MemoryRegion io_cpu;
+    MemoryRegion io_mem;
     ACPIREGS ar;
 
     APMState apm;
@@ -96,6 +116,9 @@ typedef struct PIIX4PMState {
     Notifier cpu_added_notifier;
 
     PcGuestInfo *guest_info;
+
+    Notifier mem_added_notifier;
+    mem_hotplug_state  gpe_mem;
 } PIIX4PMState;
 
 static void piix4_acpi_system_hot_add_init(MemoryRegion *parent,
@@ -115,7 +138,8 @@ static void pm_update_sci(PIIX4PMState *s)
                    ACPI_BITMASK_GLOBAL_LOCK_ENABLE |
                    ACPI_BITMASK_TIMER_ENABLE)) != 0) ||
         (((s->ar.gpe.sts[0] & s->ar.gpe.en[0]) &
-          (PIIX4_PCI_HOTPLUG_STATUS | PIIX4_CPU_HOTPLUG_STATUS)) != 0);
+          (PIIX4_PCI_HOTPLUG_STATUS | PIIX4_CPU_HOTPLUG_STATUS |
+          PIIX4_MEM_HOTPLUG_STATUS)) != 0);
 
     qemu_set_irq(s->irq, sci_level);
     /* schedule a timer interruption if needed */
@@ -706,6 +730,143 @@ static void piix4_init_cpu_status(CPUState *cpu, void *data)
     g->sts[id / 8] |= (1 << (id % 8));
 }
 
+static uint64_t piix4_hp_mem_read(void *opaque, hwaddr addr, unsigned int size)
+{
+    PIIX4PMState *s = opaque;
+    uint32_t val = 0;
+    mem_hotplug_state *mem_st = &s->gpe_mem;
+    MemStatus *mdev;
+
+    if (mem_st->selector >= mem_st->dev_count) {
+        return 0;
+    }
+
+    mdev = &mem_st->devs[mem_st->selector];
+    switch (addr) {
+    case 0x0: /* Lo part of phys address where DIMM is mapped */
+        val = mdev->dimm->start;
+        break;
+    case 0x4: /* Hi part of phys address where DIMM is mapped */
+        val = mdev->dimm->start >> 32;
+        break;
+    case 0x8: /* Lo part of DIMM size */
+        val = mdev->dimm->size;
+        break;
+    case 0xc: /* Hi part of DIMM size */
+        val = mdev->dimm->size >> 32;
+        break;
+    case 0x10: /* node proximity for _PXM method */
+        val = mdev->dimm->node;
+        break;
+    case 0x14: /* intf version */
+        val = 1;
+        break;
+    case 0x15: /* pack and return is_* fields */
+        val |= mdev->is_enabled   ? 1 : 0;
+        val |= mdev->is_inserting ? 2 : 0;
+        break;
+    }
+    return val;
+}
+
+static void piix4_hp_mem_write(void *opaque, hwaddr addr, uint64_t data,
+                             unsigned int size)
+{
+    PIIX4PMState *s = opaque;
+    mem_hotplug_state *mem_st = &s->gpe_mem;
+    MemStatus *mdev;
+
+    if (!mem_st->dev_count) {
+        return;
+    }
+
+    if (addr) {
+        if (mem_st->selector >= mem_st->dev_count) {
+            return;
+        }
+    }
+
+    switch (addr) {
+    case 0x0: /* DIMM slot selector */
+        mem_st->selector = data;
+        break;
+    case 0x4: /* _OST event  */
+        mdev = &mem_st->devs[mem_st->selector];
+        if (data == 1) {
+            /* TODO: handle device insert OST event */
+        } else if (data == 3) {
+            /* TODO: handle device remove OST event */
+        }
+        mdev->ost_event = data;
+        break;
+    case 0x8: /* _OST status */
+        mdev = &mem_st->devs[mem_st->selector];
+        mdev->ost_status = data;
+        /* TODO: report async error */
+        /* TODO: implement memory removal on guest signal */
+        break;
+    case 0x15:
+        mdev = &mem_st->devs[mem_st->selector];
+        if (data & 2) { /* clear insert event */
+            mdev->is_inserting  = false;
+        }
+        break;
+    }
+
+}
+static const MemoryRegionOps mem_hotplug_ops = {
+    .read = piix4_hp_mem_read,
+    .write = piix4_hp_mem_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid = {
+        .min_access_size = 1,
+        .max_access_size = 4,
+    },
+};
+
+static void piix4_init_mem_status(PIIX4PMState *s)
+{
+    mem_hotplug_state *mem_st = &s->gpe_mem;
+    QemuOpts *opts = qemu_opts_find(qemu_find_opts("memory-opts"), NULL);
+
+    if (!opts) { /* no -m x,... was passed to cmd line so no men hotplug */
+        return;
+    }
+
+    mem_st->dev_count = qemu_opt_get_number(opts, "slots", 0);
+
+    if (!mem_st->dev_count) {
+        return;
+    }
+
+    mem_st->devs = g_malloc0(sizeof(MemStatus) * mem_st->dev_count);
+}
+
+static void piix4_mem_added_req(Notifier *n, void *opaque)
+{
+    PIIX4PMState *s = container_of(n, PIIX4PMState, mem_added_notifier);
+    DeviceState *dev = DEVICE(opaque);
+    DimmDevice *dimm = DIMM(dev);
+    mem_hotplug_state *mem_st = &s->gpe_mem;
+    MemStatus *mdev;
+
+    if (dimm->slot >= mem_st->dev_count) {
+        return;
+    }
+
+    mdev = &mem_st->devs[dimm->slot];
+    object_property_add_link(OBJECT(s), dev->id, TYPE_DIMM,
+                             (Object **) &mdev->dimm, NULL);
+    object_property_set_link(OBJECT(s), OBJECT(dimm), dev->id, NULL);
+
+    mdev->is_enabled = true;
+    mdev->is_inserting = true;
+
+    /* do ACPI magic */
+    s->ar.gpe.sts[0] |= PIIX4_MEM_HOTPLUG_STATUS;
+    pm_update_sci(s);
+}
+
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
                                 PCIHotplugState state);
 
@@ -728,6 +889,13 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion *parent,
     memory_region_add_subregion(parent, PIIX4_PROC_BASE, &s->io_cpu);
     s->cpu_added_notifier.notify = piix4_cpu_added_req;
     qemu_register_cpu_added_notifier(&s->cpu_added_notifier);
+
+    piix4_init_mem_status(s);
+    memory_region_init_io(&s->io_mem, &mem_hotplug_ops, s,
+                          "apci-mem-hotplug", PIIX4_MEM_LEN);
+    memory_region_add_subregion(parent, PIIX4_MEM_BASE, &s->io_mem);
+    s->mem_added_notifier.notify = piix4_mem_added_req;
+    qemu_register_mem_added_notifier(&s->mem_added_notifier);
 }
 
 static void enable_device(PIIX4PMState *s, int slot)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 14/16] pc: ACPI BIOS: implement memory hotplug interface
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
                   ` (12 preceding siblings ...)
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 13/16] acpi/piix4: introduce memory hot-plug interface QEMU<->ACPI BIOS Igor Mammedov
@ 2013-07-23 16:23 ` Igor Mammedov
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 15/16] pc: update acpi-dsdt.hex.generated and add ssdt-mem.hex.generated Igor Mammedov
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

- provides static DSDT object for memory hotplug
- SSDT template for memory devices and runtime generator
  of them in SSDT table.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/i386/Makefile.objs             |    2 +-
 hw/i386/acpi-build.c              |   33 +++++++++
 hw/i386/acpi-dsdt-mem-hotplug.dsl |  144 +++++++++++++++++++++++++++++++++++++
 hw/i386/acpi-dsdt.dsl             |    5 +-
 hw/i386/ssdt-mem.dsl              |   76 +++++++++++++++++++
 5 files changed, 258 insertions(+), 2 deletions(-)
 create mode 100644 hw/i386/acpi-dsdt-mem-hotplug.dsl
 create mode 100644 hw/i386/ssdt-mem.dsl

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index 2ab2572..fa919e6 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -6,7 +6,7 @@ obj-$(CONFIG_XEN) += xen_domainbuild.o xen_machine_pv.o
 obj-y += kvmvapic.o
 obj-y += acpi-build.o
 obj-y += bios-linker-loader.o
-hw/i386/acpi-build.o: hw/i386/acpi-build.c hw/i386/acpi-dsdt.hex hw/i386/ssdt-proc.hex hw/i386/ssdt-pcihp.hex hw/i386/ssdt-misc.hex hw/i386/q35-acpi-dsdt.hex
+hw/i386/acpi-build.o: hw/i386/acpi-build.c hw/i386/acpi-dsdt.hex hw/i386/ssdt-proc.hex hw/i386/ssdt-pcihp.hex hw/i386/ssdt-misc.hex hw/i386/q35-acpi-dsdt.hex hw/i386/ssdt-mem.hex
 hw/i386/pc_piix.o: hw/i386/pc_piix.c hw/i386/acpi-dsdt.hex
 hw/i386/pc_q35.o: hw/i386/pc_q35.c hw/i386/q35-acpi-dsdt.hex
 
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index bc44f95..7d7fa1f 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -35,6 +35,7 @@
 #include "hw/nvram/fw_cfg.h"
 #include "hw/i386/bios-linker-loader.h"
 #include "hw/loader.h"
+#include "qemu/config-file.h"
 
 #define ACPI_BUILD_APPNAME  "Bochs"
 #define ACPI_BUILD_APPNAME6 "BOCHS "
@@ -267,6 +268,17 @@ acpi_encode_len(uint8_t *ssdt_ptr, int length, int bytes)
 #define ACPI_PCIHP_SIZEOF (*ssdt_pcihp_end - *ssdt_pcihp_start)
 #define ACPI_PCIHP_AML (ssdp_pcihp_aml + *ssdt_pcihp_start)
 
+#include "hw/i386/ssdt-mem.hex"
+
+/* 0x5B 0x82 DeviceOp PkgLength NameString DimmID */
+#define ACPI_MEM_AML (ssdm_mem_aml + *ssdt_mem_start)
+#define ACPI_MEM_SIZEOF (*ssdt_mem_end - *ssdt_mem_start)
+#define ACPI_MEM_OFFSET_HEX (*ssdt_mem_name - *ssdt_mem_start + 2)
+#define ACPI_MEM_OFFSET_ID (*ssdt_mem_id - *ssdt_mem_start + 7)
+#define ACPI_MEM_DEV_COUNT_OBJ (ssdm_mem_aml + *ssdt_mem_count_name - 1)
+#define ACPI_MEM_DEV_COUNT_OFFSET (*ssdt_mem_count - *ssdt_mem_count_name + 1)
+#define ACPI_MEM_DEV_COUNT_SIZE (*ssdt_mem_start - *ssdt_mem_count_name + 1)
+
 #define ACPI_SSDT_SIGNATURE 0x54445353 /* SSDT */
 #define ACPI_SSDT_HEADER_LENGTH 36
 
@@ -326,11 +338,16 @@ build_ssdt(GArray *table_data, GArray *linker,
            FWCfgState *fw_cfg, PcGuestInfo *guest_info)
 {
     int acpi_cpus = MIN(0xff, guest_info->apic_id_limit);
+    QemuOpts *opts = qemu_opts_find(qemu_find_opts("memory-opts"), NULL);
+    uint32_t acpi_mem_devs = opts ? qemu_opt_get_number(opts, "slots", 0) : 0;
     int length = (sizeof(ssdp_misc_aml)                     /* _S3_ / _S4_ / _S5_ */
                   + (1+3+4)                                 /* Scope(_SB_) */
                   + (acpi_cpus * ACPI_PROC_SIZEOF)               /* procs */
                   + (1+2+5+(12*acpi_cpus))                  /* NTFY */
                   + (6+2+1+(1*acpi_cpus))                   /* CPON */
+                  + (1+2+5+(12*acpi_mem_devs))              /* MTFY */
+                  + ACPI_MEM_DEV_COUNT_SIZE                 /* MDNR const */
+                  + (acpi_mem_devs * ACPI_MEM_SIZEOF)       /* mem devices */
                   + (1+3+4)                                 /* Scope(PCI0) */
                   + ((PCI_SLOT_MAX - 1) * ACPI_PCIHP_SIZEOF)        /* slots */
                   + (1+2+5+(12*(PCI_SLOT_MAX - 1))));          /* PCNT */
@@ -407,6 +424,22 @@ build_ssdt(GArray *table_data, GArray *linker,
     for (i=0; i<acpi_cpus; i++)
         *(ssdt_ptr++) = (test_bit(i, guest_info->found_cpus)) ? 0x01 : 0x00;
 
+    /* set number of mem devices. i.e. declare Name(MDNR, nb_memdevs) */
+    memcpy(ssdt_ptr, ACPI_MEM_DEV_COUNT_OBJ, ACPI_MEM_DEV_COUNT_SIZE);
+    memcpy(ssdt_ptr + ACPI_MEM_DEV_COUNT_OFFSET, &acpi_mem_devs, 4);
+    ssdt_ptr += ACPI_MEM_DEV_COUNT_SIZE;
+
+    /* build mem devices and notifiers for them */
+    for (i = 0; i < acpi_mem_devs; i++) {
+        char id[5];
+        snprintf(id, sizeof(id), "%02x", i);
+        memcpy(ssdt_ptr, ACPI_MEM_AML, ACPI_MEM_SIZEOF);
+        memcpy(ssdt_ptr + ACPI_MEM_OFFSET_HEX, id, 2);
+        memcpy(ssdt_ptr + ACPI_MEM_OFFSET_ID, id, 2);
+        ssdt_ptr += ACPI_MEM_SIZEOF;
+    }
+    ssdt_ptr = build_notify(ssdt_ptr, "MTFY", 0, acpi_mem_devs, "MP00", 2);
+
     /* build Scope(PCI0) opcode */
     *(ssdt_ptr++) = 0x10; /* ScopeOp */
     ssdt_ptr = acpi_encode_len(ssdt_ptr, length - (ssdt_ptr - ssdt), 3);
diff --git a/hw/i386/acpi-dsdt-mem-hotplug.dsl b/hw/i386/acpi-dsdt-mem-hotplug.dsl
new file mode 100644
index 0000000..3803edb
--- /dev/null
+++ b/hw/i386/acpi-dsdt-mem-hotplug.dsl
@@ -0,0 +1,144 @@
+/*
+ * Memory hotplug ACPI DSDT static objects definitions
+ *
+ * Copyright (C) 2013 Red Hat Inc
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+Scope(\_SB) {
+    /* Objects provided by run-time generated SSDT */
+    External(MTFY, MethodObj)
+    External(MDNR, IntObj)
+
+    /* Memory hotplug IO registers */
+    OperationRegion(HPMR, SystemIO, 0xaf80, 24)
+    Field (HPMR, DWordAcc, NoLock, Preserve)
+    {
+        MRBL, 32, // DIMM start addr Low word, read only
+        MRBH, 32, // DIMM start addr Hi word, read only
+        MRLL, 32, // DIMM size Low word, read only
+        MRLH, 32, // DIMM size Hi word, read only
+        MPX, 32,  // DIMM node proximity, read only
+    }
+    Field (HPMR, ByteAcc, NoLock, Preserve)
+    {
+        Offset(20),
+        MVER, 8, // Interface version
+        MES,  1, // 1 if DIMM enabled for _STA, read only
+        MINS, 1, // (read) 1 if DIMM has a insert event. (write) 1 after MTFY()
+        MRMV, 1, // 1 if DIMM has a remove request, read only
+    }
+
+    Mutex (MLCK, 0)
+    Field (HPMR, DWordAcc, NoLock, Preserve)
+    {
+        MSEL, 32,  // DIMM selector, write only
+        MOEV, 32,  // _OST event code, write only
+        MOSC, 32,  // _OST status code, write only
+    }
+
+    Method(MESC, 0) {
+        Store(Zero, Local0) // Mem devs iterrator
+
+        Acquire(MLCK, 0xFFFF)
+        while (LLess(Local0, MDNR)) {
+            Store(Local0, MSEL) // select Local0 DIMM
+            If (LEqual(MINS, One)) { // Memory device needs check
+                \_SB.MTFY(Local0, 1)
+                Store(1, MINS)
+            }
+            If (LEqual(MRMV, One)) { // Ejection request
+                \_SB.MTFY(Local0, 3)
+            }
+            Add(Local0, One, Local0) // goto next DIMM
+        }
+        Release(MLCK)
+        Return(One)
+    }
+
+    Method (MRST, 1) {
+        Store(Zero, Local0)
+
+        Acquire(MLCK, 0xFFFF)
+        Store(ToInteger(Arg0), MSEL) // select DIMM
+
+        If (LEqual(MES, One)) {
+            Store(0xF, Local0)
+        }
+
+        Release(MLCK)
+        Return(Local0)
+    }
+
+    Method(MCRS, 1) {
+        Acquire(MLCK, 0xFFFF)
+        Store(ToInteger(Arg0), MSEL) // select DIMM
+
+        Name(MR64, ResourceTemplate() {
+            QWordMemory(ResourceProducer, PosDecode, MinFixed, MaxFixed,
+            Cacheable, ReadWrite,
+            0x0000000000000000,        // Address Space Granularity
+            0x0000000000000000,        // Address Range Minimum
+            0xFFFFFFFFFFFFFFFE,        // Address Range Maximum
+            0x0000000000000000,        // Address Translation Offset
+            0xFFFFFFFFFFFFFFFF,        // Address Length
+            ,, MW64, AddressRangeMemory, TypeStatic)
+        })
+
+        CreateDWordField(MR64, 14, MINL)
+        CreateDWordField(MR64, 18, MINH)
+        CreateDWordField(MR64, 38, LENL)
+        CreateDWordField(MR64, 42, LENH)
+        CreateDWordField(MR64, 22, MAXL)
+        CreateDWordField(MR64, 26, MAXH)
+
+        Store(MRBH, MINH)
+        Store(MRBL, MINL)
+        Store(MRLH, LENH)
+        Store(MRLL, LENL)
+
+        // 64-bit math: MAX = MIN + LEN - 1
+        Add(MINL, LENL, MAXL)
+        Add(MINH, LENH, MAXH)
+        If (Or(LLess(MAXL, MINL), LLess(MAXL, LENL))) {
+            Add(MAXH, 1, MAXH)
+        }
+        If (LEqual(MAXL, Zero)) {
+            Subtract(MAXH, One, MAXH)
+            Store(0xFFFFFFFF, MAXL)
+        } Else {
+            Subtract(MAXL, One, MAXL)
+        }
+
+        Release(MLCK)
+        Return(MR64)
+    }
+
+    Method (MPXM, 1) {
+        Acquire(MLCK, 0xFFFF)
+        Store(ToInteger(Arg0), MSEL) // select DIMM
+        Store(MPX, Local0)
+        Release(MLCK)
+        Return(Local0)
+    }
+
+    Method(MOST, 4) {
+        Acquire(MLCK, 0xFFFF)
+        Store(ToInteger(Arg0), MSEL) // select DIMM
+        Store(Arg1, MOEV)
+        Store(Arg2, MOSC)
+        Release(MLCK)
+    }
+}
diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
index 90efce0..726332f 100644
--- a/hw/i386/acpi-dsdt.dsl
+++ b/hw/i386/acpi-dsdt.dsl
@@ -294,6 +294,7 @@ DefinitionBlock (
     }
 
 #include "acpi-dsdt-cpu-hotplug.dsl"
+#include "acpi-dsdt-mem-hotplug.dsl"
 
 
 /****************************************************************
@@ -313,7 +314,9 @@ DefinitionBlock (
             // CPU hotplug event
             \_SB.PRSC()
         }
-        Method(_L03) {
+        Method(_E03) {
+            // Memory hotplug event
+            \_SB.MESC()
         }
         Method(_L04) {
         }
diff --git a/hw/i386/ssdt-mem.dsl b/hw/i386/ssdt-mem.dsl
new file mode 100644
index 0000000..c70aa1b
--- /dev/null
+++ b/hw/i386/ssdt-mem.dsl
@@ -0,0 +1,76 @@
+/*
+ * Memory hotplug ACPI DSDT static objects definitions
+ *
+ * Copyright ProfitBricks GmbH 2012
+ * Copyright (C) 2013 Red Hat Inc
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+/* This file is the basis for the ssdt_mem[] variable in src/acpi.c.
+ * It defines the contents of the memory device object.  At
+ * runtime, a dynamically generated SSDT will contain one copy of this
+ * AML snippet for every possible memory device in the system.  The
+ * objects will be placed in the \_SB_ namespace.
+ *
+ * In addition to the aml code generated from this file, the
+ * src/acpi.c file creates a MTFY method with an entry for each memdevice:
+ *     Method(MTFY, 2) {
+ *         If (LEqual(Arg0, 0x00)) { Notify(MP00, Arg1) }
+ *         If (LEqual(Arg0, 0x01)) { Notify(MP01, Arg1) }
+ *         ...
+ *     }
+ */
+ACPI_EXTRACT_ALL_CODE ssdm_mem_aml
+
+DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1)
+/*  v------------------ DO NOT EDIT ------------------v */
+{
+    ACPI_EXTRACT_NAME_STRING      ssdt_mem_count_name
+    ACPI_EXTRACT_NAME_DWORD_CONST ssdt_mem_count
+    Name(MDNR, 0xFFFFFFFF)
+
+    ACPI_EXTRACT_DEVICE_START ssdt_mem_start
+    ACPI_EXTRACT_DEVICE_END ssdt_mem_end
+    ACPI_EXTRACT_DEVICE_STRING ssdt_mem_name
+    Device(MPAA) {
+        ACPI_EXTRACT_NAME_STRING ssdt_mem_id
+        Name(_UID, "0xAA")
+/*  ^------------------ DO NOT EDIT ------------------^
+ * Don't change the above without also updating the C code.
+ */
+        Name(_HID, EISAID("PNP0C80"))
+
+        External(MCRS, MethodObj)
+        External(MRST, MethodObj)
+        External(MOST, MethodObj)
+        External(MPXM, MethodObj)
+
+        Method(_CRS, 0) {
+            Return (MCRS(_UID))
+        }
+
+        Method (_STA, 0) {
+            Return (MRST(_UID))
+        }
+
+        Method (_PXM, 0) {
+            Return (MPXM(_UID))
+        }
+
+        Method (_OST, 3) {
+            MOST(_UID, Arg0, Arg1, Arg2)
+        }
+    }
+}
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 15/16] pc: update acpi-dsdt.hex.generated and add ssdt-mem.hex.generated
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
                   ` (13 preceding siblings ...)
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 14/16] pc: ACPI BIOS: implement memory hotplug interface Igor Mammedov
@ 2013-07-23 16:23 ` Igor Mammedov
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 16/16] pc: ACPI BIOS: reserve SRAT entry for hotplug mem hole Igor Mammedov
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/i386/acpi-dsdt.hex.generated |  633 ++++++++++++++++++++++++++++++++++++++-
 hw/i386/ssdt-mem.hex.generated  |  161 ++++++++++
 2 files changed, 785 insertions(+), 9 deletions(-)
 create mode 100644 hw/i386/ssdt-mem.hex.generated

diff --git a/hw/i386/acpi-dsdt.hex.generated b/hw/i386/acpi-dsdt.hex.generated
index 68cab3e..39fbe41 100644
--- a/hw/i386/acpi-dsdt.hex.generated
+++ b/hw/i386/acpi-dsdt.hex.generated
@@ -3,12 +3,12 @@ static unsigned char AcpiDsdtAmlCode[] = {
 0x53,
 0x44,
 0x54,
-0x37,
-0x11,
+0x9e,
+0x13,
 0x0,
 0x0,
 0x1,
-0xe1,
+0x5,
 0x42,
 0x58,
 0x50,
@@ -31,9 +31,9 @@ static unsigned char AcpiDsdtAmlCode[] = {
 0x4e,
 0x54,
 0x4c,
-0x28,
-0x5,
-0x10,
+0x23,
+0x1,
+0x9,
 0x20,
 0x10,
 0x49,
@@ -4248,8 +4248,613 @@ static unsigned char AcpiDsdtAmlCode[] = {
 0x75,
 0x60,
 0x10,
+0x4c,
+0x25,
+0x5f,
+0x53,
+0x42,
+0x5f,
+0x5b,
+0x80,
+0x48,
+0x50,
+0x4d,
+0x52,
+0x1,
+0xb,
+0x80,
+0xaf,
+0xa,
+0x18,
+0x5b,
+0x81,
+0x1f,
+0x48,
+0x50,
+0x4d,
+0x52,
+0x3,
+0x4d,
+0x52,
+0x42,
+0x4c,
+0x20,
+0x4d,
+0x52,
+0x42,
+0x48,
+0x20,
+0x4d,
+0x52,
+0x4c,
+0x4c,
+0x20,
+0x4d,
+0x52,
+0x4c,
+0x48,
+0x20,
+0x4d,
+0x50,
+0x58,
+0x5f,
+0x20,
+0x5b,
+0x81,
+0x1d,
+0x48,
+0x50,
+0x4d,
+0x52,
+0x1,
+0x0,
+0x40,
+0xa,
+0x4d,
+0x56,
+0x45,
+0x52,
+0x8,
+0x4d,
+0x45,
+0x53,
+0x5f,
+0x1,
+0x4d,
+0x49,
 0x4e,
-0x9,
+0x53,
+0x1,
+0x4d,
+0x52,
+0x4d,
+0x56,
+0x1,
+0x5b,
+0x1,
+0x4d,
+0x4c,
+0x43,
+0x4b,
+0x0,
+0x5b,
+0x81,
+0x15,
+0x48,
+0x50,
+0x4d,
+0x52,
+0x3,
+0x4d,
+0x53,
+0x45,
+0x4c,
+0x20,
+0x4d,
+0x4f,
+0x45,
+0x56,
+0x20,
+0x4d,
+0x4f,
+0x53,
+0x43,
+0x20,
+0x14,
+0x4f,
+0x4,
+0x4d,
+0x45,
+0x53,
+0x43,
+0x0,
+0x70,
+0x0,
+0x60,
+0x5b,
+0x23,
+0x4d,
+0x4c,
+0x43,
+0x4b,
+0xff,
+0xff,
+0xa2,
+0x34,
+0x95,
+0x60,
+0x4d,
+0x44,
+0x4e,
+0x52,
+0x70,
+0x60,
+0x4d,
+0x53,
+0x45,
+0x4c,
+0xa0,
+0x13,
+0x93,
+0x4d,
+0x49,
+0x4e,
+0x53,
+0x1,
+0x4d,
+0x54,
+0x46,
+0x59,
+0x60,
+0x1,
+0x70,
+0x1,
+0x4d,
+0x49,
+0x4e,
+0x53,
+0xa0,
+0xe,
+0x93,
+0x4d,
+0x52,
+0x4d,
+0x56,
+0x1,
+0x4d,
+0x54,
+0x46,
+0x59,
+0x60,
+0xa,
+0x3,
+0x72,
+0x60,
+0x1,
+0x60,
+0x5b,
+0x27,
+0x4d,
+0x4c,
+0x43,
+0x4b,
+0xa4,
+0x1,
+0x14,
+0x2d,
+0x4d,
+0x52,
+0x53,
+0x54,
+0x1,
+0x70,
+0x0,
+0x60,
+0x5b,
+0x23,
+0x4d,
+0x4c,
+0x43,
+0x4b,
+0xff,
+0xff,
+0x70,
+0x99,
+0x68,
+0x0,
+0x4d,
+0x53,
+0x45,
+0x4c,
+0xa0,
+0xb,
+0x93,
+0x4d,
+0x45,
+0x53,
+0x5f,
+0x1,
+0x70,
+0xa,
+0xf,
+0x60,
+0x5b,
+0x27,
+0x4d,
+0x4c,
+0x43,
+0x4b,
+0xa4,
+0x60,
+0x14,
+0x4f,
+0x11,
+0x4d,
+0x43,
+0x52,
+0x53,
+0x1,
+0x5b,
+0x23,
+0x4d,
+0x4c,
+0x43,
+0x4b,
+0xff,
+0xff,
+0x70,
+0x99,
+0x68,
+0x0,
+0x4d,
+0x53,
+0x45,
+0x4c,
+0x8,
+0x4d,
+0x52,
+0x36,
+0x34,
+0x11,
+0x33,
+0xa,
+0x30,
+0x8a,
+0x2b,
+0x0,
+0x0,
+0xc,
+0x3,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0xfe,
+0xff,
+0xff,
+0xff,
+0xff,
+0xff,
+0xff,
+0xff,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0xff,
+0xff,
+0xff,
+0xff,
+0xff,
+0xff,
+0xff,
+0xff,
+0x79,
+0x0,
+0x8a,
+0x4d,
+0x52,
+0x36,
+0x34,
+0xa,
+0xe,
+0x4d,
+0x49,
+0x4e,
+0x4c,
+0x8a,
+0x4d,
+0x52,
+0x36,
+0x34,
+0xa,
+0x12,
+0x4d,
+0x49,
+0x4e,
+0x48,
+0x8a,
+0x4d,
+0x52,
+0x36,
+0x34,
+0xa,
+0x26,
+0x4c,
+0x45,
+0x4e,
+0x4c,
+0x8a,
+0x4d,
+0x52,
+0x36,
+0x34,
+0xa,
+0x2a,
+0x4c,
+0x45,
+0x4e,
+0x48,
+0x8a,
+0x4d,
+0x52,
+0x36,
+0x34,
+0xa,
+0x16,
+0x4d,
+0x41,
+0x58,
+0x4c,
+0x8a,
+0x4d,
+0x52,
+0x36,
+0x34,
+0xa,
+0x1a,
+0x4d,
+0x41,
+0x58,
+0x48,
+0x70,
+0x4d,
+0x52,
+0x42,
+0x48,
+0x4d,
+0x49,
+0x4e,
+0x48,
+0x70,
+0x4d,
+0x52,
+0x42,
+0x4c,
+0x4d,
+0x49,
+0x4e,
+0x4c,
+0x70,
+0x4d,
+0x52,
+0x4c,
+0x48,
+0x4c,
+0x45,
+0x4e,
+0x48,
+0x70,
+0x4d,
+0x52,
+0x4c,
+0x4c,
+0x4c,
+0x45,
+0x4e,
+0x4c,
+0x72,
+0x4d,
+0x49,
+0x4e,
+0x4c,
+0x4c,
+0x45,
+0x4e,
+0x4c,
+0x4d,
+0x41,
+0x58,
+0x4c,
+0x72,
+0x4d,
+0x49,
+0x4e,
+0x48,
+0x4c,
+0x45,
+0x4e,
+0x48,
+0x4d,
+0x41,
+0x58,
+0x48,
+0xa0,
+0x1f,
+0x7d,
+0x95,
+0x4d,
+0x41,
+0x58,
+0x4c,
+0x4d,
+0x49,
+0x4e,
+0x4c,
+0x95,
+0x4d,
+0x41,
+0x58,
+0x4c,
+0x4c,
+0x45,
+0x4e,
+0x4c,
+0x0,
+0x72,
+0x4d,
+0x41,
+0x58,
+0x48,
+0x1,
+0x4d,
+0x41,
+0x58,
+0x48,
+0xa0,
+0x17,
+0x93,
+0x4d,
+0x41,
+0x58,
+0x4c,
+0x0,
+0x74,
+0x4d,
+0x41,
+0x58,
+0x48,
+0x1,
+0x4d,
+0x41,
+0x58,
+0x48,
+0x70,
+0xff,
+0x4d,
+0x41,
+0x58,
+0x4c,
+0xa1,
+0xb,
+0x74,
+0x4d,
+0x41,
+0x58,
+0x4c,
+0x1,
+0x4d,
+0x41,
+0x58,
+0x4c,
+0x5b,
+0x27,
+0x4d,
+0x4c,
+0x43,
+0x4b,
+0xa4,
+0x4d,
+0x52,
+0x36,
+0x34,
+0x14,
+0x24,
+0x4d,
+0x50,
+0x58,
+0x4d,
+0x1,
+0x5b,
+0x23,
+0x4d,
+0x4c,
+0x43,
+0x4b,
+0xff,
+0xff,
+0x70,
+0x99,
+0x68,
+0x0,
+0x4d,
+0x53,
+0x45,
+0x4c,
+0x70,
+0x4d,
+0x50,
+0x58,
+0x5f,
+0x60,
+0x5b,
+0x27,
+0x4d,
+0x4c,
+0x43,
+0x4b,
+0xa4,
+0x60,
+0x14,
+0x28,
+0x4d,
+0x4f,
+0x53,
+0x54,
+0x4,
+0x5b,
+0x23,
+0x4d,
+0x4c,
+0x43,
+0x4b,
+0xff,
+0xff,
+0x70,
+0x99,
+0x68,
+0x0,
+0x4d,
+0x53,
+0x45,
+0x4c,
+0x70,
+0x69,
+0x4d,
+0x4f,
+0x45,
+0x56,
+0x70,
+0x6a,
+0x4d,
+0x4f,
+0x53,
+0x43,
+0x5b,
+0x27,
+0x4d,
+0x4c,
+0x43,
+0x4b,
+0x10,
+0x48,
+0xa,
 0x5f,
 0x47,
 0x50,
@@ -4316,12 +4921,22 @@ static unsigned char AcpiDsdtAmlCode[] = {
 0x53,
 0x43,
 0x14,
-0x6,
+0x10,
 0x5f,
-0x4c,
+0x45,
 0x30,
 0x33,
 0x0,
+0x5c,
+0x2e,
+0x5f,
+0x53,
+0x42,
+0x5f,
+0x4d,
+0x45,
+0x53,
+0x43,
 0x14,
 0x6,
 0x5f,
diff --git a/hw/i386/ssdt-mem.hex.generated b/hw/i386/ssdt-mem.hex.generated
new file mode 100644
index 0000000..265c49e
--- /dev/null
+++ b/hw/i386/ssdt-mem.hex.generated
@@ -0,0 +1,161 @@
+static unsigned char ssdt_mem_start[] = {
+0x2e
+};
+static unsigned char ssdt_mem_count_name[] = {
+0x25
+};
+static unsigned char ssdt_mem_end[] = {
+0x8d
+};
+static unsigned char ssdt_mem_id[] = {
+0x37
+};
+static unsigned char ssdt_mem_name[] = {
+0x32
+};
+static unsigned char ssdt_mem_count[] = {
+0x2a
+};
+static unsigned char ssdm_mem_aml[] = {
+0x53,
+0x53,
+0x44,
+0x54,
+0x8d,
+0x0,
+0x0,
+0x0,
+0x2,
+0x19,
+0x42,
+0x58,
+0x50,
+0x43,
+0x0,
+0x0,
+0x43,
+0x53,
+0x53,
+0x44,
+0x54,
+0x0,
+0x0,
+0x0,
+0x1,
+0x0,
+0x0,
+0x0,
+0x49,
+0x4e,
+0x54,
+0x4c,
+0x23,
+0x1,
+0x9,
+0x20,
+0x8,
+0x4d,
+0x44,
+0x4e,
+0x52,
+0xc,
+0xff,
+0xff,
+0xff,
+0xff,
+0x5b,
+0x82,
+0x4d,
+0x5,
+0x4d,
+0x50,
+0x41,
+0x41,
+0x8,
+0x5f,
+0x55,
+0x49,
+0x44,
+0xd,
+0x30,
+0x78,
+0x41,
+0x41,
+0x0,
+0x8,
+0x5f,
+0x48,
+0x49,
+0x44,
+0xc,
+0x41,
+0xd0,
+0xc,
+0x80,
+0x14,
+0xf,
+0x5f,
+0x43,
+0x52,
+0x53,
+0x0,
+0xa4,
+0x4d,
+0x43,
+0x52,
+0x53,
+0x5f,
+0x55,
+0x49,
+0x44,
+0x14,
+0xf,
+0x5f,
+0x53,
+0x54,
+0x41,
+0x0,
+0xa4,
+0x4d,
+0x52,
+0x53,
+0x54,
+0x5f,
+0x55,
+0x49,
+0x44,
+0x14,
+0xf,
+0x5f,
+0x50,
+0x58,
+0x4d,
+0x0,
+0xa4,
+0x4d,
+0x50,
+0x58,
+0x4d,
+0x5f,
+0x55,
+0x49,
+0x44,
+0x14,
+0x11,
+0x5f,
+0x4f,
+0x53,
+0x54,
+0x3,
+0x4d,
+0x4f,
+0x53,
+0x54,
+0x5f,
+0x55,
+0x49,
+0x44,
+0x68,
+0x69,
+0x6a
+};
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 16/16] pc: ACPI BIOS: reserve SRAT entry for hotplug mem hole
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
                   ` (14 preceding siblings ...)
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 15/16] pc: update acpi-dsdt.hex.generated and add ssdt-mem.hex.generated Igor Mammedov
@ 2013-07-23 16:23 ` Igor Mammedov
  2013-07-24  9:52 ` [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Hu Tao
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-07-23 16:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: vasilis.liaskovitis, hutao, pbonzini

Needed for Windows to use hotplugged memory device, otherwise
it complains that server is not configured for memory hotplug.
Tests shows that aftewards it uses dynamically provided
proximity value from _PXM() method.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/i386/acpi-build.c |   31 +++++++++++++++++++++++--------
 1 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 7d7fa1f..69e8e68 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -477,15 +477,19 @@ build_hpet(GArray *table_data, GArray *linker)
                  (void*)hpet, ACPI_HPET_SIGNATURE, sizeof(*hpet), 1);
 }
 
+#define SRAT_MEM_ENABLED 1
+#define SRAT_MEM_HOTPLUG 2
+
 static void
-acpi_build_srat_memory(AcpiSratMemoryAffinity *numamem,
-                       uint64_t base, uint64_t len, int node, int enabled)
+acpi_build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
+                       uint64_t len, int node, bool enabled, bool hotplug)
 {
     numamem->type = ACPI_SRAT_MEMORY;
     numamem->length = sizeof(*numamem);
     memset(numamem->proximity, 0, 4);
     numamem->proximity[0] = node;
-    numamem->flags = cpu_to_le32(!!enabled);
+    numamem->flags |= enabled ? cpu_to_le32(SRAT_MEM_ENABLED) : 0;
+    numamem->flags |= hotplug ? cpu_to_le32(SRAT_MEM_HOTPLUG) : 0;
     numamem->base_addr = cpu_to_le64(base);
     numamem->range_length = cpu_to_le64(len);
 }
@@ -503,10 +507,13 @@ build_srat(GArray *table_data, GArray *linker,
     int srat_size;
     int slots;
     uint64_t mem_len, mem_base, next_base;
+    uint64_t hotplug_mem_size = guest_info->hotplug_mem_win.end -
+        guest_info->hotplug_mem_win.begin;
 
     srat_size = sizeof(*srat) +
         sizeof(AcpiSratProcessorAffinity) * guest_info->apic_id_limit +
-        sizeof(AcpiSratMemoryAffinity) * (guest_info->numa_nodes + 2);
+        sizeof(AcpiSratMemoryAffinity) * (guest_info->numa_nodes + 2 +
+        (hotplug_mem_size ? 1 : 0));
 
     srat = acpi_data_push(table_data, srat_size);
     srat->reserved1 = cpu_to_le32(1);
@@ -535,7 +542,7 @@ build_srat(GArray *table_data, GArray *linker,
     slots = 0;
     next_base = 0;
 
-    acpi_build_srat_memory(numamem, 0, 640*1024, 0, 1);
+    acpi_build_srat_memory(numamem, 0, 640*1024, 0, true, false);
     next_base = 1024 * 1024;
     numamem++;
     slots++;
@@ -551,7 +558,8 @@ build_srat(GArray *table_data, GArray *linker,
             next_base > guest_info->ram_size) {
             mem_len -= next_base - guest_info->ram_size;
             if (mem_len > 0) {
-                acpi_build_srat_memory(numamem, mem_base, mem_len, i-1, 1);
+                acpi_build_srat_memory(numamem, mem_base, mem_len, i-1,
+                                       true, false);
                 numamem++;
                 slots++;
             }
@@ -559,15 +567,22 @@ build_srat(GArray *table_data, GArray *linker,
             mem_len = next_base - guest_info->ram_size;
             next_base += (1ULL << 32) - guest_info->ram_size;
         }
-        acpi_build_srat_memory(numamem, mem_base, mem_len, i-1, 1);
+        acpi_build_srat_memory(numamem, mem_base, mem_len, i-1, true, false);
         numamem++;
         slots++;
     }
+
     for (; slots < guest_info->numa_nodes + 2; slots++) {
-        acpi_build_srat_memory(numamem, 0, 0, 0, 0);
+        acpi_build_srat_memory(numamem, 0, 0, 0, false, false);
         numamem++;
     }
 
+    /* allocate fake hotplug region upto maxmem for Windows */
+    if (hotplug_mem_size) {
+        acpi_build_srat_memory(numamem, guest_info->hotplug_mem_win.begin,
+                               hotplug_mem_size, 0, true, true);
+    }
+
     build_header(linker, table_data,
                  (void*)srat, ACPI_SRAT_SIGNATURE, srat_size, 1);
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation Igor Mammedov
@ 2013-07-23 17:09   ` Paolo Bonzini
  2013-07-24  8:36     ` Igor Mammedov
  0 siblings, 1 reply; 48+ messages in thread
From: Paolo Bonzini @ 2013-07-23 17:09 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: vasilis.liaskovitis, hutao, qemu-devel

Il 23/07/2013 18:23, Igor Mammedov ha scritto:
> - if slot property is not specified on -device/device_add command,
> treat default value as request for assigning DimmDevice to
> the first free slot.

Even with "-m" instead of "-numa mem", I think this is problematic
because we still need to separate the host and guest parts of the DIMM
device.  "-numa mem" (or the QMP command that Wanlong added) will be
necessary to allocate memory on the host side before adding a DIMM.

So slots will have three states: free (created with "-m"), allocated (a
free slot moves to this state with "-numa mem...,populated=no" when
migrating, or with the QMP command for regular hotplug), populated (an
allocated slot moves to this state with "-device dimm").

You would be able to plug a DIMM only into an allocated slot, and the
size will be specified on the slot rather than the DIMM device.

In general, I don't think free slots should be managed by the DimmBus,
and host vs. guest separation should be there even if we accept your
"-m" extension (doesn't look bad at all, I must say).

Paolo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 03/16] vl: convert -m to qemu_opts_parse()
  2013-07-23 16:22 ` [Qemu-devel] [PATCH 03/16] vl: convert -m to qemu_opts_parse() Igor Mammedov
@ 2013-07-23 17:11   ` Paolo Bonzini
  2013-07-24  8:40     ` Igor Mammedov
  0 siblings, 1 reply; 48+ messages in thread
From: Paolo Bonzini @ 2013-07-23 17:11 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: vasilis.liaskovitis, hutao, qemu-devel

Il 23/07/2013 18:22, Igor Mammedov ha scritto:
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  qemu-options.hx |    9 +++++++--
>  vl.c            |   53 +++++++++++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 54 insertions(+), 8 deletions(-)
> 
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 137a39b..f799b3d 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -210,8 +210,13 @@ use is discouraged as it may be removed from future versions.
>  ETEXI
>  
>  DEF("m", HAS_ARG, QEMU_OPTION_m,
> -    "-m megs         set virtual RAM size to megs MB [default="
> -    stringify(DEFAULT_RAM_SIZE) "]\n", QEMU_ARCH_ALL)
> +    "-m [mem=]megs[,slots=n,maxmem=size]\n"
> +    "                set virtual RAM size to megs MB [default="
> +    stringify(DEFAULT_RAM_SIZE) "]\n"
> +    "                mem=start-up memory amount\n"
> +    "                slots=maximum number of hotplug slots\n"
> +    "                maxmem=maximum total amount of memory\n",
> +    QEMU_ARCH_ALL)
>  STEXI
>  @item -m @var{megs}
>  @findex -m
> diff --git a/vl.c b/vl.c
> index bf0c658..16c6f1e 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -516,6 +516,27 @@ static QemuOptsList qemu_realtime_opts = {
>      },
>  };
>  
> +static QemuOptsList qemu_mem_opts = {
> +    .name = "memory-opts",
> +    .implied_opt_name = "mem",
> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_mem_opts.head),

This should have

    .merge_lists = true,

Paolo

> +    .desc = {
> +        {
> +            .name = "mem",
> +            .type = QEMU_OPT_SIZE,
> +        },
> +        {
> +            .name = "slots",
> +            .type = QEMU_OPT_NUMBER,
> +        },
> +        {
> +            .name = "maxmem",
> +            .type = QEMU_OPT_SIZE,
> +        },
> +        { /* end of list */ }
> +    },
> +};
> +
>  const char *qemu_get_vm_name(void)
>  {
>      return qemu_name;
> @@ -2933,6 +2954,7 @@ int main(int argc, char **argv, char **envp)
>      qemu_add_opts(&qemu_object_opts);
>      qemu_add_opts(&qemu_tpmdev_opts);
>      qemu_add_opts(&qemu_realtime_opts);
> +    qemu_add_opts(&qemu_mem_opts);
>  
>      runstate_init();
>  
> @@ -3224,21 +3246,40 @@ int main(int argc, char **argv, char **envp)
>                  exit(0);
>                  break;
>              case QEMU_OPTION_m: {
> -                int64_t value;
>                  uint64_t sz;
> -                char *end;
> +                const char *end;
> +                char *s;
>  
> -                value = strtosz(optarg, &end);
> -                if (value < 0 || *end) {
> -                    fprintf(stderr, "qemu: invalid ram size: %s\n", optarg);
> +                opts = qemu_opts_parse(qemu_find_opts("memory-opts"),
> +                                       optarg, 1);
> +                if (!opts) {
>                      exit(1);
>                  }
> -                sz = QEMU_ALIGN_UP((uint64_t)value, 8192);
> +
> +                /* fixup legacy sugffix-less format */
> +                end = qemu_opt_get(opts, "mem");
> +                if (g_ascii_isdigit(end[strlen(end) - 1])) {
> +                    s = g_strconcat(end, "M", NULL);
> +                    qemu_opt_set(opts, "mem", s);
> +                    g_free(s);
> +                }
> +
> +                sz = QEMU_ALIGN_UP(qemu_opt_get_size(opts, "mem", 0), 8192);
>                  ram_size = sz;
>                  if (ram_size != sz) {
>                      fprintf(stderr, "qemu: ram size too large\n");
>                      exit(1);
>                  }
> +                /* store aligned value for future use */
> +                s = g_strdup_printf("%" PRIu64, sz);
> +                qemu_opt_set(opts, "mem", s);
> +                g_free(s);
> +
> +                sz = qemu_opt_get_size(opts, "maxmem", ram_size);
> +                if (sz < ram_size) {
> +                    fprintf(stderr, "qemu: maxmem must be > initial memory\n");
> +                    exit(1);
> +                }
>                  break;
>              }
>  #ifdef CONFIG_TPM
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation
  2013-07-23 17:09   ` Paolo Bonzini
@ 2013-07-24  8:36     ` Igor Mammedov
  2013-07-24  9:41       ` Paolo Bonzini
  0 siblings, 1 reply; 48+ messages in thread
From: Igor Mammedov @ 2013-07-24  8:36 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: vasilis.liaskovitis, hutao, qemu-devel

On Tue, 23 Jul 2013 19:09:26 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 23/07/2013 18:23, Igor Mammedov ha scritto:
> > - if slot property is not specified on -device/device_add command,
> > treat default value as request for assigning DimmDevice to
> > the first free slot.
> 
> Even with "-m" instead of "-numa mem", I think this is problematic
> because we still need to separate the host and guest parts of the DIMM
> device.  "-numa mem" (or the QMP command that Wanlong added) will be
> necessary to allocate memory on the host side before adding a DIMM.
why not do host allocation part at the same time when DIMM is added, is
there a real need to separate DIMM device?

I probably miss something but -numa mem option and co aside what problem
couldn't be solved during DIMM device initialization and would require
a split DIMM device?

> 
> So slots will have three states: free (created with "-m"), allocated (a
> free slot moves to this state with "-numa mem...,populated=no" when
> migrating, or with the QMP command for regular hotplug), populated (an
> allocated slot moves to this state with "-device dimm").
> 
> You would be able to plug a DIMM only into an allocated slot, and the
> size will be specified on the slot rather than the DIMM device.
'slot' property is there only for migration sake to provide stable
numeric ID for QEMU<->ACPI BIOS interface. It's not used for any other
purpose and wasn't intended for any other usage..

on baremetal slot has noting to do with size of plugged in DIMM, why we
would model it other way if it only brings problems: like predefined size,
allocated, free etc. I think slot should be either free or busy.


> 
> In general, I don't think free slots should be managed by the DimmBus,
> and host vs. guest separation should be there even if we accept your
> "-m" extension (doesn't look bad at all, I must say).
> 
> Paolo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 03/16] vl: convert -m to qemu_opts_parse()
  2013-07-23 17:11   ` Paolo Bonzini
@ 2013-07-24  8:40     ` Igor Mammedov
  2013-07-24  9:04       ` Paolo Bonzini
  0 siblings, 1 reply; 48+ messages in thread
From: Igor Mammedov @ 2013-07-24  8:40 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: vasilis.liaskovitis, hutao, qemu-devel

On Tue, 23 Jul 2013 19:11:31 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 23/07/2013 18:22, Igor Mammedov ha scritto:
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  qemu-options.hx |    9 +++++++--
> >  vl.c            |   53 +++++++++++++++++++++++++++++++++++++++++++++++------
> >  2 files changed, 54 insertions(+), 8 deletions(-)
> > 
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 137a39b..f799b3d 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -210,8 +210,13 @@ use is discouraged as it may be removed from future versions.
> >  ETEXI
> >  
> >  DEF("m", HAS_ARG, QEMU_OPTION_m,
> > -    "-m megs         set virtual RAM size to megs MB [default="
> > -    stringify(DEFAULT_RAM_SIZE) "]\n", QEMU_ARCH_ALL)
> > +    "-m [mem=]megs[,slots=n,maxmem=size]\n"
> > +    "                set virtual RAM size to megs MB [default="
> > +    stringify(DEFAULT_RAM_SIZE) "]\n"
> > +    "                mem=start-up memory amount\n"
> > +    "                slots=maximum number of hotplug slots\n"
> > +    "                maxmem=maximum total amount of memory\n",
> > +    QEMU_ARCH_ALL)
> >  STEXI
> >  @item -m @var{megs}
> >  @findex -m
> > diff --git a/vl.c b/vl.c
> > index bf0c658..16c6f1e 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -516,6 +516,27 @@ static QemuOptsList qemu_realtime_opts = {
> >      },
> >  };
> >  
> > +static QemuOptsList qemu_mem_opts = {
> > +    .name = "memory-opts",
> > +    .implied_opt_name = "mem",
> > +    .head = QTAILQ_HEAD_INITIALIZER(qemu_mem_opts.head),
> 
> This should have
> 
>     .merge_lists = true,

Just to clarify: is it to allow syntax like?
 -m 512 -m slots=X -m maxmem=Y

> 
> Paolo
> 
> > +    .desc = {
> > +        {
> > +            .name = "mem",
> > +            .type = QEMU_OPT_SIZE,
> > +        },
> > +        {
> > +            .name = "slots",
> > +            .type = QEMU_OPT_NUMBER,
> > +        },
> > +        {
> > +            .name = "maxmem",
> > +            .type = QEMU_OPT_SIZE,
> > +        },
> > +        { /* end of list */ }
> > +    },
> > +};
> > +
> >  const char *qemu_get_vm_name(void)
> >  {
> >      return qemu_name;
> > @@ -2933,6 +2954,7 @@ int main(int argc, char **argv, char **envp)
> >      qemu_add_opts(&qemu_object_opts);
> >      qemu_add_opts(&qemu_tpmdev_opts);
> >      qemu_add_opts(&qemu_realtime_opts);
> > +    qemu_add_opts(&qemu_mem_opts);
> >  
> >      runstate_init();
> >  
> > @@ -3224,21 +3246,40 @@ int main(int argc, char **argv, char **envp)
> >                  exit(0);
> >                  break;
> >              case QEMU_OPTION_m: {
> > -                int64_t value;
> >                  uint64_t sz;
> > -                char *end;
> > +                const char *end;
> > +                char *s;
> >  
> > -                value = strtosz(optarg, &end);
> > -                if (value < 0 || *end) {
> > -                    fprintf(stderr, "qemu: invalid ram size: %s\n", optarg);
> > +                opts = qemu_opts_parse(qemu_find_opts("memory-opts"),
> > +                                       optarg, 1);
> > +                if (!opts) {
> >                      exit(1);
> >                  }
> > -                sz = QEMU_ALIGN_UP((uint64_t)value, 8192);
> > +
> > +                /* fixup legacy sugffix-less format */
> > +                end = qemu_opt_get(opts, "mem");
> > +                if (g_ascii_isdigit(end[strlen(end) - 1])) {
> > +                    s = g_strconcat(end, "M", NULL);
> > +                    qemu_opt_set(opts, "mem", s);
> > +                    g_free(s);
> > +                }
> > +
> > +                sz = QEMU_ALIGN_UP(qemu_opt_get_size(opts, "mem", 0), 8192);
> >                  ram_size = sz;
> >                  if (ram_size != sz) {
> >                      fprintf(stderr, "qemu: ram size too large\n");
> >                      exit(1);
> >                  }
> > +                /* store aligned value for future use */
> > +                s = g_strdup_printf("%" PRIu64, sz);
> > +                qemu_opt_set(opts, "mem", s);
> > +                g_free(s);
> > +
> > +                sz = qemu_opt_get_size(opts, "maxmem", ram_size);
> > +                if (sz < ram_size) {
> > +                    fprintf(stderr, "qemu: maxmem must be > initial memory\n");
> > +                    exit(1);
> > +                }
> >                  break;
> >              }
> >  #ifdef CONFIG_TPM
> > 
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 03/16] vl: convert -m to qemu_opts_parse()
  2013-07-24  8:40     ` Igor Mammedov
@ 2013-07-24  9:04       ` Paolo Bonzini
  2013-07-24  9:27         ` Igor Mammedov
  0 siblings, 1 reply; 48+ messages in thread
From: Paolo Bonzini @ 2013-07-24  9:04 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: vasilis.liaskovitis, hutao, qemu-devel

Il 24/07/2013 10:40, Igor Mammedov ha scritto:
> On Tue, 23 Jul 2013 19:11:31 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> Il 23/07/2013 18:22, Igor Mammedov ha scritto:
>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>> ---
>>>  qemu-options.hx |    9 +++++++--
>>>  vl.c            |   53 +++++++++++++++++++++++++++++++++++++++++++++++------
>>>  2 files changed, 54 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/qemu-options.hx b/qemu-options.hx
>>> index 137a39b..f799b3d 100644
>>> --- a/qemu-options.hx
>>> +++ b/qemu-options.hx
>>> @@ -210,8 +210,13 @@ use is discouraged as it may be removed from future versions.
>>>  ETEXI
>>>  
>>>  DEF("m", HAS_ARG, QEMU_OPTION_m,
>>> -    "-m megs         set virtual RAM size to megs MB [default="
>>> -    stringify(DEFAULT_RAM_SIZE) "]\n", QEMU_ARCH_ALL)
>>> +    "-m [mem=]megs[,slots=n,maxmem=size]\n"
>>> +    "                set virtual RAM size to megs MB [default="
>>> +    stringify(DEFAULT_RAM_SIZE) "]\n"
>>> +    "                mem=start-up memory amount\n"
>>> +    "                slots=maximum number of hotplug slots\n"
>>> +    "                maxmem=maximum total amount of memory\n",
>>> +    QEMU_ARCH_ALL)
>>>  STEXI
>>>  @item -m @var{megs}
>>>  @findex -m
>>> diff --git a/vl.c b/vl.c
>>> index bf0c658..16c6f1e 100644
>>> --- a/vl.c
>>> +++ b/vl.c
>>> @@ -516,6 +516,27 @@ static QemuOptsList qemu_realtime_opts = {
>>>      },
>>>  };
>>>  
>>> +static QemuOptsList qemu_mem_opts = {
>>> +    .name = "memory-opts",
>>> +    .implied_opt_name = "mem",
>>> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_mem_opts.head),
>>
>> This should have
>>
>>     .merge_lists = true,
> 
> Just to clarify: is it to allow syntax like?
>  -m 512 -m slots=X -m maxmem=Y

Yes.  In general, if "id" doesn't make sense the QemuOptsList should
have merge_lists=true.

Paolo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 03/16] vl: convert -m to qemu_opts_parse()
  2013-07-24  9:04       ` Paolo Bonzini
@ 2013-07-24  9:27         ` Igor Mammedov
  0 siblings, 0 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-07-24  9:27 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: vasilis.liaskovitis, hutao, qemu-devel

On Wed, 24 Jul 2013 11:04:14 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 24/07/2013 10:40, Igor Mammedov ha scritto:
> > On Tue, 23 Jul 2013 19:11:31 +0200
> > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > 
> >> Il 23/07/2013 18:22, Igor Mammedov ha scritto:
> >>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> >>> ---
> >>>  qemu-options.hx |    9 +++++++--
> >>>  vl.c            |   53 +++++++++++++++++++++++++++++++++++++++++++++++------
> >>>  2 files changed, 54 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/qemu-options.hx b/qemu-options.hx
> >>> index 137a39b..f799b3d 100644
> >>> --- a/qemu-options.hx
> >>> +++ b/qemu-options.hx
> >>> @@ -210,8 +210,13 @@ use is discouraged as it may be removed from future versions.
> >>>  ETEXI
> >>>  
> >>>  DEF("m", HAS_ARG, QEMU_OPTION_m,
> >>> -    "-m megs         set virtual RAM size to megs MB [default="
> >>> -    stringify(DEFAULT_RAM_SIZE) "]\n", QEMU_ARCH_ALL)
> >>> +    "-m [mem=]megs[,slots=n,maxmem=size]\n"
> >>> +    "                set virtual RAM size to megs MB [default="
> >>> +    stringify(DEFAULT_RAM_SIZE) "]\n"
> >>> +    "                mem=start-up memory amount\n"
> >>> +    "                slots=maximum number of hotplug slots\n"
> >>> +    "                maxmem=maximum total amount of memory\n",
> >>> +    QEMU_ARCH_ALL)
> >>>  STEXI
> >>>  @item -m @var{megs}
> >>>  @findex -m
> >>> diff --git a/vl.c b/vl.c
> >>> index bf0c658..16c6f1e 100644
> >>> --- a/vl.c
> >>> +++ b/vl.c
> >>> @@ -516,6 +516,27 @@ static QemuOptsList qemu_realtime_opts = {
> >>>      },
> >>>  };
> >>>  
> >>> +static QemuOptsList qemu_mem_opts = {
> >>> +    .name = "memory-opts",
> >>> +    .implied_opt_name = "mem",
> >>> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_mem_opts.head),
> >>
> >> This should have
> >>
> >>     .merge_lists = true,
> > 
> > Just to clarify: is it to allow syntax like?
> >  -m 512 -m slots=X -m maxmem=Y
> 
> Yes.  In general, if "id" doesn't make sense the QemuOptsList should
> have merge_lists=true.

Thanks, pushed to memhp-wip branch

> Paolo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation
  2013-07-24  8:36     ` Igor Mammedov
@ 2013-07-24  9:41       ` Paolo Bonzini
  2013-07-24 11:34         ` Igor Mammedov
  0 siblings, 1 reply; 48+ messages in thread
From: Paolo Bonzini @ 2013-07-24  9:41 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: vasilis.liaskovitis, hutao, qemu-devel

Il 24/07/2013 10:36, Igor Mammedov ha scritto:
> On Tue, 23 Jul 2013 19:09:26 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> Il 23/07/2013 18:23, Igor Mammedov ha scritto:
>>> - if slot property is not specified on -device/device_add command,
>>> treat default value as request for assigning DimmDevice to
>>> the first free slot.
>>
>> Even with "-m" instead of "-numa mem", I think this is problematic
>> because we still need to separate the host and guest parts of the DIMM
>> device.  "-numa mem" (or the QMP command that Wanlong added) will be
>> necessary to allocate memory on the host side before adding a DIMM.
> why not do host allocation part at the same time when DIMM is added, is
> there a real need to separate DIMM device?
> 
> I probably miss something but -numa mem option and co aside what problem
> couldn't be solved during DIMM device initialization and would require
> a split DIMM device?

Because otherwise, every option we add to "-numa mem" will have to be
added to "-device dimm".  For example,

   -device dimm,policy=interleave

makes no sense to me.

In fact, this is no different from having to do drive_add or netdev_add
before device_add.  First you tell QEMU about the host resources to use,
then you add the guest device and bind the device to those resources.

>> So slots will have three states: free (created with "-m"), allocated (a
>> free slot moves to this state with "-numa mem...,populated=no" when
>> migrating, or with the QMP command for regular hotplug), populated (an
>> allocated slot moves to this state with "-device dimm").
>>
>> You would be able to plug a DIMM only into an allocated slot, and the
>> size will be specified on the slot rather than the DIMM device.
> 'slot' property is there only for migration sake to provide stable
> numeric ID for QEMU<->ACPI BIOS interface. It's not used for any other
> purpose and wasn't intended for any other usage..

How would you otherwise refer to the memory you want to affect in a
set-mem-policy monitor command?

> on baremetal slot has noting to do with size of plugged in DIMM,

On baremetal slots also belong to a specific NUMA node, for what it's
worth.  There are going to be differences with baremetal no matter what.

> why we
> would model it other way if it only brings problems: like predefined size,

It doesn't have to be predefined.  In the previous discussions (and also
based on Vasilis and Hu Tao's implementations) I assumed predefined slot
sizes.  Now I understand the benefit of having a simpler command-line
with "-m", but then in return you need three slot states instead of just
unpopulated/populated.

So you'd just do

   set-mem-policy 0 size=2G      # free->allocated
   device_add dimm,slotid=0      # allocated->populated

to hotplug a 2G DIMM.  And you'll be able to pin it to host NUMA nodes,
and assign it to guest NUMA nodes, like this:

   set-mem-policy 0 size=2G,nodeid=1,policy=membind host-nodes=0-1
   device_add dimm,slotid=0

Again, this is the same as drive_add/device_add.

Paolo

> allocated, free etc. I think slot should be either free or busy.
> 
> 
>>
>> In general, I don't think free slots should be managed by the DimmBus,
>> and host vs. guest separation should be there even if we accept your
>> "-m" extension (doesn't look bad at all, I must say).
>>
>> Paolo
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
                   ` (15 preceding siblings ...)
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 16/16] pc: ACPI BIOS: reserve SRAT entry for hotplug mem hole Igor Mammedov
@ 2013-07-24  9:52 ` Hu Tao
  2013-07-24 10:02   ` Igor Mammedov
  2013-08-02 12:35 ` Anthony Liguori
  2013-09-11  4:01 ` Hu Tao
  18 siblings, 1 reply; 48+ messages in thread
From: Hu Tao @ 2013-07-24  9:52 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: vasilis.liaskovitis, pbonzini, qemu-devel

v6 doesn't work here, things are going fine until online hotplugged
memory in guest.

steps:

1. qemu cmd:

  ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 512,maxmem=2G,slots=1 \
  -hda /mnt/data/libvirt-images/hut-rhel6.3.img -L ../pc-bios-memhp/

  (bios is from MST's acpi tree)

2. hot-plug a dimm:

  device_adddimm,id=d0,size=1G

3. online hotplugged memory(in guest):

  echo 'onlone' > /sys/devices/system/memory/memory/32/state

then after several seconds the console prints error messages like:

  nommu_map_sg: overflow 107c15000+4096 of device mask ffffffff
  ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
  ata1.00: cmd ca/00:10:d0:0d:a4/00:00:00:00:00/e0 tag 0 dma 8192 out
           res 50/00:00:08:09:e0/00:00:00:00:00/e0 Emask 0x40 (internal error)
  ata1.00: configured for MWDMA2
  ata1: EH complete

  (repeat)

and can't do any disk I/O.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug
  2013-07-24  9:52 ` [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Hu Tao
@ 2013-07-24 10:02   ` Igor Mammedov
  2013-07-24 10:58     ` Vasilis Liaskovitis
  0 siblings, 1 reply; 48+ messages in thread
From: Igor Mammedov @ 2013-07-24 10:02 UTC (permalink / raw)
  To: Hu Tao; +Cc: vasilis.liaskovitis, pbonzini, qemu-devel

On Wed, 24 Jul 2013 17:52:50 +0800
Hu Tao <hutao@cn.fujitsu.com> wrote:

> v6 doesn't work here, things are going fine until online hotplugged
> memory in guest.
> 
> steps:
> 
> 1. qemu cmd:
> 
>   ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 512,maxmem=2G,slots=1 \
>   -hda /mnt/data/libvirt-images/hut-rhel6.3.img -L ../pc-bios-memhp/
> 
>   (bios is from MST's acpi tree)
> 
> 2. hot-plug a dimm:
> 
>   device_adddimm,id=d0,size=1G
> 
> 3. online hotplugged memory(in guest):
> 
>   echo 'onlone' > /sys/devices/system/memory/memory/32/state
> 
> then after several seconds the console prints error messages like:
> 
>   nommu_map_sg: overflow 107c15000+4096 of device mask ffffffff
>   ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
>   ata1.00: cmd ca/00:10:d0:0d:a4/00:00:00:00:00/e0 tag 0 dma 8192 out
>            res 50/00:00:08:09:e0/00:00:00:00:00/e0 Emask 0x40 (internal error)
>   ata1.00: configured for MWDMA2
>   ata1: EH complete
> 
>   (repeat)
> 
> and can't do any disk I/O.
Looks like a guest bug where it tries to use high memory but assumes low one.
if you boot guest with initial memory 4Gb then it wont hit issue or use FC18
which doesn't have this problem.



> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug
  2013-07-24 10:02   ` Igor Mammedov
@ 2013-07-24 10:58     ` Vasilis Liaskovitis
  0 siblings, 0 replies; 48+ messages in thread
From: Vasilis Liaskovitis @ 2013-07-24 10:58 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: Hu Tao, qemu-devel, pbonzini

On Wed, Jul 24, 2013 at 12:02:46PM +0200, Igor Mammedov wrote:
> On Wed, 24 Jul 2013 17:52:50 +0800
> Hu Tao <hutao@cn.fujitsu.com> wrote:
> 
> > v6 doesn't work here, things are going fine until online hotplugged
> > memory in guest.
> > 
> > steps:
> > 
> > 1. qemu cmd:
> > 
> >   ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 512,maxmem=2G,slots=1 \
> >   -hda /mnt/data/libvirt-images/hut-rhel6.3.img -L ../pc-bios-memhp/
> > 
> >   (bios is from MST's acpi tree)
> > 
> > 2. hot-plug a dimm:
> > 
> >   device_adddimm,id=d0,size=1G
> > 
> > 3. online hotplugged memory(in guest):
> > 
> >   echo 'onlone' > /sys/devices/system/memory/memory/32/state
> > 
> > then after several seconds the console prints error messages like:
> > 
> >   nommu_map_sg: overflow 107c15000+4096 of device mask ffffffff
> >   ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> >   ata1.00: cmd ca/00:10:d0:0d:a4/00:00:00:00:00/e0 tag 0 dma 8192 out
> >            res 50/00:00:08:09:e0/00:00:00:00:00/e0 Emask 0x40 (internal error)
> >   ata1.00: configured for MWDMA2
> >   ata1: EH complete
> > 
> >   (repeat)
> > 
> > and can't do any disk I/O.
> Looks like a guest bug where it tries to use high memory but assumes low one.

yes. Iirc booting the guest kernel with "swiotlb=force" option could also
work around this.

> if you boot guest with initial memory 4Gb then it wont hit issue or use FC18
> which doesn't have this problem.

thanks,

- Vasilis

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation
  2013-07-24  9:41       ` Paolo Bonzini
@ 2013-07-24 11:34         ` Igor Mammedov
  2013-07-24 12:41           ` Paolo Bonzini
  0 siblings, 1 reply; 48+ messages in thread
From: Igor Mammedov @ 2013-07-24 11:34 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: vasilis.liaskovitis, hutao, qemu-devel

On Wed, 24 Jul 2013 11:41:04 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 24/07/2013 10:36, Igor Mammedov ha scritto:
> > On Tue, 23 Jul 2013 19:09:26 +0200
> > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > 
> >> Il 23/07/2013 18:23, Igor Mammedov ha scritto:
> >>> - if slot property is not specified on -device/device_add command,
> >>> treat default value as request for assigning DimmDevice to
> >>> the first free slot.
> >>
> >> Even with "-m" instead of "-numa mem", I think this is problematic
> >> because we still need to separate the host and guest parts of the DIMM
> >> device.  "-numa mem" (or the QMP command that Wanlong added) will be
> >> necessary to allocate memory on the host side before adding a DIMM.
> > why not do host allocation part at the same time when DIMM is added, is
> > there a real need to separate DIMM device?
> > 
> > I probably miss something but -numa mem option and co aside what problem
> > couldn't be solved during DIMM device initialization and would require
> > a split DIMM device?
> 
> Because otherwise, every option we add to "-numa mem" will have to be
> added to "-device dimm".  For example,
> 
>    -device dimm,policy=interleave
if it's feature of DIMM device sure, if it is not lets find a better
place for it. See below for an alternative approach.

> 
> makes no sense to me.
> 
> In fact, this is no different from having to do drive_add or netdev_add
> before device_add.  First you tell QEMU about the host resources to use,
> then you add the guest device and bind the device to those resources.
> 
> >> So slots will have three states: free (created with "-m"), allocated (a
> >> free slot moves to this state with "-numa mem...,populated=no" when
> >> migrating, or with the QMP command for regular hotplug), populated (an
> >> allocated slot moves to this state with "-device dimm").
> >>
> >> You would be able to plug a DIMM only into an allocated slot, and the
> >> size will be specified on the slot rather than the DIMM device.
> > 'slot' property is there only for migration sake to provide stable
> > numeric ID for QEMU<->ACPI BIOS interface. It's not used for any other
> > purpose and wasn't intended for any other usage..
> 
> How would you otherwise refer to the memory you want to affect in a
> set-mem-policy monitor command?
could be 'id' property or even better a QOM path

> 
> > on baremetal slot has noting to do with size of plugged in DIMM,
> 
> On baremetal slots also belong to a specific NUMA node, for what it's
> worth.  There are going to be differences with baremetal no matter what.
sure we can deviate here, but I don't see full picture yet so I'm trying
to find justification for it first and asking questions. Maybe a better
solution will be found.

> 
> > why we
> > would model it other way if it only brings problems: like predefined size,
> 
> It doesn't have to be predefined.  In the previous discussions (and also
> based on Vasilis and Hu Tao's implementations) I assumed predefined slot
> sizes.  Now I understand the benefit of having a simpler command-line
> with "-m", but then in return you need three slot states instead of just
> unpopulated/populated.
> 
> So you'd just do
> 
>    set-mem-policy 0 size=2G      # free->allocated
>    device_add dimm,slotid=0      # allocated->populated
> 
> to hotplug a 2G DIMM.  And you'll be able to pin it to host NUMA nodes,
> and assign it to guest NUMA nodes, like this:
> 
>    set-mem-policy 0 size=2G,nodeid=1,policy=membind host-nodes=0-1
>    device_add dimm,slotid=0
Do policy and other -numa mem properties belong to a particular DIMM device
or rather to a particular NUMA node?

How about following idea: guest-node maps to a specific host-node, then
when we plug DIMM, guest node provides information on policies and whatever
to the creator of DIMM device (via DimmBus and/or mhc) which allocates
memory, applies policies and binds new memory to a specific host node.
That would eliminate 2 stage approach.

in this case DIMM device only needs to specify where it's plugged in, using
'node' property (now number but could become QOM path to NUMA node object).

Ideally it would be QOM hierarchy:

/nodeX/@dimmbus/dimm_device
where even 'node' property would become obsolete, just specify right
bus to attach DIMM device to.

PS:
we need a similar QOM hierarchy for CPUs as well to sort out
-numa cpus=ids mess.

> 
> Again, this is the same as drive_add/device_add.
> 
> Paolo
> 
> > allocated, free etc. I think slot should be either free or busy.
> > 
> > 
> >>
> >> In general, I don't think free slots should be managed by the DimmBus,
> >> and host vs. guest separation should be there even if we accept your
> >> "-m" extension (doesn't look bad at all, I must say).
> >>
> >> Paolo
> > 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation
  2013-07-24 11:34         ` Igor Mammedov
@ 2013-07-24 12:41           ` Paolo Bonzini
  2013-07-26  7:38             ` Igor Mammedov
  0 siblings, 1 reply; 48+ messages in thread
From: Paolo Bonzini @ 2013-07-24 12:41 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: vasilis.liaskovitis, hutao, qemu-devel

Il 24/07/2013 13:34, Igor Mammedov ha scritto:
> On Wed, 24 Jul 2013 11:41:04 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> Il 24/07/2013 10:36, Igor Mammedov ha scritto:
>>> On Tue, 23 Jul 2013 19:09:26 +0200
>>> Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>>>> Il 23/07/2013 18:23, Igor Mammedov ha scritto:
>>>>> - if slot property is not specified on -device/device_add command,
>>>>> treat default value as request for assigning DimmDevice to
>>>>> the first free slot.
>>>>
>>>> Even with "-m" instead of "-numa mem", I think this is problematic
>>>> because we still need to separate the host and guest parts of the DIMM
>>>> device.  "-numa mem" (or the QMP command that Wanlong added) will be
>>>> necessary to allocate memory on the host side before adding a DIMM.
>>> why not do host allocation part at the same time when DIMM is added, is
>>> there a real need to separate DIMM device?
>>>
>>> I probably miss something but -numa mem option and co aside what problem
>>> couldn't be solved during DIMM device initialization and would require
>>> a split DIMM device?
>>
>> Because otherwise, every option we add to "-numa mem" will have to be
>> added to "-device dimm".  For example,
>>
>>    -device dimm,policy=interleave
> if it's feature of DIMM device sure, if it is not lets find a better
> place for it. See below for an alternative approach.
> 
>>
>> makes no sense to me.
>>
>> In fact, this is no different from having to do drive_add or netdev_add
>> before device_add.  First you tell QEMU about the host resources to use,
>> then you add the guest device and bind the device to those resources.
>>
>>>> So slots will have three states: free (created with "-m"), allocated (a
>>>> free slot moves to this state with "-numa mem...,populated=no" when
>>>> migrating, or with the QMP command for regular hotplug), populated (an
>>>> allocated slot moves to this state with "-device dimm").
>>>>
>>>> You would be able to plug a DIMM only into an allocated slot, and the
>>>> size will be specified on the slot rather than the DIMM device.
>>> 'slot' property is there only for migration sake to provide stable
>>> numeric ID for QEMU<->ACPI BIOS interface. It's not used for any other
>>> purpose and wasn't intended for any other usage..
>>
>> How would you otherwise refer to the memory you want to affect in a
>> set-mem-policy monitor command?
> could be 'id' property or even better a QOM path
> 
>>
>>> on baremetal slot has noting to do with size of plugged in DIMM,
>>
>> On baremetal slots also belong to a specific NUMA node, for what it's
>> worth.  There are going to be differences with baremetal no matter what.
> sure we can deviate here, but I don't see full picture yet so I'm trying
> to find justification for it first and asking questions. Maybe a better
> solution will be found.
> 
>>
>>> why we
>>> would model it other way if it only brings problems: like predefined size,
>>
>> It doesn't have to be predefined.  In the previous discussions (and also
>> based on Vasilis and Hu Tao's implementations) I assumed predefined slot
>> sizes.  Now I understand the benefit of having a simpler command-line
>> with "-m", but then in return you need three slot states instead of just
>> unpopulated/populated.
>>
>> So you'd just do
>>
>>    set-mem-policy 0 size=2G      # free->allocated
>>    device_add dimm,slotid=0      # allocated->populated
>>
>> to hotplug a 2G DIMM.  And you'll be able to pin it to host NUMA nodes,
>> and assign it to guest NUMA nodes, like this:
>>
>>    set-mem-policy 0 size=2G,nodeid=1,policy=membind host-nodes=0-1
>>    device_add dimm,slotid=0
> Do policy and other -numa mem properties belong to a particular DIMM device
> or rather to a particular NUMA node?
> 
> How about following idea: guest-node maps to a specific host-node, then
> when we plug DIMM, guest node provides information on policies and whatever
> to the creator of DIMM device (via DimmBus and/or mhc) which allocates
> memory, applies policies and binds new memory to a specific host node.
> That would eliminate 2 stage approach.

It makes sense.  My main worry is not to deviate from what we've been
doing for drives and netdevs (because that's a proven design).  Both
"-numa mem" and this proposal satisfy that goal.

I originally proposed "-numa mem" because Vasilis and Hu's patches were
relying on specifying predefined sizes for all slots.  So "-numa mem"
was a good fit for both memory hotplug (done Hu's way) and NUMA policy.
 It also simplified the command line which had a lot of "mem-" prefixed
options.

With the approach you suggest it may not be necessary at all, and we can
go back to just "-numa
node,cpus=0,mem=1G,mem-policy=membind,mem-hostnodes=0-1,cpu-hostnodes=0"
or something like that.

Whether it is workable, it depends on what granularity Wanlong/Hu want.

There may be some scenarios where per-slot policies make sense.  For
example, imagine that in general you want memory to be bound to the
corresponding host node.  It turns out some nodes are now fully
committed and others are free, and you need more memory on a VM.  You
can hotplug that memory without really caring about binding and
momentarily suffer some performance loss.

I agree that specifying the policy on every hotplug complicates
management and may be overkill.  But then, most guests are not NUMA at
all and you would hardly perceive the difference, you would just have to
separate

    set-mem-policy 0 size=2G
    device_add dimm,slot=0

instead of

    device_add dimm,slot,size=2G

which is not a big chore.

> in this case DIMM device only needs to specify where it's plugged in, using
> 'node' property (now number but could become QOM path to NUMA node object).

Yeah, then it's the same as the id.

Paolo

> Ideally it would be QOM hierarchy:
> 
> /nodeX/@dimmbus/dimm_device
> where even 'node' property would become obsolete, just specify right
> bus to attach DIMM device to.
> 
> PS:
> we need a similar QOM hierarchy for CPUs as well to sort out
> -numa cpus=ids mess.
> 
>>
>> Again, this is the same as drive_add/device_add.
>>
>> Paolo
>>
>>> allocated, free etc. I think slot should be either free or busy.
>>>
>>>
>>>>
>>>> In general, I don't think free slots should be managed by the DimmBus,
>>>> and host vs. guest separation should be there even if we accept your
>>>> "-m" extension (doesn't look bad at all, I must say).
>>>>
>>>> Paolo
>>>
>>
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 04/16] qapi: make visit_type_size fallback to type_int
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 04/16] qapi: make visit_type_size fallback to type_int Igor Mammedov
@ 2013-07-25  6:41   ` Hu Tao
  2013-07-25 11:35     ` Igor Mammedov
  0 siblings, 1 reply; 48+ messages in thread
From: Hu Tao @ 2013-07-25  6:41 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: vasilis.liaskovitis, pbonzini, qemu-devel

On Tue, Jul 23, 2013 at 06:23:00PM +0200, Igor Mammedov wrote:
> From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> 
> Currently visit_type_size checks if the visitor's type_size function pointer is
> NULL. If not, it calls it, otherwise it calls v->type_uint64(). But neither of
> these pointers are ever set. Fallback to calling v->type_int() in this third
> (default) case.
> 
> Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  qapi/qapi-visit-core.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c
> index 401ee6e..fcacaff 100644
> --- a/qapi/qapi-visit-core.c
> +++ b/qapi/qapi-visit-core.c
> @@ -238,8 +238,17 @@ void visit_type_int64(Visitor *v, int64_t *obj, const char *name, Error **errp)
>  
>  void visit_type_size(Visitor *v, uint64_t *obj, const char *name, Error **errp)
>  {
> +    int64_t value;
>      if (!error_is_set(errp)) {
> -        (v->type_size ? v->type_size : v->type_uint64)(v, obj, name, errp);
> +        if (v->type_size) {
> +            v->type_size(v, obj, name, errp);
> +        } else if (v->type_uint64) {
> +            v->type_uint64(v, obj, name, errp);
> +        } else {
> +            value = *obj;
> +            v->type_int(v, &value, name, errp);
> +            *obj = value;
> +        }
>      }
>  }

This doesn't address comment from Michael Roth, quoted below:

---
I'd recommend just doing:

  if (v->type_size) {
      v->type_size(v, obj, name, errp);
  } else {
      visit_type_uint64(v, obj, name, errp);
  }

visit_type_uint64() already handles the fallback to visit_type_int() so no
need to duplicate.
---

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 06/16] dimm: implement dimm device abstraction
  2013-07-23 16:23 ` [Qemu-devel] [PATCH 06/16] dimm: implement dimm device abstraction Igor Mammedov
@ 2013-07-25  6:52   ` Hu Tao
  0 siblings, 0 replies; 48+ messages in thread
From: Hu Tao @ 2013-07-25  6:52 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: vasilis.liaskovitis, pbonzini, qemu-devel

On Tue, Jul 23, 2013 at 06:23:02PM +0200, Igor Mammedov wrote:
> From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

<...>

> +
> +/**
> + * DimmBus:

DimmDevice

> + * @start: starting physical address, where @DimmDevice is mapped.
> + * @size: amount of memory mapped at @start.
> + * @node: numa node to which @DimmDevice is attached.
> + * @slot: slot number into which @DimmDevice is plugged in.
> + */
> +typedef struct DimmDevice {
> +    DeviceState qdev;
> +    ram_addr_t start;
> +    ram_addr_t size;
> +    uint32_t node;
> +    int32_t slot;
> +    MemoryRegion mr;
> +} DimmDevice;
> +
> +typedef struct DimmDeviceClass {
> +    DeviceClass parent_class;
> +} DimmDeviceClass;
> +
> +#define TYPE_DIMM_BUS "dimmbus"
> +#define DIMM_BUS(obj) OBJECT_CHECK(DimmBus, (obj), TYPE_DIMM_BUS)
> +#define DIMM_BUS_CLASS(klass) \
> +    OBJECT_CLASS_CHECK(DimmBusClass, (klass), TYPE_DIMM_BUS)
> +#define DIMM_BUS_GET_CLASS(obj) \
> +    OBJECT_GET_CLASS(DimmBusClass, (obj), TYPE_DIMM_BUS)
> +
> +/**
> + * DimmBus:
> + */
> +typedef struct DimmBus {
> +    BusState qbus;
> +} DimmBus;
> +
> +#endif
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 04/16] qapi: make visit_type_size fallback to type_int
  2013-07-25  6:41   ` Hu Tao
@ 2013-07-25 11:35     ` Igor Mammedov
  0 siblings, 0 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-07-25 11:35 UTC (permalink / raw)
  To: Hu Tao; +Cc: vasilis.liaskovitis, pbonzini, qemu-devel

On Thu, 25 Jul 2013 14:41:36 +0800
Hu Tao <hutao@cn.fujitsu.com> wrote:

> On Tue, Jul 23, 2013 at 06:23:00PM +0200, Igor Mammedov wrote:
> > From: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> > 
> > Currently visit_type_size checks if the visitor's type_size function pointer is
> > NULL. If not, it calls it, otherwise it calls v->type_uint64(). But neither of
> > these pointers are ever set. Fallback to calling v->type_int() in this third
> > (default) case.
> > 
> > Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  qapi/qapi-visit-core.c |   11 ++++++++++-
> >  1 files changed, 10 insertions(+), 1 deletions(-)
> > 
> > diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c
> > index 401ee6e..fcacaff 100644
> > --- a/qapi/qapi-visit-core.c
> > +++ b/qapi/qapi-visit-core.c
> > @@ -238,8 +238,17 @@ void visit_type_int64(Visitor *v, int64_t *obj, const char *name, Error **errp)
> >  
> >  void visit_type_size(Visitor *v, uint64_t *obj, const char *name, Error **errp)
> >  {
> > +    int64_t value;
> >      if (!error_is_set(errp)) {
> > -        (v->type_size ? v->type_size : v->type_uint64)(v, obj, name, errp);
> > +        if (v->type_size) {
> > +            v->type_size(v, obj, name, errp);
> > +        } else if (v->type_uint64) {
> > +            v->type_uint64(v, obj, name, errp);
> > +        } else {
> > +            value = *obj;
> > +            v->type_int(v, &value, name, errp);
> > +            *obj = value;
> > +        }
> >      }
> >  }
> 
> This doesn't address comment from Michael Roth, quoted below:
> 
> ---
> I'd recommend just doing:
> 
>   if (v->type_size) {
>       v->type_size(v, obj, name, errp);
>   } else {
>       visit_type_uint64(v, obj, name, errp);
>   }
> 
> visit_type_uint64() already handles the fallback to visit_type_int() so no
> need to duplicate.
> ---
> 

I guess we can just drop this patch.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation
  2013-07-24 12:41           ` Paolo Bonzini
@ 2013-07-26  7:38             ` Igor Mammedov
  2013-07-26  9:26               ` Paolo Bonzini
  0 siblings, 1 reply; 48+ messages in thread
From: Igor Mammedov @ 2013-07-26  7:38 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: vasilis.liaskovitis, hutao, qemu-devel

On Wed, 24 Jul 2013 14:41:36 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 24/07/2013 13:34, Igor Mammedov ha scritto:
> > On Wed, 24 Jul 2013 11:41:04 +0200
> > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > 
> >> Il 24/07/2013 10:36, Igor Mammedov ha scritto:
> >>> On Tue, 23 Jul 2013 19:09:26 +0200
> >>> Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>>
> >>>> Il 23/07/2013 18:23, Igor Mammedov ha scritto:
> >>>>> - if slot property is not specified on -device/device_add command,
> >>>>> treat default value as request for assigning DimmDevice to
> >>>>> the first free slot.
> >>>>
> >>>> Even with "-m" instead of "-numa mem", I think this is problematic
> >>>> because we still need to separate the host and guest parts of the DIMM
> >>>> device.  "-numa mem" (or the QMP command that Wanlong added) will be
> >>>> necessary to allocate memory on the host side before adding a DIMM.
> >>> why not do host allocation part at the same time when DIMM is added, is
> >>> there a real need to separate DIMM device?
> >>>
> >>> I probably miss something but -numa mem option and co aside what problem
> >>> couldn't be solved during DIMM device initialization and would require
> >>> a split DIMM device?
> >>
> >> Because otherwise, every option we add to "-numa mem" will have to be
> >> added to "-device dimm".  For example,
> >>
> >>    -device dimm,policy=interleave
> > if it's feature of DIMM device sure, if it is not lets find a better
> > place for it. See below for an alternative approach.
> > 
> >>
> >> makes no sense to me.
> >>
> >> In fact, this is no different from having to do drive_add or netdev_add
> >> before device_add.  First you tell QEMU about the host resources to use,
> >> then you add the guest device and bind the device to those resources.
> >>
> >>>> So slots will have three states: free (created with "-m"), allocated (a
> >>>> free slot moves to this state with "-numa mem...,populated=no" when
> >>>> migrating, or with the QMP command for regular hotplug), populated (an
> >>>> allocated slot moves to this state with "-device dimm").
> >>>>
> >>>> You would be able to plug a DIMM only into an allocated slot, and the
> >>>> size will be specified on the slot rather than the DIMM device.
> >>> 'slot' property is there only for migration sake to provide stable
> >>> numeric ID for QEMU<->ACPI BIOS interface. It's not used for any other
> >>> purpose and wasn't intended for any other usage..
> >>
> >> How would you otherwise refer to the memory you want to affect in a
> >> set-mem-policy monitor command?
> > could be 'id' property or even better a QOM path
> > 
> >>
> >>> on baremetal slot has noting to do with size of plugged in DIMM,
> >>
> >> On baremetal slots also belong to a specific NUMA node, for what it's
> >> worth.  There are going to be differences with baremetal no matter what.
> > sure we can deviate here, but I don't see full picture yet so I'm trying
> > to find justification for it first and asking questions. Maybe a better
> > solution will be found.
> > 
> >>
> >>> why we
> >>> would model it other way if it only brings problems: like predefined size,
> >>
> >> It doesn't have to be predefined.  In the previous discussions (and also
> >> based on Vasilis and Hu Tao's implementations) I assumed predefined slot
> >> sizes.  Now I understand the benefit of having a simpler command-line
> >> with "-m", but then in return you need three slot states instead of just
> >> unpopulated/populated.
> >>
> >> So you'd just do
> >>
> >>    set-mem-policy 0 size=2G      # free->allocated
> >>    device_add dimm,slotid=0      # allocated->populated
> >>
> >> to hotplug a 2G DIMM.  And you'll be able to pin it to host NUMA nodes,
> >> and assign it to guest NUMA nodes, like this:
> >>
> >>    set-mem-policy 0 size=2G,nodeid=1,policy=membind host-nodes=0-1
> >>    device_add dimm,slotid=0
> > Do policy and other -numa mem properties belong to a particular DIMM device
> > or rather to a particular NUMA node?
> > 
> > How about following idea: guest-node maps to a specific host-node, then
> > when we plug DIMM, guest node provides information on policies and whatever
> > to the creator of DIMM device (via DimmBus and/or mhc) which allocates
> > memory, applies policies and binds new memory to a specific host node.
> > That would eliminate 2 stage approach.
> 
> It makes sense.  My main worry is not to deviate from what we've been
> doing for drives and netdevs (because that's a proven design).  Both
> "-numa mem" and this proposal satisfy that goal.
> 
> I originally proposed "-numa mem" because Vasilis and Hu's patches were
> relying on specifying predefined sizes for all slots.  So "-numa mem"
> was a good fit for both memory hotplug (done Hu's way) and NUMA policy.
>  It also simplified the command line which had a lot of "mem-" prefixed
> options.
> 
> With the approach you suggest it may not be necessary at all, and we can
> go back to just "-numa
> node,cpus=0,mem=1G,mem-policy=membind,mem-hostnodes=0-1,cpu-hostnodes=0"
> or something like that.
> 
> Whether it is workable, it depends on what granularity Wanlong/Hu want.
> 
> There may be some scenarios where per-slot policies make sense.  For
> example, imagine that in general you want memory to be bound to the
> corresponding host node.  It turns out some nodes are now fully
> committed and others are free, and you need more memory on a VM.  You
> can hotplug that memory without really caring about binding and
> momentarily suffer some performance loss.
Perhaps denying memory add and suggesting node migration to a node with
more memory would be right approach, otherwise user is bound to be hit by
cross node penalty.

> 
> I agree that specifying the policy on every hotplug complicates
> management and may be overkill.  But then, most guests are not NUMA at
> all and you would hardly perceive the difference, you would just have to
> separate
> 
>     set-mem-policy 0 size=2G
>     device_add dimm,slot=0
I'm confused, size is inherent property of generic DimmDevice and policies
are NUMA specific of node.
If there is need for per DIMM policies, I'd store NUMA specific policy inside
object that implements it (allocates memory), and apply them to DIMM when
it's realized.

    set-mem-policy nodeid=X,mem-hostnodes=Z,dev=dimmY
    device_add dimm,id=dimmY,size=2G,node=X

> instead of
> 
>     device_add dimm,slot,size=2G
> 
> which is not a big chore.
> 
> > in this case DIMM device only needs to specify where it's plugged in, using
> > 'node' property (now number but could become QOM path to NUMA node object).
> 
> Yeah, then it's the same as the id.
> 
> Paolo
> 
> > Ideally it would be QOM hierarchy:
> > 
> > /nodeX/@dimmbus/dimm_device
> > where even 'node' property would become obsolete, just specify right
> > bus to attach DIMM device to.
> > 
> > PS:
> > we need a similar QOM hierarchy for CPUs as well to sort out
> > -numa cpus=ids mess.
> > 
> >>
> >> Again, this is the same as drive_add/device_add.
> >>
> >> Paolo
> >>
> >>> allocated, free etc. I think slot should be either free or busy.
> >>>
> >>>
> >>>>
> >>>> In general, I don't think free slots should be managed by the DimmBus,
> >>>> and host vs. guest separation should be there even if we accept your
> >>>> "-m" extension (doesn't look bad at all, I must say).
> >>>>
> >>>> Paolo
> >>>
> >>
> > 
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation
  2013-07-26  7:38             ` Igor Mammedov
@ 2013-07-26  9:26               ` Paolo Bonzini
  2013-07-26 12:51                 ` Igor Mammedov
  0 siblings, 1 reply; 48+ messages in thread
From: Paolo Bonzini @ 2013-07-26  9:26 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: vasilis.liaskovitis, hutao, qemu-devel

Il 26/07/2013 09:38, Igor Mammedov ha scritto:
> Perhaps denying memory add and suggesting node migration to a node with
> more memory would be right approach, otherwise user is bound to be hit by
> cross node penalty.

Or better, the user can first change the policy from "bind" to
"preferred", and then hotplug.

>> > I agree that specifying the policy on every hotplug complicates
>> > management and may be overkill.  But then, most guests are not NUMA at
>> > all and you would hardly perceive the difference, you would just have to
>> > separate
>> > 
>> >     set-mem-policy 0 size=2G
>> >     device_add dimm,slot=0
> I'm confused, size is inherent property of generic DimmDevice and policies
> are NUMA specific of node.

No, the size is not a property of the DimmDevice, it is a property of
the host object that backs the DimmDevice.  This is like the size is not
a property of a disk, but rather of the image that backs it.

Now, there may be a good reason to remove the host/guest distinction in
the case of memory, but I'm still not sure this is the case.

In the case of NUMA policies, I talked to Andrea Arcangeli and indeed
his suggestion was to have a single policy per guest node (typically
bind or preferred if guest node size <= host node size, interleave if
guest node size > host node size).

However, there are other properties of the memory object (e.g. hugetlbfs
path) that could be customized for every slot.

Paolo

> If there is need for per DIMM policies, I'd store NUMA specific policy inside
> object that implements it (allocates memory), and apply them to DIMM when
> it's realized.
> 
>     set-mem-policy nodeid=X,mem-hostnodes=Z,dev=dimmY
>     device_add dimm,id=dimmY,size=2G,node=X
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation
  2013-07-26  9:26               ` Paolo Bonzini
@ 2013-07-26 12:51                 ` Igor Mammedov
  2013-07-26 14:37                   ` Paolo Bonzini
  0 siblings, 1 reply; 48+ messages in thread
From: Igor Mammedov @ 2013-07-26 12:51 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: vasilis.liaskovitis, hutao, qemu-devel

On Fri, 26 Jul 2013 11:26:16 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 26/07/2013 09:38, Igor Mammedov ha scritto:
> > Perhaps denying memory add and suggesting node migration to a node with
> > more memory would be right approach, otherwise user is bound to be hit by
> > cross node penalty.
> 
> Or better, the user can first change the policy from "bind" to
> "preferred", and then hotplug.
> 
> >> > I agree that specifying the policy on every hotplug complicates
> >> > management and may be overkill.  But then, most guests are not NUMA at
> >> > all and you would hardly perceive the difference, you would just have to
> >> > separate
> >> > 
> >> >     set-mem-policy 0 size=2G
> >> >     device_add dimm,slot=0
> > I'm confused, size is inherent property of generic DimmDevice and policies
> > are NUMA specific of node.
> 
> No, the size is not a property of the DimmDevice, it is a property of
> the host object that backs the DimmDevice.  This is like the size is not
> a property of a disk, but rather of the image that backs it.
> 
> Now, there may be a good reason to remove the host/guest distinction in
> the case of memory, but I'm still not sure this is the case.
I don't like split model in this case because it's complicates interface
and confuses user. On bare-metal when user adds DIMM, it definitely has
size property. So when user adds DIMM device like:

     set-mem-policy 0 size=2G,somehostprop=y
     device_add dimm,slot=0

it's not very clear/intuitive to what 'size' relates to. On contrary:

     set-mem-policy 0 somehostprop=y
     device_add dimm,slot=0,size=2G

clearly says that we are adding 2Gb DIMM and allocator of memory that
bakes up DIMM, takes care about host specific details isolating them
from DIMM device model (which resembles baremetal one as much as possible).

> In the case of NUMA policies, I talked to Andrea Arcangeli and indeed
> his suggestion was to have a single policy per guest node (typically
> bind or preferred if guest node size <= host node size, interleave if
> guest node size > host node size).
with current implementation we could inherit DimmBusNumaAware from DimmBus,
and replace allocator to do that, not touching DIMM device model at all.

> 
> However, there are other properties of the memory object (e.g. hugetlbfs
> path) that could be customized for every slot.
Why not keep this mapping in object that allocates memory (DimmBus) and
allow it apply them to allocated memory when DIMM device is being added?

It still would be a split model but DimmDevice interface and implementation
would stay stable and the same (i.e. device_add dimm,id,size,slot,start,node)
whether we use per DIMM host specific policies, NUMA node policies or not care
about them at all.

> Paolo
> 
> > If there is need for per DIMM policies, I'd store NUMA specific policy inside
> > object that implements it (allocates memory), and apply them to DIMM when
> > it's realized.
> > 
> >     set-mem-policy nodeid=X,mem-hostnodes=Z,dev=dimmY
> >     device_add dimm,id=dimmY,size=2G,node=X
> > 
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation
  2013-07-26 12:51                 ` Igor Mammedov
@ 2013-07-26 14:37                   ` Paolo Bonzini
  2013-08-03 13:56                     ` Andreas Färber
  2013-08-06  7:13                     ` Markus Armbruster
  0 siblings, 2 replies; 48+ messages in thread
From: Paolo Bonzini @ 2013-07-26 14:37 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: vasilis.liaskovitis, hutao, qemu-devel

Il 26/07/2013 14:51, Igor Mammedov ha scritto:
> On Fri, 26 Jul 2013 11:26:16 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> Il 26/07/2013 09:38, Igor Mammedov ha scritto:
>>> Perhaps denying memory add and suggesting node migration to a node with
>>> more memory would be right approach, otherwise user is bound to be hit by
>>> cross node penalty.
>>
>> Or better, the user can first change the policy from "bind" to
>> "preferred", and then hotplug.
>>
>>>>> I agree that specifying the policy on every hotplug complicates
>>>>> management and may be overkill.  But then, most guests are not NUMA at
>>>>> all and you would hardly perceive the difference, you would just have to
>>>>> separate
>>>>>
>>>>>     set-mem-policy 0 size=2G
>>>>>     device_add dimm,slot=0
>>> I'm confused, size is inherent property of generic DimmDevice and policies
>>> are NUMA specific of node.
>>
>> No, the size is not a property of the DimmDevice, it is a property of
>> the host object that backs the DimmDevice.  This is like the size is not
>> a property of a disk, but rather of the image that backs it.
>>
>> Now, there may be a good reason to remove the host/guest distinction in
>> the case of memory, but I'm still not sure this is the case.
> I don't like split model in this case because it's complicates interface
> and confuses user. On bare-metal when user adds DIMM, it definitely has
> size property. So when user adds DIMM device like:
> 
>      set-mem-policy 0 size=2G,somehostprop=y
>      device_add dimm,slot=0
> 
> it's not very clear/intuitive to what 'size' relates to. On contrary:
> 
>      set-mem-policy 0 somehostprop=y
>      device_add dimm,slot=0,size=2G
> 
> clearly says that we are adding 2Gb DIMM and allocator of memory that
> bakes up DIMM, takes care about host specific details isolating them
> from DIMM device model (which resembles baremetal one as much as possible).

How is this different from

    drive-add id=foo,aio=native
    device-add virtio-block,drive=foo,file=/vm/foo.img

We clearly do not do that, we put the file in the drive-add.

>> In the case of NUMA policies, I talked to Andrea Arcangeli and indeed
>> his suggestion was to have a single policy per guest node (typically
>> bind or preferred if guest node size <= host node size, interleave if
>> guest node size > host node size).
> with current implementation we could inherit DimmBusNumaAware from DimmBus,
> and replace allocator to do that, not touching DIMM device model at all.

The guest bus should surely not be aware of NUMA policies in the host.

Remember here the guest may have just one node, but you want it bound to
one particular node of the host.

>> However, there are other properties of the memory object (e.g. hugetlbfs
>> path) that could be customized for every slot.
> 
> Why not keep this mapping in object that allocates memory (DimmBus) and
> allow it apply them to allocated memory when DIMM device is being added?

It's not DimmBus that allocates memory.  Allocating memory is a host
action, and even if it is triggered by DimmBus, it is not _done_ by DimmBus.

> It still would be a split model but DimmDevice interface and implementation
> would stay stable and the same (i.e. device_add dimm,id,size,slot,start,node)
> whether we use per DIMM host specific policies, NUMA node policies or not care
> about them at all.

I'm not sure how.  Clean separation of host vs. guest properties and
actions is the only thing that makes DimmDevice stable.

In fact, the same separation is present in the real world as well.  The
DimmDevice is not the physical memory chips, it is just a guest-visible
representation of it.  Size is a property of the physical memory chips.

Paolo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
                   ` (16 preceding siblings ...)
  2013-07-24  9:52 ` [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Hu Tao
@ 2013-08-02 12:35 ` Anthony Liguori
  2013-08-07 14:14   ` Erlon Cruz
  2013-08-09 17:19   ` Anthony Liguori
  2013-09-11  4:01 ` Hu Tao
  18 siblings, 2 replies; 48+ messages in thread
From: Anthony Liguori @ 2013-08-02 12:35 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel; +Cc: vasilis.liaskovitis, pbonzini

Applied.  Thanks.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 02/16] vl: set default ram_size during variable initialization
  2013-07-23 16:22 ` [Qemu-devel] [PATCH 02/16] vl: set default ram_size during variable initialization Igor Mammedov
@ 2013-08-02 20:33   ` Andreas Färber
  2013-09-09 14:06     ` Igor Mammedov
  0 siblings, 1 reply; 48+ messages in thread
From: Andreas Färber @ 2013-08-02 20:33 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: vasilis.liaskovitis, hutao, qemu-devel, pbonzini

Am 23.07.2013 18:22, schrieb Igor Mammedov:
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  vl.c |    7 +------
>  1 files changed, 1 insertions(+), 6 deletions(-)
> 
> diff --git a/vl.c b/vl.c
> index 8190504..bf0c658 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2947,7 +2947,7 @@ int main(int argc, char **argv, char **envp)
>      module_call_init(MODULE_INIT_MACHINE);
>      machine = find_default_machine();
>      cpu_model = NULL;
> -    ram_size = 0;
> +    ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
>      snapshot = 0;
>      cyls = heads = secs = 0;
>      translation = BIOS_ATA_TRANSLATION_AUTO;
> @@ -4064,11 +4064,6 @@ int main(int argc, char **argv, char **envp)
>          exit(1);
>      }
>  
> -    /* init the memory */
> -    if (ram_size == 0) {
> -        ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
> -    }
> -
>      if (qemu_opts_foreach(qemu_find_opts("device"), device_help_func, NULL, 0)
>          != 0) {
>          exit(0);

Commit message doesn't give any explanation why?

What happens with -m 0? My guess is the old code translates that to the
default size, where by intializing the default earlier it would stay.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation
  2013-07-26 14:37                   ` Paolo Bonzini
@ 2013-08-03 13:56                     ` Andreas Färber
  2013-09-11 15:12                       ` Igor Mammedov
  2013-08-06  7:13                     ` Markus Armbruster
  1 sibling, 1 reply; 48+ messages in thread
From: Andreas Färber @ 2013-08-03 13:56 UTC (permalink / raw)
  To: Paolo Bonzini, Igor Mammedov
  Cc: vasilis.liaskovitis, hutao, qemu-devel, Anthony Liguori

Am 26.07.2013 16:37, schrieb Paolo Bonzini:
> Il 26/07/2013 14:51, Igor Mammedov ha scritto:
>> On Fri, 26 Jul 2013 11:26:16 +0200
>> Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> Il 26/07/2013 09:38, Igor Mammedov ha scritto:
>>>>>> I agree that specifying the policy on every hotplug complicates
>>>>>> management and may be overkill.  But then, most guests are not NUMA at
>>>>>> all and you would hardly perceive the difference, you would just have to
>>>>>> separate
>>>>>>
>>>>>>     set-mem-policy 0 size=2G
>>>>>>     device_add dimm,slot=0
>>>> I'm confused, size is inherent property of generic DimmDevice and policies
>>>> are NUMA specific of node.
>>>
>>> No, the size is not a property of the DimmDevice, it is a property of
>>> the host object that backs the DimmDevice.  This is like the size is not
>>> a property of a disk, but rather of the image that backs it.
>>>
>>> Now, there may be a good reason to remove the host/guest distinction in
>>> the case of memory, but I'm still not sure this is the case.
>> I don't like split model in this case because it's complicates interface
>> and confuses user. On bare-metal when user adds DIMM, it definitely has
>> size property. So when user adds DIMM device like:
>>
>>      set-mem-policy 0 size=2G,somehostprop=y
>>      device_add dimm,slot=0
>>
>> it's not very clear/intuitive to what 'size' relates to. On contrary:
>>
>>      set-mem-policy 0 somehostprop=y
>>      device_add dimm,slot=0,size=2G
>>
>> clearly says that we are adding 2Gb DIMM and allocator of memory that
>> bakes up DIMM, takes care about host specific details isolating them
>> from DIMM device model (which resembles baremetal one as much as possible).
> 
> How is this different from
> 
>     drive-add id=foo,aio=native
>     device-add virtio-block,drive=foo,file=/vm/foo.img
> 
> We clearly do not do that, we put the file in the drive-add.

The difference is that a user can understand drive-add, whereas
set-mem-policy in this use case is really hard to grasp as a prereq. If
it were
  memory-add id=foo,size=2G
  device-add dimm,slot=0,mem=foo
that may be different, but that is unhandy implementation-wise because
we assume initial RAM to be in one chunk and want to partition it into
guest NUMA nodes IIUC. Or has that changed?

I don't care if we name the device DIMM or MemoryBar, point is it should
represent the physical thing one plugs into a physical board, to match
what admins do with physical servers. And that physical bar has a size,
as Igor points out. It should not just represent some virtual
hot-plugged policy thing.

The drive is separate from the device because we can exchange the CD
media while leaving the device connected to ATA/AHCI/SCSI (and it allows
us to keep file, cache, etc. properties in a central place). Admins buy
servers/boards and memory bars, they won't solder chips onto it nor
change the NUMA configuration of the board while running, so we should
consider it immutable once created. If we unplug physical DIMMs, the
data can be considered lost (ignoring deep cooling tricks etc.), so no
point to transfer memory data from one DIMM to another.

Would we seriously want to exchange the memory a DIMM is backed by while
leaving the DIMM plugged? Rather the entity that owns the DIMM slots (as
opposed to DIMMs) has the guest NUMA policy configured for the
unpopulated slots. The slot has a maximum size and the DIMM has a size
less or equal to slot's maximum size.

I just wonder whether we need a DimmBus at all? Because if we get the
slot specified as in your examples then we could just set some dimm[n]
link<DIMM> on realize (question is where). We had a discussion once
about a missing callback when a link property is set to a new value, not
sure if that has been resolved meantime?
Can't think of a situation where we would have multiple DimmBus'es, only
some cases where it's on a SoC or SoM and not directly on the mainboard,
i.e. not /machine/dimm[n].
Code seems to be copying ICC bus, but ICC bus was added because APIC
needed a hot-pluggable bus to replace SysBus.

Hadn't Anthony and Ping Fang factored out a memory controller on the
i440fx? Patch 9 seems to add the bus directly to the i440fx PCI device.

Patch 13 seems to use a Notifier for PIIX4 ACPI rather than having the
bus handle hotplug itself and bus / memory controller communicating with
ACPI as necessary (layering violation). For the CPU we initially had no
bus at all to handle such event (we still don't outside x86).

What I am not finding in this series is a translation from -m to initial
non-hotplugged DIMMs?

Some random other remarks on the series:
* s/"dimmbus"/"dimm-bus"/g
* s/klass/oc/g -- there were loud protests against k* (class reserved)
* gtk-doc markup for parent fields is missing in header
* s/qdev/parent_obj/, s/qbus/parent_obj/ -- don't invite to use them
* s/dc/dbc/g -- dc is for DeviceClass
* DimmBusClass::register_memory() is never overwritten? In that case it
could well be just a static function called from realizefn - only
disctinction I can think of is how to allocate MemoryRegion. A
caller-settable DimmBus::slot_plugged() for i440fx to notify piix4 would
seem to solve the issue of the Notifier elegantly. PCIBus has such hooks.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation
  2013-07-26 14:37                   ` Paolo Bonzini
  2013-08-03 13:56                     ` Andreas Färber
@ 2013-08-06  7:13                     ` Markus Armbruster
  1 sibling, 0 replies; 48+ messages in thread
From: Markus Armbruster @ 2013-08-06  7:13 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: vasilis.liaskovitis, Igor Mammedov, qemu-devel, hutao

Paolo Bonzini <pbonzini@redhat.com> writes:

> Il 26/07/2013 14:51, Igor Mammedov ha scritto:
>> On Fri, 26 Jul 2013 11:26:16 +0200
>> Paolo Bonzini <pbonzini@redhat.com> wrote:
>> 
>>> Il 26/07/2013 09:38, Igor Mammedov ha scritto:
>>>> Perhaps denying memory add and suggesting node migration to a node with
>>>> more memory would be right approach, otherwise user is bound to be hit by
>>>> cross node penalty.
>>>
>>> Or better, the user can first change the policy from "bind" to
>>> "preferred", and then hotplug.
>>>
>>>>>> I agree that specifying the policy on every hotplug complicates
>>>>>> management and may be overkill.  But then, most guests are not NUMA at
>>>>>> all and you would hardly perceive the difference, you would just have to
>>>>>> separate
>>>>>>
>>>>>>     set-mem-policy 0 size=2G
>>>>>>     device_add dimm,slot=0
>>>> I'm confused, size is inherent property of generic DimmDevice and policies
>>>> are NUMA specific of node.
>>>
>>> No, the size is not a property of the DimmDevice, it is a property of
>>> the host object that backs the DimmDevice.  This is like the size is not
>>> a property of a disk, but rather of the image that backs it.

If you excuse my pedantry: size *is* a property of the block device
model (guest part, a.k.a. frontend).  We just choose to derive it from
the image size (host part, a.k.a. backend) instead of making it
separately configurable.

If we wanted to make it separately configurable (overriding the implicit
image size), I'd certainly argue to make it a device model property.

>>> Now, there may be a good reason to remove the host/guest distinction in
>>> the case of memory, but I'm still not sure this is the case.
>> I don't like split model in this case because it's complicates interface
>> and confuses user. On bare-metal when user adds DIMM, it definitely has
>> size property. So when user adds DIMM device like:
>> 
>>      set-mem-policy 0 size=2G,somehostprop=y
>>      device_add dimm,slot=0
>> 
>> it's not very clear/intuitive to what 'size' relates to. On contrary:
>> 
>>      set-mem-policy 0 somehostprop=y
>>      device_add dimm,slot=0,size=2G
>> 
>> clearly says that we are adding 2Gb DIMM and allocator of memory that
>> bakes up DIMM, takes care about host specific details isolating them
>> from DIMM device model (which resembles baremetal one as much as possible).
>
> How is this different from
>
>     drive-add id=foo,aio=native
>     device-add virtio-block,drive=foo,file=/vm/foo.img
>
> We clearly do not do that, we put the file in the drive-add.

Yes, because the image file is a backend property, not a frontend
property.

>>> In the case of NUMA policies, I talked to Andrea Arcangeli and indeed
>>> his suggestion was to have a single policy per guest node (typically
>>> bind or preferred if guest node size <= host node size, interleave if
>>> guest node size > host node size).
>> with current implementation we could inherit DimmBusNumaAware from DimmBus,
>> and replace allocator to do that, not touching DIMM device model at all.
>
> The guest bus should surely not be aware of NUMA policies in the host.
>
> Remember here the guest may have just one node, but you want it bound to
> one particular node of the host.
>
>>> However, there are other properties of the memory object (e.g. hugetlbfs
>>> path) that could be customized for every slot.
>> 
>> Why not keep this mapping in object that allocates memory (DimmBus) and
>> allow it apply them to allocated memory when DIMM device is being added?
>
> It's not DimmBus that allocates memory.  Allocating memory is a host
> action, and even if it is triggered by DimmBus, it is not _done_ by DimmBus.
>
>> It still would be a split model but DimmDevice interface and implementation
>> would stay stable and the same (i.e. device_add dimm,id,size,slot,start,node)
>> whether we use per DIMM host specific policies, NUMA node policies or not care
>> about them at all.
>
> I'm not sure how.  Clean separation of host vs. guest properties and
> actions is the only thing that makes DimmDevice stable.
>
> In fact, the same separation is present in the real world as well.  The
> DimmDevice is not the physical memory chips, it is just a guest-visible
> representation of it.  Size is a property of the physical memory chips.

Without fully understanding the issues in this specific case: clean
separation of host vs. guest matters.  Old code didn't care, and we
spent a lot of sweat on cleaning it up, dancing to the merry backward
compatibility polka all the way.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug
  2013-08-02 12:35 ` Anthony Liguori
@ 2013-08-07 14:14   ` Erlon Cruz
  2013-08-09 17:19   ` Anthony Liguori
  1 sibling, 0 replies; 48+ messages in thread
From: Erlon Cruz @ 2013-08-07 14:14 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Vasilis Liaskovitis, Igor Mammedov, pbonzini, qemu-devel, Hu Tao

[-- Attachment #1: Type: text/plain, Size: 171 bytes --]

On Fri, Aug 2, 2013 at 9:35 AM, Anthony Liguori <aliguori@us.ibm.com> wrote:

> Applied.  Thanks.
>
> Where can I find this branch?


> Regards,
>
> Anthony Liguori
>
>
>

[-- Attachment #2: Type: text/html, Size: 633 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug
  2013-08-02 12:35 ` Anthony Liguori
  2013-08-07 14:14   ` Erlon Cruz
@ 2013-08-09 17:19   ` Anthony Liguori
  1 sibling, 0 replies; 48+ messages in thread
From: Anthony Liguori @ 2013-08-09 17:19 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel; +Cc: vasilis.liaskovitis, pbonzini

Anthony Liguori <aliguori@us.ibm.com> writes:

> Applied.  Thanks.
>

Something went bad here... Apologies, it was not applied.

Regards,

Anthony Liguori

> Regards,
>
> Anthony Liguori

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 02/16] vl: set default ram_size during variable initialization
  2013-08-02 20:33   ` Andreas Färber
@ 2013-09-09 14:06     ` Igor Mammedov
  2013-09-09 14:31       ` Paolo Bonzini
  0 siblings, 1 reply; 48+ messages in thread
From: Igor Mammedov @ 2013-09-09 14:06 UTC (permalink / raw)
  To: Andreas Färber; +Cc: vasilis.liaskovitis, hutao, qemu-devel, pbonzini

On Fri, 02 Aug 2013 22:33:24 +0200
Andreas Färber <afaerber@suse.de> wrote:

> Am 23.07.2013 18:22, schrieb Igor Mammedov:
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  vl.c |    7 +------
> >  1 files changed, 1 insertions(+), 6 deletions(-)
> > 
> > diff --git a/vl.c b/vl.c
> > index 8190504..bf0c658 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -2947,7 +2947,7 @@ int main(int argc, char **argv, char **envp)
> >      module_call_init(MODULE_INIT_MACHINE);
> >      machine = find_default_machine();
> >      cpu_model = NULL;
> > -    ram_size = 0;
> > +    ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
> >      snapshot = 0;
> >      cyls = heads = secs = 0;
> >      translation = BIOS_ATA_TRANSLATION_AUTO;
> > @@ -4064,11 +4064,6 @@ int main(int argc, char **argv, char **envp)
> >          exit(1);
> >      }
> >  
> > -    /* init the memory */
> > -    if (ram_size == 0) {
> > -        ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
> > -    }
> > -
> >      if (qemu_opts_foreach(qemu_find_opts("device"), device_help_func, NULL, 0)
> >          != 0) {
> >          exit(0);
> 
> Commit message doesn't give any explanation why?
it was intended as cleanup

> 
> What happens with -m 0? My guess is the old code translates that to the
> default size, where by intializing the default earlier it would stay.
patch is broken in this aspect. It aborts on start up with -m 0

The question is if -m 0 is correct value, perhaps QEMU should exit with
error message in this case, instead of silent fallback to default?

> 
> Andreas
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 02/16] vl: set default ram_size during variable initialization
  2013-09-09 14:06     ` Igor Mammedov
@ 2013-09-09 14:31       ` Paolo Bonzini
  2013-09-09 15:26         ` Igor Mammedov
  0 siblings, 1 reply; 48+ messages in thread
From: Paolo Bonzini @ 2013-09-09 14:31 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: vasilis.liaskovitis, hutao, Andreas Färber, qemu-devel

Il 09/09/2013 16:06, Igor Mammedov ha scritto:
> On Fri, 02 Aug 2013 22:33:24 +0200
> Andreas Färber <afaerber@suse.de> wrote:
> 
>> Am 23.07.2013 18:22, schrieb Igor Mammedov:
>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>> ---
>>>  vl.c |    7 +------
>>>  1 files changed, 1 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/vl.c b/vl.c
>>> index 8190504..bf0c658 100644
>>> --- a/vl.c
>>> +++ b/vl.c
>>> @@ -2947,7 +2947,7 @@ int main(int argc, char **argv, char **envp)
>>>      module_call_init(MODULE_INIT_MACHINE);
>>>      machine = find_default_machine();
>>>      cpu_model = NULL;
>>> -    ram_size = 0;
>>> +    ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
>>>      snapshot = 0;
>>>      cyls = heads = secs = 0;
>>>      translation = BIOS_ATA_TRANSLATION_AUTO;
>>> @@ -4064,11 +4064,6 @@ int main(int argc, char **argv, char **envp)
>>>          exit(1);
>>>      }
>>>  
>>> -    /* init the memory */
>>> -    if (ram_size == 0) {
>>> -        ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
>>> -    }
>>> -
>>>      if (qemu_opts_foreach(qemu_find_opts("device"), device_help_func, NULL, 0)
>>>          != 0) {
>>>          exit(0);
>>
>> Commit message doesn't give any explanation why?
> it was intended as cleanup
> 
>>
>> What happens with -m 0? My guess is the old code translates that to the
>> default size, where by intializing the default earlier it would stay.
> patch is broken in this aspect. It aborts on start up with -m 0
> 
> The question is if -m 0 is correct value, perhaps QEMU should exit with
> error message in this case, instead of silent fallback to default?

I guess we have to keep it for backwards compatibility.

Paolo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 02/16] vl: set default ram_size during variable initialization
  2013-09-09 14:31       ` Paolo Bonzini
@ 2013-09-09 15:26         ` Igor Mammedov
  0 siblings, 0 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-09-09 15:26 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: vasilis.liaskovitis, hutao, Andreas Färber, qemu-devel

On Mon, 09 Sep 2013 16:31:03 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 09/09/2013 16:06, Igor Mammedov ha scritto:
> > On Fri, 02 Aug 2013 22:33:24 +0200
> > Andreas Färber <afaerber@suse.de> wrote:
> > 
> >> Am 23.07.2013 18:22, schrieb Igor Mammedov:
> >>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> >>> ---
> >>>  vl.c |    7 +------
> >>>  1 files changed, 1 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/vl.c b/vl.c
> >>> index 8190504..bf0c658 100644
> >>> --- a/vl.c
> >>> +++ b/vl.c
> >>> @@ -2947,7 +2947,7 @@ int main(int argc, char **argv, char **envp)
> >>>      module_call_init(MODULE_INIT_MACHINE);
> >>>      machine = find_default_machine();
> >>>      cpu_model = NULL;
> >>> -    ram_size = 0;
> >>> +    ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
> >>>      snapshot = 0;
> >>>      cyls = heads = secs = 0;
> >>>      translation = BIOS_ATA_TRANSLATION_AUTO;
> >>> @@ -4064,11 +4064,6 @@ int main(int argc, char **argv, char **envp)
> >>>          exit(1);
> >>>      }
> >>>  
> >>> -    /* init the memory */
> >>> -    if (ram_size == 0) {
> >>> -        ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
> >>> -    }
> >>> -
> >>>      if (qemu_opts_foreach(qemu_find_opts("device"), device_help_func, NULL, 0)
> >>>          != 0) {
> >>>          exit(0);
> >>
> >> Commit message doesn't give any explanation why?
> > it was intended as cleanup
> > 
> >>
> >> What happens with -m 0? My guess is the old code translates that to the
> >> default size, where by intializing the default earlier it would stay.
> > patch is broken in this aspect. It aborts on start up with -m 0
> > 
> > The question is if -m 0 is correct value, perhaps QEMU should exit with
> > error message in this case, instead of silent fallback to default?
> 
> I guess we have to keep it for backwards compatibility.
I could swap this patch with 3/16 and add -m 0 compat logic there. Then
it would be ok to move default to initialization time + cleanup.

> 
> Paolo
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug
  2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
                   ` (17 preceding siblings ...)
  2013-08-02 12:35 ` Anthony Liguori
@ 2013-09-11  4:01 ` Hu Tao
  2013-09-17 12:29   ` Igor Mammedov
  18 siblings, 1 reply; 48+ messages in thread
From: Hu Tao @ 2013-09-11  4:01 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: vasilis.liaskovitis, pbonzini, qemu-devel

On Tue, Jul 23, 2013 at 06:22:56PM +0200, Igor Mammedov wrote:
> As opposed to previous approach,
> This series allows to hotplug 'arbitrary' DIMM devices specifying size,
> NUMA node mapping, slot and address where to map it, at runtime.
> 
> Due to ACPI limitation there is need to specify a number of possible
> DIMM devices. For this task -m option was extended to support
> following format:
> 
>   -m [mem=]RamSize[,slots=N,maxmem=M]
> 
> To allow memory hotplug user must specify a pair additional parameters:
>     'slots' - number of possible increments
>     'maxmem' - max possible total memory size QEMU is allowed to use,
>                including RamSize.
> 
> minimal monitor command syntax to hotplug DIMM device:
> 
>   device_add dimm,id=dimmX
> 
> DIMM device provides following properties that could be used with
> device_add / -device to alter default behavior:
> 
>   id    - unique string identifying device [mandatory]
>   slot  - number in range [0-slots) [optional], if not specified
>           the first free slot is used
>   node  - NUMA node id [optional] (default: 0)
>   size  - amount of memory to add [optional] (default: 1Gb)
>   start - guest's physical address where to plug DIMM [optional],
>           if not specified the first gap in hotplug memory region
>           that fits DIMM is used
> 
>  -device option could be used for adding potentially hotunplugable DIMMs
> and also for specifying hotplugged DIMMs in migration case (not tested).
> 
> Current implementation supports only x86-64 variant and places hotplug
> memory region above 4Gb before 64-bit PCI hole.
> 
> Tested guests:
>  - Fedora 19x64
>  - Windows 2012DCx64
>  - Windows 2008DCx64
> 
> Known limitations/bugs/TODOs:
>  - only hot-add supported
>  - q35 is not supported yet
>  - max number of supported DIMM devices 255 (due to ACPI object name
>    limit), could be increased creating several containers and putting
>    DIMMs there. (exercise for future) 
>  - failed hotplug action consumes 1 slot (device_add doesn't delete
>    device if realize failed)
>  - e820 table doesn't include DIMM devices added with -device /
>    (or after reboot devices added with device_add)
>  - Windows 2008 remembers DIMM configuration, so if DIMM with other
>    start/size is added into the same slot, it refuses to use it insisting
>    on old mapping.

With this series we can hotplug memory of arbitrary size, but Linux
expects a minimum size of hotpluggable memory. Take 128M in x86_64 for
example, if first hotplug 64M memory (less than 128M) than we can't add
another 64M memory. So the question is should we have a lower limit of
hotplugged memory in qemu (which is easy but it's not qemu's problem,
and different OS/hardware may have different limits)? Or maybe we can
fix it in Linux (which is difficult and I'm not sure it'll casue any
compatibility problem)?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation
  2013-08-03 13:56                     ` Andreas Färber
@ 2013-09-11 15:12                       ` Igor Mammedov
  0 siblings, 0 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-09-11 15:12 UTC (permalink / raw)
  To: Andreas Färber
  Cc: vasilis.liaskovitis, Paolo Bonzini, qemu-devel, Anthony Liguori, hutao

On Sat, 03 Aug 2013 15:56:50 +0200
Andreas Färber <afaerber@suse.de> wrote:

> Am 26.07.2013 16:37, schrieb Paolo Bonzini:
> > Il 26/07/2013 14:51, Igor Mammedov ha scritto:
> >> On Fri, 26 Jul 2013 11:26:16 +0200
> >> Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>> Il 26/07/2013 09:38, Igor Mammedov ha scritto:
> >>>>>> I agree that specifying the policy on every hotplug complicates
> >>>>>> management and may be overkill.  But then, most guests are not NUMA at
> >>>>>> all and you would hardly perceive the difference, you would just have to
> >>>>>> separate
> >>>>>>
> >>>>>>     set-mem-policy 0 size=2G
> >>>>>>     device_add dimm,slot=0
> >>>> I'm confused, size is inherent property of generic DimmDevice and policies
> >>>> are NUMA specific of node.
> >>>
> >>> No, the size is not a property of the DimmDevice, it is a property of
> >>> the host object that backs the DimmDevice.  This is like the size is not
> >>> a property of a disk, but rather of the image that backs it.
> >>>
> >>> Now, there may be a good reason to remove the host/guest distinction in
> >>> the case of memory, but I'm still not sure this is the case.
> >> I don't like split model in this case because it's complicates interface
> >> and confuses user. On bare-metal when user adds DIMM, it definitely has
> >> size property. So when user adds DIMM device like:
> >>
> >>      set-mem-policy 0 size=2G,somehostprop=y
> >>      device_add dimm,slot=0
> >>
> >> it's not very clear/intuitive to what 'size' relates to. On contrary:
> >>
> >>      set-mem-policy 0 somehostprop=y
> >>      device_add dimm,slot=0,size=2G
> >>
> >> clearly says that we are adding 2Gb DIMM and allocator of memory that
> >> bakes up DIMM, takes care about host specific details isolating them
> >> from DIMM device model (which resembles baremetal one as much as possible).
> > 
> > How is this different from
> > 
> >     drive-add id=foo,aio=native
> >     device-add virtio-block,drive=foo,file=/vm/foo.img
> > 
> > We clearly do not do that, we put the file in the drive-add.
> 
> The difference is that a user can understand drive-add, whereas
> set-mem-policy in this use case is really hard to grasp as a prereq. If
> it were
>   memory-add id=foo,size=2G
>   device-add dimm,slot=0,mem=foo
> that may be different, but that is unhandy implementation-wise because
> we assume initial RAM to be in one chunk and want to partition it into
> guest NUMA nodes IIUC. Or has that changed?
> 
> I don't care if we name the device DIMM or MemoryBar, point is it should
> represent the physical thing one plugs into a physical board, to match
> what admins do with physical servers. And that physical bar has a size,
> as Igor points out. It should not just represent some virtual
> hot-plugged policy thing.
> 
> The drive is separate from the device because we can exchange the CD
> media while leaving the device connected to ATA/AHCI/SCSI (and it allows
> us to keep file, cache, etc. properties in a central place). Admins buy
> servers/boards and memory bars, they won't solder chips onto it nor
> change the NUMA configuration of the board while running, so we should
> consider it immutable once created. If we unplug physical DIMMs, the
> data can be considered lost (ignoring deep cooling tricks etc.), so no
> point to transfer memory data from one DIMM to another.
> 
> Would we seriously want to exchange the memory a DIMM is backed by while
> leaving the DIMM plugged? Rather the entity that owns the DIMM slots (as
> opposed to DIMMs) has the guest NUMA policy configured for the
> unpopulated slots. The slot has a maximum size and the DIMM has a size
> less or equal to slot's maximum size.
maximum size per slot is hardware limitation, we can ignore it for virtual
hardware case. I've added 'slot' option only for migration case, so we could
replicate environment on receiving side (it's interface connecting QEMU
and ACPI objects in guest).
The thing is even if slot was an object providing NUMA policies and etc.,
I'd like to keep it dynamic and specifying such properties in runtime
rather then at startup time to keep it flexible and avoid not necessary
limitations wich lead again to backend/frontend separation.

If DimmDevice is separated on backend/frontend parts than for simplicity
'size' option could go to memory-add command as well.

So taking previous feedback, would be following interface acceptable?

memory-add id=foo,size=x,numa-node=y
device-add dimm,slot=z,mem=foo

memory-add would:
 1. allocate host memory
 2. apply policy for a specified node

device-add would create DimmDevice which:
 1. will own memory region
 2. have properties:
     slot - addressing specific ACPI memory device through
            QEMU/ACPI OSPM layers [default: auto selected]
     start - address where memory is mapped [default: auto selected]
 3. serve as proxy object for back-end size and proximity properties

> I just wonder whether we need a DimmBus at all? Because if we get the
> slot specified as in your examples then we could just set some dimm[n]
> link<DIMM> on realize (question is where). We had a discussion once
> about a missing callback when a link property is set to a new value, not
> sure if that has been resolved meantime?
> Can't think of a situation where we would have multiple DimmBus'es, only
> some cases where it's on a SoC or SoM and not directly on the mainboard,
> i.e. not /machine/dimm[n].
> Code seems to be copying ICC bus, but ICC bus was added because APIC
> needed a hot-pluggable bus to replace SysBus.
Bus here is not only source of implicit links but also a gateway that provides
address space for mapping DimmDevice-s in acceptable way.

I've tried /machine/link<dimm> looking for simpler solution, but result
was more hackish, so I've dumped idea and returned to original Bus approach
with established usage pattern.

> 
> Hadn't Anthony and Ping Fang factored out a memory controller on the
> i440fx? Patch 9 seems to add the bus directly to the i440fx PCI device.
not upstream so far, there was ignored RFC by Hu Tao
[http://lists.nongnu.org/archive/html/qemu-devel/2013-06/msg03500.html]

This series provides memory hotplug in the least invasive way, and it
would be trivial to move to factored out memory controller once it's upstream.

> Patch 13 seems to use a Notifier for PIIX4 ACPI rather than having the
> bus handle hotplug itself and bus / memory controller communicating with
> ACPI as necessary (layering violation). For the CPU we initially had no
> bus at all to handle such event (we still don't outside x86).
I'll look at it.

> 
> What I am not finding in this series is a translation from -m to initial
> non-hotplugged DIMMs?
it wasn't goal of this series for it would require factored out a memory
controller, so not to increase mess in current code.

> 
> Some random other remarks on the series:
> * s/"dimmbus"/"dimm-bus"/g
> * s/klass/oc/g -- there were loud protests against k* (class reserved)
> * gtk-doc markup for parent fields is missing in header
> * s/qdev/parent_obj/, s/qbus/parent_obj/ -- don't invite to use them
> * s/dc/dbc/g -- dc is for DeviceClass
> * DimmBusClass::register_memory() is never overwritten? In that case it
> could well be just a static function called from realizefn - only
It's not much of over-engineering and keeps things object oriented.

> disctinction I can think of is how to allocate MemoryRegion. A
> caller-settable DimmBus::slot_plugged() for i440fx to notify piix4 would
> seem to solve the issue of the Notifier elegantly. PCIBus has such hooks.
Thanks for a tip, I'll certainly try it.

> 
> Regards,
> Andreas
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug
  2013-09-11  4:01 ` Hu Tao
@ 2013-09-17 12:29   ` Igor Mammedov
  0 siblings, 0 replies; 48+ messages in thread
From: Igor Mammedov @ 2013-09-17 12:29 UTC (permalink / raw)
  To: Hu Tao; +Cc: vasilis.liaskovitis, pbonzini, qemu-devel

On Wed, 11 Sep 2013 12:01:44 +0800
Hu Tao <hutao@cn.fujitsu.com> wrote:

> On Tue, Jul 23, 2013 at 06:22:56PM +0200, Igor Mammedov wrote:
> > As opposed to previous approach,
> > This series allows to hotplug 'arbitrary' DIMM devices specifying size,
> > NUMA node mapping, slot and address where to map it, at runtime.
> > 
> > Due to ACPI limitation there is need to specify a number of possible
> > DIMM devices. For this task -m option was extended to support
> > following format:
> > 
> >   -m [mem=]RamSize[,slots=N,maxmem=M]
> > 
> > To allow memory hotplug user must specify a pair additional parameters:
> >     'slots' - number of possible increments
> >     'maxmem' - max possible total memory size QEMU is allowed to use,
> >                including RamSize.
> > 
> > minimal monitor command syntax to hotplug DIMM device:
> > 
> >   device_add dimm,id=dimmX
> > 
> > DIMM device provides following properties that could be used with
> > device_add / -device to alter default behavior:
> > 
> >   id    - unique string identifying device [mandatory]
> >   slot  - number in range [0-slots) [optional], if not specified
> >           the first free slot is used
> >   node  - NUMA node id [optional] (default: 0)
> >   size  - amount of memory to add [optional] (default: 1Gb)
> >   start - guest's physical address where to plug DIMM [optional],
> >           if not specified the first gap in hotplug memory region
> >           that fits DIMM is used
> > 
> >  -device option could be used for adding potentially hotunplugable DIMMs
> > and also for specifying hotplugged DIMMs in migration case (not tested).
> > 
> > Current implementation supports only x86-64 variant and places hotplug
> > memory region above 4Gb before 64-bit PCI hole.
> > 
> > Tested guests:
> >  - Fedora 19x64
> >  - Windows 2012DCx64
> >  - Windows 2008DCx64
> > 
> > Known limitations/bugs/TODOs:
> >  - only hot-add supported
> >  - q35 is not supported yet
> >  - max number of supported DIMM devices 255 (due to ACPI object name
> >    limit), could be increased creating several containers and putting
> >    DIMMs there. (exercise for future) 
> >  - failed hotplug action consumes 1 slot (device_add doesn't delete
> >    device if realize failed)
> >  - e820 table doesn't include DIMM devices added with -device /
> >    (or after reboot devices added with device_add)
> >  - Windows 2008 remembers DIMM configuration, so if DIMM with other
> >    start/size is added into the same slot, it refuses to use it insisting
> >    on old mapping.
> 
> With this series we can hotplug memory of arbitrary size, but Linux
> expects a minimum size of hotpluggable memory. Take 128M in x86_64 for
> example, if first hotplug 64M memory (less than 128M) than we can't add
> another 64M memory. So the question is should we have a lower limit of
> hotplugged memory in qemu (which is easy but it's not qemu's problem,
> and different OS/hardware may have different limits)? Or maybe we can
> fix it in Linux (which is difficult and I'm not sure it'll casue any
> compatibility problem)?
I'd say it's not QEMU problem, in real hardware there is/was 64Mb DIMMs and
other OSes might handle this small amount just fine (tested with Windows server 2012).

It's upto linux kernel to fix bug or management tools which know what OS is being
installed to implement workaround/limit.

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2013-09-17 12:30 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-23 16:22 [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Igor Mammedov
2013-07-23 16:22 ` [Qemu-devel] [PATCH 01/16] pc: use pci_hole64 info consistently Igor Mammedov
2013-07-23 16:22 ` [Qemu-devel] [PATCH 02/16] vl: set default ram_size during variable initialization Igor Mammedov
2013-08-02 20:33   ` Andreas Färber
2013-09-09 14:06     ` Igor Mammedov
2013-09-09 14:31       ` Paolo Bonzini
2013-09-09 15:26         ` Igor Mammedov
2013-07-23 16:22 ` [Qemu-devel] [PATCH 03/16] vl: convert -m to qemu_opts_parse() Igor Mammedov
2013-07-23 17:11   ` Paolo Bonzini
2013-07-24  8:40     ` Igor Mammedov
2013-07-24  9:04       ` Paolo Bonzini
2013-07-24  9:27         ` Igor Mammedov
2013-07-23 16:23 ` [Qemu-devel] [PATCH 04/16] qapi: make visit_type_size fallback to type_int Igor Mammedov
2013-07-25  6:41   ` Hu Tao
2013-07-25 11:35     ` Igor Mammedov
2013-07-23 16:23 ` [Qemu-devel] [PATCH 05/16] qdev: Add SIZE type to qdev properties Igor Mammedov
2013-07-23 16:23 ` [Qemu-devel] [PATCH 06/16] dimm: implement dimm device abstraction Igor Mammedov
2013-07-25  6:52   ` Hu Tao
2013-07-23 16:23 ` [Qemu-devel] [PATCH 07/16] dimm: map DimmDevice into DimBus provided address space Igor Mammedov
2013-07-23 16:23 ` [Qemu-devel] [PATCH 08/16] pc: piix: make hotplug memory gap in high memory Igor Mammedov
2013-07-23 16:23 ` [Qemu-devel] [PATCH 09/16] pc: i440fx: add DimmBus to chipset and map it into hotplug memory region Igor Mammedov
2013-07-23 16:23 ` [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation Igor Mammedov
2013-07-23 17:09   ` Paolo Bonzini
2013-07-24  8:36     ` Igor Mammedov
2013-07-24  9:41       ` Paolo Bonzini
2013-07-24 11:34         ` Igor Mammedov
2013-07-24 12:41           ` Paolo Bonzini
2013-07-26  7:38             ` Igor Mammedov
2013-07-26  9:26               ` Paolo Bonzini
2013-07-26 12:51                 ` Igor Mammedov
2013-07-26 14:37                   ` Paolo Bonzini
2013-08-03 13:56                     ` Andreas Färber
2013-09-11 15:12                       ` Igor Mammedov
2013-08-06  7:13                     ` Markus Armbruster
2013-07-23 16:23 ` [Qemu-devel] [PATCH 11/16] dimm: add busy address check and address auto-allocation Igor Mammedov
2013-07-23 16:23 ` [Qemu-devel] [PATCH 12/16] dimm: introduce memory added notifier Igor Mammedov
2013-07-23 16:23 ` [Qemu-devel] [PATCH 13/16] acpi/piix4: introduce memory hot-plug interface QEMU<->ACPI BIOS Igor Mammedov
2013-07-23 16:23 ` [Qemu-devel] [PATCH 14/16] pc: ACPI BIOS: implement memory hotplug interface Igor Mammedov
2013-07-23 16:23 ` [Qemu-devel] [PATCH 15/16] pc: update acpi-dsdt.hex.generated and add ssdt-mem.hex.generated Igor Mammedov
2013-07-23 16:23 ` [Qemu-devel] [PATCH 16/16] pc: ACPI BIOS: reserve SRAT entry for hotplug mem hole Igor Mammedov
2013-07-24  9:52 ` [Qemu-devel] [PATCH 00/16 RFC v6] ACPI memory hotplug Hu Tao
2013-07-24 10:02   ` Igor Mammedov
2013-07-24 10:58     ` Vasilis Liaskovitis
2013-08-02 12:35 ` Anthony Liguori
2013-08-07 14:14   ` Erlon Cruz
2013-08-09 17:19   ` Anthony Liguori
2013-09-11  4:01 ` Hu Tao
2013-09-17 12:29   ` Igor Mammedov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.