* [RFC PATCH v2 00/21] ACPI memory hotplug
@ 2012-07-11 10:31 ` Vasilis Liaskovitis
  0 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:31 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: gleb, Vasilis Liaskovitis, kevin, avi, anthony, imammedo

This is v2 of the ACPI memory hotplug prototype for the x86_64 target.

Changes v1->v2

- The memory map is calculated automatically for hotplug dimms. Dimms are added
from top-of-memory, skipping the pci hole at [PCI_HOLE_START, 4G).
- Renamed the option from "-memslot" to "-dimm". The commands are now "dimm_add"
and "dimm_del".
- The SeaBIOS ejection array is reduced to a single byte. Extraction macros are
used for the dimm ssdt.
- The additional SRAT paravirt info does not break the previous SRAT fw_cfg layout.
- Documented the new acpi_piix4 registers and paravirt data.
- Added ACPI _OST support for _OST-enabled guests. This allows qemu to receive
notification of success or failure of memory hot-add and hot-remove operations.
The guest needs to support _OST (https://lkml.org/lkml/2012/6/25/321).
- Added a monitor info command to report total guest memory (initial + hot-added).
- Added command line options and monitor commands for batch dimm creation and
population.

Overview:

Dimm devices are modeled with a new qemu command line option:

"-dimm id=name,size=sz,node=pxm,populated=on|off"

As already mentioned, the starting physical address for each dimm is calculated
automatically from top of memory, skipping the pci hole at [PCI_HOLE_START, 4G).
"node" defines the numa proximity domain for this dimm. When not specified, it
defaults to zero. For example:
"-dimm id=dimm0,size=512M,node=0,populated=off"
will define a 512M memory slot belonging to numa node 0.
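
As a rough illustration of the placement rule described above, the following
Python sketch lays out dimms upward from top of memory and moves a dimm above 4G
when it would overlap the pci hole. The function name place_dimms and the
PCI_HOLE_START value of 0xE0000000 are assumptions made for this example only;
the real calculation lives in the qemu patches of this series and may differ in
detail.

```python
GIB = 1 << 30
FOUR_GB = 4 * GIB

# Assumed value for illustration; the series only names PCI_HOLE_START.
PCI_HOLE_START = 0xE0000000


def place_dimms(top_of_memory, sizes, pci_hole_start=PCI_HOLE_START):
    """Return a list of (start_address, size) tuples, one per dimm.

    Dimms are placed back to back starting at top_of_memory. A dimm that
    would reach into [pci_hole_start, 4G) is placed at 4G instead.
    """
    addr = top_of_memory
    placed = []
    for size in sizes:
        if addr < FOUR_GB and addr + size > pci_hole_start:
            # Skip the pci hole entirely.
            addr = FOUR_GB
        placed.append((addr, size))
        addr += size
    return placed


if __name__ == "__main__":
    layout = place_dimms(3 * GIB, [256 << 20, 512 << 20])
    for start, size in layout:
        print("dimm at 0x%x, size 0x%x" % (start, size))
```

With 3G of initial memory, a 256M dimm still fits below the assumed hole start,
while a following 512M dimm does not and is placed at 4G.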

Dimms are added or removed with the new hmp commands "dimm_add" and "dimm_del":
Hot-add syntax: "dimm_add id"
Hot-remove syntax: "dimm_del id"

Issues:

- Live migration works as long as the "populated" field is changed to "on" for
hotplugged dimms on the destination qemu command line (patch 12/21 lifts
this requirement). The DimmState structure does not yet define a
VMStateDescription, but I assume this is the preferred way to pass state
for migration.

- Dimms are abstracted as qdevices attached to the main system bus. However,
memory hotplug uses its own side channel, bypassing the fact that
main_system_bus is not hotplug-capable. A cleaner integration is still needed,
probably by attaching memory devices as child links of an acpi-capable device
(acpi_piix4 in the pc case) instead of the system bus (TBD). device_add/device_del
could then hopefully be used instead of new commands.

Comments/review welcome.

This series is based on uq/master for qemu-kvm and on master for seabios. It can
also be found at:
http://github.com/vliaskov/qemu-kvm/commits/memhp-v2
http://github.com/vliaskov/seabios/commits/memhp-v2

Vasilis Liaskovitis (14):
  dimm: Implement memory device abstraction
  acpi_piix4: Implement memory device hotplug registers
  pc: calculate dimm physical addresses and adjust memory map
  pc: Add dimm paravirt SRAT info
  Implement "-dimm" command line option
  Implement dimm_add and dimm_del commands for hmp and qmp
  fix live-migration when "populated=on" is missing
  Implement memory hotplug notification lists
  acpi_piix4: _OST dimm support
  acpi_piix4: Update dimm state on VM reboot
  acpi_piix4: Update dimm bitmap state on hot-remove fail
  Implement "info memtotal" and "query-memtotal"
  Implement -dimms, -dimmspop command line options
  Implement mem_increase, mem_decrease hmp/qmp commands

 arch_init.c                 |   23 ++-
 docs/specs/acpi_hotplug.txt |   46 +++++
 docs/specs/fwcfg.txt        |   28 +++
 hmp-commands.hx             |   67 +++++++
 hmp.c                       |   24 +++
 hmp.h                       |    2 +
 hw/Makefile.objs            |    2 +-
 hw/acpi_piix4.c             |  131 ++++++++++++-
 hw/dimm.c                   |  449 +++++++++++++++++++++++++++++++++++++++++++
 hw/dimm.h                   |   72 +++++++
 hw/pc.c                     |   94 +++++++++-
 hw/pc.h                     |    6 +
 hw/pc_piix.c                |   18 ++-
 monitor.c                   |   35 ++++
 monitor.h                   |    5 +
 qapi-schema.json            |   38 ++++
 qemu-config.c               |   70 +++++++
 qemu-options.hx             |   15 ++
 qmp-commands.hx             |  137 +++++++++++++
 sysemu.h                    |    1 +
 vl.c                        |  122 ++++++++++++-
 21 files changed, 1368 insertions(+), 17 deletions(-)
 create mode 100644 docs/specs/acpi_hotplug.txt
 create mode 100644 docs/specs/fwcfg.txt
 create mode 100644 hw/dimm.c
 create mode 100644 hw/dimm.h

Vasilis Liaskovitis (7):
  Add ACPI_EXTRACT_DEVICE* macros
  Add SSDT memory device support
  acpi-dsdt: Implement functions for memory hotplug.
  acpi: generate hotplug memory devices.
  pciinit: Fix pcimem_start value
  acpi_dsdt: Support _OST dimm method
  acpi_dsdt: Revert internal dimm state on _OST failure
 
 Makefile              |    2 +-
 src/acpi-dsdt.dsl     |  120 ++++++++++++++++++++++++++++++++++++-
 src/acpi.c            |  158 +++++++++++++++++++++++++++++++++++++++++++++++--
 src/pciinit.c         |    2 +-
 src/ssdt-mem.dsl      |   69 +++++++++++++++++++++
 tools/acpi_extract.py |   28 +++++++++
 6 files changed, 369 insertions(+), 10 deletions(-)
 create mode 100644 src/ssdt-mem.dsl

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [RFC PATCH v2 01/21][SeaBIOS] Add ACPI_EXTRACT_DEVICE* macros
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:31   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:31 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

These macros allow the beginning, end and name of a Device object to be extracted.
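
For readers unfamiliar with the AML encoding involved, here is a standalone
Python sketch of the offsets the three directives are meant to compute. It
assumes the standard ACPI encoding (DeviceOp is 0x5B 0x82, followed by a
PkgLength and a 4-character NameString; the PkgLength value counts its own
encoding bytes). The helper names pkglen_bytes, pkglen and device_span are made
up for this example and are not the functions in tools/acpi_extract.py.

```python
def pkglen_bytes(aml, offset):
    """Number of bytes used by the PkgLength encoding at offset (1 to 4)."""
    return (aml[offset] >> 6) + 1


def pkglen(aml, offset):
    """Decode the PkgLength value stored at offset."""
    nbytes = pkglen_bytes(aml, offset)
    lead = aml[offset]
    if nbytes == 1:
        # Single-byte form: bits 5..0 hold the length.
        return lead & 0x3F
    # Multi-byte form: low nibble of the lead byte, then whole bytes.
    length = lead & 0x0F
    for i in range(1, nbytes):
        length |= aml[offset + i] << (4 + 8 * (i - 1))
    return length


def device_span(aml, offset):
    """Return (start, name_offset, end) for the Device object at offset."""
    if aml[offset] != 0x5B or aml[offset + 1] != 0x82:
        raise ValueError("offset 0x%x: expected 0x5B 0x82" % offset)
    pkg = offset + 2                         # first PkgLength byte
    name = pkg + pkglen_bytes(aml, pkg)      # NameString follows PkgLength
    end = pkg + pkglen(aml, pkg)             # PkgLength includes itself
    return offset, name, end


if __name__ == "__main__":
    # DeviceOp, PkgLength = 7 (1 length byte + 4 name bytes + 2 body bytes)
    blob = bytes([0x5B, 0x82, 0x07]) + b"MPAA" + bytes([0x00, 0x00])
    start, name, end = device_span(blob, 0)
    print(start, blob[name:name + 4], end)
```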

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 tools/acpi_extract.py |   28 ++++++++++++++++++++++++++++
 1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/tools/acpi_extract.py b/tools/acpi_extract.py
index 167a322..cb2540e 100755
--- a/tools/acpi_extract.py
+++ b/tools/acpi_extract.py
@@ -195,6 +195,28 @@ def aml_package_start(offset):
     offset += 1
     return offset + aml_pkglen_bytes(offset) + 1
 
+def aml_device_start(offset):
+    # 0x5B 0x82 DeviceOp PkgLength NameString
+    if ((aml[offset] != 0x5B) or (aml[offset + 1] != 0x82)):
+        die( "Name offset 0x%x: expected 0x5B 0x82 actual 0x%x 0x%x" %
+             (offset, aml[offset], aml[offset + 1]));
+    return offset
+
+def aml_device_string(offset):
+    # 0x5B 0x82 DeviceOp PkgLength NameString
+    start = aml_device_start(offset)
+    offset += 2
+    pkglenbytes = aml_pkglen_bytes(offset)
+    offset += pkglenbytes
+    return offset
+
+def aml_device_end(offset):
+    start = aml_device_start(offset)
+    offset += 2
+    pkglenbytes = aml_pkglen_bytes(offset)
+    pkglen = aml_pkglen(offset)
+    return offset + pkglen
+
 lineno = 0
 for line in fileinput.input():
     # Strip trailing newline
@@ -279,6 +301,12 @@ for i in range(len(asl)):
         offset = aml_processor_end(offset)
     elif (directive == "ACPI_EXTRACT_PKG_START"):
         offset = aml_package_start(offset)
+    elif (directive == "ACPI_EXTRACT_DEVICE_START"):
+        offset = aml_device_start(offset)
+    elif (directive == "ACPI_EXTRACT_DEVICE_STRING"):
+        offset = aml_device_string(offset)
+    elif (directive == "ACPI_EXTRACT_DEVICE_END"):
+        offset = aml_device_end(offset)
     else:
         die("Unsupported directive %s" % directive)
 
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH v2 02/21][SeaBIOS] Add SSDT memory device support
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:31   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:31 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: gleb, Vasilis Liaskovitis, kevin, avi, anthony, imammedo

Define hotplug-able memory devices in the _SB namespace of the SSDT. The
dynamically generated SSDT includes per-memory-device hotplug methods, which
simply call methods defined in the DSDT. Also dynamically generate an MTFY
method and a MEON array of the online/available memory devices. ACPI
extraction macros are used to place the AML code in variables later used by
src/acpi.c. The design is taken from SSDT cpu generation.
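
To make the shape of the generated objects concrete, the Python sketch below
prints the ASL that would be equivalent to the MTFY method and MEON package for
a given number of memory devices. It is only a text generator for illustration
(SeaBIOS emits the AML bytes directly in C); the function names mtfy_asl and
meon_asl are invented for this example.

```python
def mtfy_asl(nb_memdevs):
    """ASL text of an MTFY method with one Notify entry per memory device."""
    lines = ["Method(MTFY, 2) {"]
    for i in range(nb_memdevs):
        lines.append(
            "    If (LEqual(Arg0, 0x%02X)) { Notify(MP%02X, Arg1) }" % (i, i))
    lines.append("}")
    return "\n".join(lines)


def meon_asl(online):
    """ASL text of the MEON package; online is a list of booleans."""
    flags = ", ".join("One" if dev else "Zero" for dev in online)
    return "Name(MEON, Package() { %s })" % flags


if __name__ == "__main__":
    print(mtfy_asl(2))
    print(meon_asl([True, False]))
```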

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 Makefile         |    2 +-
 src/ssdt-mem.dsl |   65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 1 deletions(-)
 create mode 100644 src/ssdt-mem.dsl

diff --git a/Makefile b/Makefile
index fe974f7..299069e 100644
--- a/Makefile
+++ b/Makefile
@@ -228,7 +228,7 @@ $(OUT)%.hex: src/%.dsl ./tools/acpi_extract_preprocess.py ./tools/acpi_extract.p
 	$(Q)$(PYTHON) ./tools/acpi_extract.py $(OUT)$*.lst > $(OUT)$*.off
 	$(Q)cat $(OUT)$*.off > $@
 
-$(OUT)ccode32flat.o: $(OUT)acpi-dsdt.hex $(OUT)ssdt-proc.hex $(OUT)ssdt-pcihp.hex
+$(OUT)ccode32flat.o: $(OUT)acpi-dsdt.hex $(OUT)ssdt-proc.hex $(OUT)ssdt-pcihp.hex $(OUT)ssdt-mem.hex
 
 ################ Kconfig rules
 
diff --git a/src/ssdt-mem.dsl b/src/ssdt-mem.dsl
new file mode 100644
index 0000000..ee322f0
--- /dev/null
+++ b/src/ssdt-mem.dsl
@@ -0,0 +1,65 @@
+/* This file is the basis for the ssdt_mem[] variable in src/acpi.c.
+ * It is similar in design to the ssdt_proc variable.
+ * It defines the contents of the per-memory-device Device() object.  At
+ * runtime, a dynamically generated SSDT will contain one copy of this
+ * AML snippet for every possible memory device in the system.  The
+ * objects will be placed in the \_SB_ namespace.
+ *
+ * In addition to the aml code generated from this file, the
+ * src/acpi.c file creates a MTFY method with an entry for each memdevice:
+ *     Method(MTFY, 2) {
+ *         If (LEqual(Arg0, 0x00)) { Notify(MP00, Arg1) }
+ *         If (LEqual(Arg0, 0x01)) { Notify(MP01, Arg1) }
+ *         ...
+ *     }
+ * and a MEON array with the list of active and inactive memory devices:
+ *     Name(MEON, Package() { One, One, ..., Zero, Zero, ... })
+ */
+ACPI_EXTRACT_ALL_CODE ssdm_mem_aml
+
+DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1)
+/*  v------------------ DO NOT EDIT ------------------v */
+{
+    ACPI_EXTRACT_DEVICE_START ssdt_mem_start
+    ACPI_EXTRACT_DEVICE_END ssdt_mem_end
+    ACPI_EXTRACT_DEVICE_STRING ssdt_mem_name
+    Device(MPAA) {
+        ACPI_EXTRACT_NAME_BYTE_CONST ssdt_mem_id
+        Name(ID, 0xAA)
+/*  ^------------------ DO NOT EDIT ------------------^
+ *
+ * The src/acpi.c code requires the above layout so that it can update
+ * MPAA and 0xAA with the appropriate MEMDEVICE id (see
+ * SD_OFFSET_MEMHEX/SD_OFFSET_MEMID).  Don't change the above without
+ * also updating the C code.
+ */
+        Name(_HID, EISAID("PNP0C80"))
+        Name(_PXM, 0xAA)
+
+        External(CMST, MethodObj)
+        External(MPEJ, MethodObj)
+
+        Name(_CRS, ResourceTemplate() {
+            QwordMemory(
+               ResourceConsumer,
+               ,
+               MinFixed,
+               MaxFixed,
+               Cacheable,
+               ReadWrite,
+               0x0,
+               0xDEADBEEF,
+               0xE6ADBEEE,
+               0x00000000,
+               0x08000000,
+               )
+        })
+        Method (_STA, 0) {
+            Return(CMST(ID))        
+        }    
+        Method (_EJ0, 1, NotSerialized) {
+            MPEJ(ID, Arg0)
+        }
+    }
+}    
+
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH v2 03/21][SeaBIOS] acpi-dsdt: Implement functions for memory hotplug
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:31   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:31 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

Extend the DSDT to include methods for handling memory hot-add and hot-remove
notifications and memory device status requests. These functions are called
from the memory device SSDT methods.
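
The scan performed on a memory hotplug event can be modeled in a few lines of
Python. This is a behavioral sketch of the idea only, not a translation of the
ASL: it compares each bit of the status bitmap with the cached MEON flag and
records a notification for every device whose state changed (1 is the ACPI
Device Check notification, 3 is Eject Request). The name scan_memory_status and
the list-based return value are assumptions of this example.

```python
def scan_memory_status(bitmap, meon):
    """Compare the hardware bitmap with cached MEON flags.

    bitmap: bytes-like object, one bit per memory device (bit 0 of byte 0
            is device 0).
    meon:   mutable list of 0/1 flags, updated in place.
    Returns a list of (device_index, notify_code) tuples.
    """
    events = []
    for i in range(len(meon)):
        active = (bitmap[i >> 3] >> (i & 7)) & 1
        if meon[i] != active:
            # State change: update the cached flag and notify the OS.
            meon[i] = active
            events.append((i, 1 if active else 3))
    return events


if __name__ == "__main__":
    status = bytearray(32)       # 256 bits, as in the MEST region
    status[0] = 0b00000101       # devices 0 and 2 are active
    cached = [0, 1, 0, 0]
    print(scan_memory_status(status, cached), cached)
```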

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 src/acpi-dsdt.dsl |   70 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 2060686..5d3e92b 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -737,6 +737,71 @@ DefinitionBlock (
             }
             Return(One)
         }
+        /* Objects filled in by run-time generated SSDT */
+        External(MTFY, MethodObj)
+        External(MEON, PkgObj)
+
+        Method (CMST, 1, NotSerialized) {
+            // _STA method - return ON status of memdevice
+            // Local0 = MEON flag for this memory device
+            Store(DerefOf(Index(MEON, Arg0)), Local0)
+            If (Local0) { Return(0xF) } Else { Return(0x0) }
+        }
+
+        /* Memory hotplug notify array */
+        OperationRegion(MEST, SystemIO, 0xaf80, 32)
+        Field (MEST, ByteAcc, NoLock, Preserve)
+        {
+            MES, 256
+        }
+ 
+        /* Memory eject byte */
+        OperationRegion(MEMJ, SystemIO, 0xafa0, 1)
+        Field (MEMJ, ByteAcc, NoLock, Preserve)
+        {
+            MPE, 8
+        }
+        
+        Method(MESC, 0) {
+            // Local5 = active memdevice bitmap
+            Store (MES, Local5)
+            // Local2 = last read byte from bitmap
+            Store (Zero, Local2)
+            // Local0 = memory device iterator
+            Store (Zero, Local0)
+            While (LLess(Local0, SizeOf(MEON))) {
+                // Local1 = MEON flag for this memory device
+                Store(DerefOf(Index(MEON, Local0)), Local1)
+                If (And(Local0, 0x07)) {
+                    // Shift down previously read bitmap byte
+                    ShiftRight(Local2, 1, Local2)
+                } Else {
+                    // Read next byte from memdevice bitmap
+                    Store(DerefOf(Index(Local5, ShiftRight(Local0, 3))), Local2)
+                }
+                // Local3 = active state for this memory device
+                Store(And(Local2, 1), Local3)
+
+                If (LNotEqual(Local1, Local3)) {
+                    // State change - update MEON with new state
+                    Store(Local3, Index(MEON, Local0))
+                    // Do MEM notify
+                    If (LEqual(Local3, 1)) {
+                        MTFY(Local0, 1)
+                    } Else {
+                        MTFY(Local0, 3)
+                    }
+                }
+                Increment(Local0)
+            }
+            Return(One)
+        }
+
+        Method (MPEJ, 2, NotSerialized) {
+            // _EJ0 method - eject callback
+            Store(Arg0, MPE)
+            Sleep(200)
+        }
     }
 
 
@@ -759,8 +824,9 @@ DefinitionBlock (
             // CPU hotplug event
             Return(\_SB.PRSC())
         }
-        Method(_L03) {
-            Return(0x01)
+        Method(_E03) {
+            // Memory hotplug event
+            Return(\_SB.MESC())
         }
         Method(_L04) {
             Return(0x01)
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH v2 04/21][SeaBIOS] acpi: generate hotplug memory devices
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:31   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:31 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

The memory device generation is guided by qemu paravirt info. SeaBIOS
first uses the info to set up SRAT entries for the hotplug-able memory slots.
Afterwards, build_memssdt uses the created SRAT entries to generate the
appropriate memory device objects. One memory device (and corresponding SRAT
entry) is generated for each hotplug-able qemu memslot. Currently no SSDT
memory device is created for initial system memory.
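
The per-device step amounts to copying the AML template and overwriting a few
fixed offsets. The Python sketch below models that idea with a zero-filled
stand-in template. The PXM, start, end and size offsets are the constants from
this patch; the name and id offsets are passed in as parameters because the
patch derives them from the extraction macros, and the values used in the demo
(7 and 12) as well as the 96-byte template are purely hypothetical.

```python
import struct

# Offsets taken from the #defines in this patch.
SD_OFFSET_PXMID = 31
SD_OFFSET_MEMSTART = 55
SD_OFFSET_MEMEND = 63
SD_OFFSET_MEMSIZE = 79


def get_hex(value):
    """ASCII code of the hex digit for the low nibble of value."""
    return ord("0123456789ABCDEF"[value & 0xF])


def build_memdev(template, name_hex_off, id_off, i, mem_base, mem_len, node):
    """Return a patched copy of template for memory device number i."""
    dev = bytearray(template)
    # Turn the placeholder name MPAA into MPxx, xx being i in hex.
    dev[name_hex_off] = get_hex(i >> 4)
    dev[name_hex_off + 1] = get_hex(i)
    dev[id_off] = i
    dev[SD_OFFSET_PXMID] = node
    # 64-bit little-endian fields of the QwordMemory descriptor.
    struct.pack_into("<Q", dev, SD_OFFSET_MEMSTART, mem_base)
    # Mirrors the patch, which stores base + len as the end value.
    struct.pack_into("<Q", dev, SD_OFFSET_MEMEND, mem_base + mem_len)
    struct.pack_into("<Q", dev, SD_OFFSET_MEMSIZE, mem_len)
    return dev


if __name__ == "__main__":
    fake_template = bytes(96)    # hypothetical stand-in for the real AML
    dev = build_memdev(fake_template, 7, 12, 0x1A, 1 << 32, 512 << 20, 1)
    print(bytes(dev[7:9]), dev[12], dev[SD_OFFSET_PXMID])
```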

We only support up to 255 DIMMs for now (PackageOp used for the MEON array can
only describe an array of at most 255 elements. VarPackageOp would be needed to
support more than 255 devices)

v1->v2:
- SeaBIOS reads mems_sts from qemu to build the e820 map.
- SSDT size and some offsets are calculated with extraction macros.
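
As a sketch of the status lookup, assuming one bit per DIMM with bit (i % 8)
of byte (i / 8) marking DIMM i as populated, which is how the loop in
build_memssdt consumes the bytes it reads. Here the bitmap is a plain array
instead of port reads, and the function names are hypothetical:

```c
#include <stdint.h>

/* Return 1 if DIMM idx is marked populated in the status bitmap. */
static int dimm_enabled(const uint8_t *status, unsigned idx)
{
    return (status[idx / 8] >> (idx % 8)) & 1;
}

/*
 * Sum the sizes of all populated DIMMs, i.e. the amount of memory
 * that would be reported as RAM in the e820 map.
 */
static uint64_t enabled_bytes(const uint8_t *status, const uint64_t *len,
                              unsigned n)
{
    uint64_t total = 0;

    for (unsigned i = 0; i < n; i++) {
        if (dimm_enabled(status, i))
            total += len[i];
    }
    return total;
}
```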

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 src/acpi.c |  158 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 152 insertions(+), 6 deletions(-)

diff --git a/src/acpi.c b/src/acpi.c
index 55e4607..c83e8c7 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -510,6 +510,127 @@ build_ssdt(void)
     return ssdt;
 }
 
+#include "ssdt-mem.hex"
+
+/* 0x5B 0x82 DeviceOp PkgLength NameString DimmID */
+#define MEM_BASE 0xaf80
+#define SD_MEM (ssdm_mem_aml + *ssdt_mem_start)
+#define SD_MEMSIZEOF (*ssdt_mem_end - *ssdt_mem_start)
+#define SD_OFFSET_MEMHEX (*ssdt_mem_name - *ssdt_mem_start + 2)
+#define SD_OFFSET_MEMID (*ssdt_mem_id - *ssdt_mem_start)
+#define SD_OFFSET_PXMID 31
+#define SD_OFFSET_MEMSTART 55
+#define SD_OFFSET_MEMEND   63
+#define SD_OFFSET_MEMSIZE  79
+
+u64 nb_hp_memslots = 0;
+struct srat_memory_affinity *mem;
+
+static void build_memdev(u8 *ssdt_ptr, int i, u64 mem_base, u64 mem_len, u8 node)
+{
+    memcpy(ssdt_ptr, SD_MEM, SD_MEMSIZEOF);
+    ssdt_ptr[SD_OFFSET_MEMHEX] = getHex(i >> 4);
+    ssdt_ptr[SD_OFFSET_MEMHEX+1] = getHex(i);
+    ssdt_ptr[SD_OFFSET_MEMID] = i;
+    ssdt_ptr[SD_OFFSET_PXMID] = node;
+    *(u64*)(ssdt_ptr + SD_OFFSET_MEMSTART) = mem_base;
+    *(u64*)(ssdt_ptr + SD_OFFSET_MEMEND) = mem_base + mem_len;
+    *(u64*)(ssdt_ptr + SD_OFFSET_MEMSIZE) = mem_len;
+}
+
+static void*
+build_memssdt(void)
+{
+    u64 mem_base;
+    u64 mem_len;
+    u8  node;
+    int i;
+    struct srat_memory_affinity *entry = mem;
+    u64 nb_memdevs = nb_hp_memslots;
+    u8  memslot_status, enabled;
+
+    int length = ((1+3+4)
+                  + (nb_memdevs * SD_MEMSIZEOF)
+                  + (1+2+5+(12*nb_memdevs))
+                  + (6+2+1+(1*nb_memdevs)));
+    u8 *ssdt = malloc_high(sizeof(struct acpi_table_header) + length);
+    if (! ssdt) {
+        warn_noalloc();
+        return NULL;
+    }
+    u8 *ssdt_ptr = ssdt + sizeof(struct acpi_table_header);
+
+    // build Scope(_SB_) header
+    *(ssdt_ptr++) = 0x10; // ScopeOp
+    ssdt_ptr = encodeLen(ssdt_ptr, length-1, 3);
+    *(ssdt_ptr++) = '_';
+    *(ssdt_ptr++) = 'S';
+    *(ssdt_ptr++) = 'B';
+    *(ssdt_ptr++) = '_';
+
+    for (i = 0; i < nb_memdevs; i++) {
+        mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low);
+        mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
+        node = entry->proximity[0];
+        build_memdev(ssdt_ptr, i, mem_base, mem_len, node);
+        ssdt_ptr += SD_MEMSIZEOF;
+        entry++;
+    }
+
+    // build "Method(MTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(MP00, Arg1)} ...}"
+    *(ssdt_ptr++) = 0x14; // MethodOp
+    ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*nb_memdevs), 2);
+    *(ssdt_ptr++) = 'M';
+    *(ssdt_ptr++) = 'T';
+    *(ssdt_ptr++) = 'F';
+    *(ssdt_ptr++) = 'Y';
+    *(ssdt_ptr++) = 0x02;
+    for (i=0; i<nb_memdevs; i++) {
+        *(ssdt_ptr++) = 0xA0; // IfOp
+        ssdt_ptr = encodeLen(ssdt_ptr, 11, 1);
+        *(ssdt_ptr++) = 0x93; // LEqualOp
+        *(ssdt_ptr++) = 0x68; // Arg0Op
+        *(ssdt_ptr++) = 0x0A; // BytePrefix
+        *(ssdt_ptr++) = i;
+        *(ssdt_ptr++) = 0x86; // NotifyOp
+        *(ssdt_ptr++) = 'M';
+        *(ssdt_ptr++) = 'P';
+        *(ssdt_ptr++) = getHex(i >> 4);
+        *(ssdt_ptr++) = getHex(i);
+        *(ssdt_ptr++) = 0x69; // Arg1Op
+    }
+
+    // build "Name(MEON, Package() { One, One, ..., Zero, Zero, ... })"
+    *(ssdt_ptr++) = 0x08; // NameOp
+    *(ssdt_ptr++) = 'M';
+    *(ssdt_ptr++) = 'E';
+    *(ssdt_ptr++) = 'O';
+    *(ssdt_ptr++) = 'N';
+    *(ssdt_ptr++) = 0x12; // PackageOp
+    ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*nb_memdevs), 2);
+    *(ssdt_ptr++) = nb_memdevs;
+
+    entry = mem;
+    memslot_status = 0;
+
+    for (i = 0; i < nb_memdevs; i++) {
+        enabled = 0;
+        if (i % 8 == 0)
+            memslot_status = inb(MEM_BASE + i/8);
+        enabled = memslot_status & 1;
+        mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low);
+        mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
+        *(ssdt_ptr++) = enabled ? 0x01 : 0x00;
+        if (enabled)
+            add_e820(mem_base, mem_len, E820_RAM);
+        memslot_status = memslot_status >> 1;
+        entry++;
+    }
+    build_header((void*)ssdt, SSDT_SIGNATURE, ssdt_ptr - ssdt, 1);
+
+    return ssdt;
+}
+
 #include "ssdt-pcihp.hex"
 
 #define PCI_RMV_BASE 0xae0c
@@ -618,9 +739,6 @@ build_srat(void)
 {
     int nb_numa_nodes = qemu_cfg_get_numa_nodes();
 
-    if (nb_numa_nodes == 0)
-        return NULL;
-
     u64 *numadata = malloc_tmphigh(sizeof(u64) * (MaxCountCPUs + nb_numa_nodes));
     if (!numadata) {
         warn_noalloc();
@@ -629,10 +747,11 @@ build_srat(void)
 
     qemu_cfg_get_numa_data(numadata, MaxCountCPUs + nb_numa_nodes);
 
+    qemu_cfg_get_numa_data(&nb_hp_memslots, 1);
     struct system_resource_affinity_table *srat;
     int srat_size = sizeof(*srat) +
         sizeof(struct srat_processor_affinity) * MaxCountCPUs +
-        sizeof(struct srat_memory_affinity) * (nb_numa_nodes + 2);
+        sizeof(struct srat_memory_affinity) * (nb_numa_nodes + nb_hp_memslots + 2);
 
     srat = malloc_high(srat_size);
     if (!srat) {
@@ -667,7 +786,7 @@ build_srat(void)
      * from 640k-1M and possibly another one from 3.5G-4G.
      */
     struct srat_memory_affinity *numamem = (void*)core;
-    int slots = 0;
+    int slots = 0, node;
     u64 mem_len, mem_base, next_base = 0;
 
     acpi_build_srat_memory(numamem, 0, 640*1024, 0, 1);
@@ -694,10 +813,36 @@ build_srat(void)
             next_base += (1ULL << 32) - RamSize;
         }
         acpi_build_srat_memory(numamem, mem_base, mem_len, i-1, 1);
+
         numamem++;
         slots++;
+
+    }
+    mem = (void*)numamem;
+
+    if (nb_hp_memslots) {
+        u64 *hpmemdata = malloc_tmphigh(sizeof(u64) * (3 * nb_hp_memslots));
+        if (!hpmemdata) {
+            warn_noalloc();
+            free(hpmemdata);
+            free(numadata);
+            return NULL;
+        }
+
+        qemu_cfg_get_numa_data(hpmemdata, 3 * nb_hp_memslots);
+
+        for (i = 1; i < nb_hp_memslots + 1; ++i) {
+            mem_base = *hpmemdata++;
+            mem_len = *hpmemdata++;
+            node = *hpmemdata++;
+            acpi_build_srat_memory(numamem, mem_base, mem_len, node, 1);
+            numamem++;
+            slots++;
+        }
+        free(hpmemdata);
     }
-    for (; slots < nb_numa_nodes + 2; slots++) {
+
+    for (; slots < nb_numa_nodes + nb_hp_memslots + 2; slots++) {
         acpi_build_srat_memory(numamem, 0, 0, 0, 0);
         numamem++;
     }
@@ -748,6 +893,7 @@ acpi_bios_init(void)
     ACPI_INIT_TABLE(build_madt());
     ACPI_INIT_TABLE(build_hpet());
     ACPI_INIT_TABLE(build_srat());
+    ACPI_INIT_TABLE(build_memssdt());
     ACPI_INIT_TABLE(build_pcihp());
 
     u16 i, external_tables = qemu_cfg_acpi_additional_tables();
-- 
1.7.9



* [RFC PATCH v2 05/21][SeaBIOS] pciinit: Fix pcimem_start value
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:31   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:31 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

In order to hotplug memory between RamSize and BUILD_PCIMEM_START, the PCI
window needs to start at BUILD_PCIMEM_START (0xe0000000).
Otherwise, the guest cannot online new dimms in that range due to pci_root
window conflicts. (A workaround for Linux guests is to boot with pci=nocrs.)
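
The conflict can be sketched as a plain range-overlap check. This is only an
illustration: it assumes the PCI window extends up to 4G, and the helper
names are made up for the example.

```c
#include <stdint.h>

#define TOY_4G (1ULL << 32)

/* True if [a_start, a_start + a_len) and [b_start, b_start + b_len) overlap. */
static int ranges_overlap(uint64_t a_start, uint64_t a_len,
                          uint64_t b_start, uint64_t b_len)
{
    return a_start < b_start + b_len && b_start < a_start + a_len;
}

/*
 * True if a DIMM at [base, base + len) collides with a PCI window
 * assumed to span [win_start, 4G).
 */
static int dimm_hits_pci_window(uint64_t base, uint64_t len,
                                uint64_t win_start)
{
    return ranges_overlap(base, len, win_start, TOY_4G - win_start);
}
```

With 1G of initial RAM, a 512M dimm placed at 0x40000000 collides with a
window that starts at RamSize, but not with one that starts at 0xe0000000.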

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 src/pciinit.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index 68f302a..0b11cbe 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -592,7 +592,7 @@ static void pci_region_map_entries(struct pci_bus *busses, struct pci_region *r)
 
 static void pci_bios_map_devices(struct pci_bus *busses)
 {
-    pcimem_start = RamSize;
+    pcimem_start = BUILD_PCIMEM_START;
 
     if (pci_bios_init_root_regions(busses)) {
         struct pci_region r64_mem, r64_pref;
-- 
1.7.9



* [RFC PATCH v2 06/21] dimm: Implement memory device abstraction
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:31   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:31 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

Each hotplug-able memory slot is a SysBusDevice. A hot-add operation for a
particular dimm creates a new MemoryRegion of the given physical address
offset, size and node proximity, and attaches it to main system memory as a
subregion. A hot-remove operation detaches the MemoryRegion from system
memory and frees it.
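
The slot life cycle can be modeled without any QEMU types. The sketch below is
a toy under stated assumptions: toy_dimm and its functions are hypothetical,
and a heap buffer stands in for the MemoryRegion that the real code creates
and attaches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for a hotplug-able memory slot. */
struct toy_dimm {
    uint64_t start;     /* physical address offset of the slot */
    size_t size;        /* size of the slot in bytes */
    void *backing;      /* non-NULL only while populated */
    bool populated;
};

/* Hot-add: allocate backing storage and mark the slot populated. */
static int toy_dimm_populate(struct toy_dimm *d)
{
    if (d->populated)
        return -1;                  /* slot already in use */
    d->backing = calloc(1, d->size);
    if (!d->backing)
        return -1;
    d->populated = true;
    return 0;
}

/* Hot-remove: release the backing storage and mark the slot empty. */
static int toy_dimm_depopulate(struct toy_dimm *d)
{
    if (!d->populated)
        return -1;                  /* nothing to remove */
    free(d->backing);
    d->backing = NULL;
    d->populated = false;
    return 0;
}
```

As in dimm_do, adding to a populated slot or removing from an empty one is
rejected rather than silently ignored.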

This prototype still lacks proper qdev integration: a separate
hotplug side-channel is used and main system bus hotplug capability is
ignored.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hw/Makefile.objs |    2 +-
 hw/dimm.c        |  234 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/dimm.h        |   58 +++++++++++++
 3 files changed, 293 insertions(+), 1 deletions(-)
 create mode 100644 hw/dimm.c
 create mode 100644 hw/dimm.h

diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 3d77259..e2184bf 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -26,7 +26,7 @@ hw-obj-$(CONFIG_I8254) += i8254_common.o i8254.o
 hw-obj-$(CONFIG_PCSPK) += pcspk.o
 hw-obj-$(CONFIG_PCKBD) += pckbd.o
 hw-obj-$(CONFIG_FDC) += fdc.o
-hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o
+hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o dimm.o
 hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
 hw-obj-$(CONFIG_DMA) += dma.o
 hw-obj-$(CONFIG_I82374) += i82374.o
diff --git a/hw/dimm.c b/hw/dimm.c
new file mode 100644
index 0000000..00c4623
--- /dev/null
+++ b/hw/dimm.c
@@ -0,0 +1,234 @@
+/*
+ * Dimm device for Memory Hotplug
+ *
+ * Copyright ProfitBricks GmbH 2012
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "trace.h"
+#include "qdev.h"
+#include "dimm.h"
+#include <time.h>
+#include "../exec-memory.h"
+#include "qmp-commands.h"
+
+static DeviceState *dimm_hotplug_qdev;
+static dimm_hotplug_fn dimm_hotplug;
+static QTAILQ_HEAD(Dimmlist, DimmState)  dimmlist;
+
+static Property dimm_properties[] = {
+    DEFINE_PROP_END_OF_LIST()
+};
+
+void dimm_populate(DimmState *s)
+{
+    DeviceState *dev = (DeviceState *)s;
+    MemoryRegion *new = NULL;
+
+    new = g_malloc(sizeof(MemoryRegion));
+    memory_region_init_ram(new, dev->id, s->size);
+    vmstate_register_ram_global(new);
+    memory_region_add_subregion(get_system_memory(), s->start, new);
+    s->mr = new;
+    s->populated = true;
+}
+
+
+void dimm_depopulate(DimmState *s)
+{
+    assert(s);
+    if (s->populated) {
+        vmstate_unregister_ram(s->mr, NULL);
+        memory_region_del_subregion(get_system_memory(), s->mr);
+        memory_region_destroy(s->mr);
+        s->populated = false;
+        s->mr = NULL;
+    }
+}
+
+DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
+        dimm_idx, bool populated)
+{
+    DeviceState *dev;
+    DimmState *mdev;
+
+    dev = sysbus_create_simple("dimm", -1, NULL);
+    dev->id = id;
+
+    mdev = DIMM(dev);
+    mdev->idx = dimm_idx;
+    mdev->start = 0;
+    mdev->size = size;
+    mdev->node = node;
+    mdev->populated = populated;
+    QTAILQ_INSERT_TAIL(&dimmlist, mdev, nextdimm);
+    return mdev;
+}
+
+void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev)
+{
+    dimm_hotplug_qdev = qdev;
+    dimm_hotplug = hotplug;
+    dimm_scan_populated();
+}
+
+void dimm_activate(DimmState *slot)
+{
+    dimm_populate(slot);
+    if (dimm_hotplug)
+        dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 1);
+}
+
+void dimm_deactivate(DimmState *slot)
+{
+    if (dimm_hotplug)
+        dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 0);
+}
+
+DimmState *dimm_find_from_name(char *id)
+{
+    Error *err = NULL;
+    DeviceState *qdev;
+    const char *type;
+    qdev = qdev_find_recursive(sysbus_get_default(), id);
+    if (qdev) {
+        type = object_property_get_str(OBJECT(qdev), "type", &err);
+        if (!type) {
+            return NULL;
+        }
+        if (!strcmp(type, "dimm")) {
+            return DIMM(qdev);
+        }
+    }
+    return NULL;
+}
+
+int dimm_do(Monitor *mon, const QDict *qdict, bool add)
+{
+    DimmState *slot = NULL;
+
+    char *id = (char*) qdict_get_try_str(qdict, "id");
+    if (!id) {
+        fprintf(stderr, "ERROR %s invalid id\n", __FUNCTION__);
+        return 1;
+    }
+
+    slot = dimm_find_from_name(id);
+
+    if (!slot) {
+        fprintf(stderr, "ERROR %s no slot %s found\n", __FUNCTION__, id);
+        return 1;
+    }
+
+    if (add) {
+        if (slot->populated) {
+            fprintf(stderr, "ERROR %s slot %s already populated\n",
+                    __FUNCTION__, id);
+            return 1;
+        }
+        dimm_activate(slot);
+    }
+    else {
+        if (!slot->populated) {
+            fprintf(stderr, "ERROR %s slot %s is not populated\n",
+                    __FUNCTION__, id);
+            return 1;
+        }
+        dimm_deactivate(slot);
+    }
+
+    return 0;
+}
+
+DimmState *dimm_find_from_idx(uint32_t idx)
+{
+    DimmState *slot;
+
+    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
+        if (slot->idx == idx) {
+            return slot;
+        }
+    }
+    return NULL;
+}
+
+/* used to calculate physical address offsets for all dimms */
+void dimm_calc_offsets(dimm_calcoffset_fn calcfn)
+{
+    DimmState *slot;
+    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
+        if (!slot->start)
+            slot->start = calcfn(slot->size);
+    }
+}
+
+/* used to populate and activate dimms at boot time */
+void dimm_scan_populated(void)
+{
+    DimmState *slot;
+    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
+        if (slot->populated && !slot->mr) {
+            dimm_activate(slot);
+        }
+    }
+}
+
+void dimm_notify(uint32_t idx, uint32_t event)
+{
+    DimmState *s;
+    s = dimm_find_from_idx(idx);
+    assert(s != NULL);
+
+    switch(event) {
+        case DIMM_REMOVE_SUCCESS:
+            dimm_depopulate(s);
+            break;
+        default:
+            break;
+    }
+}
+
+static int dimm_init(SysBusDevice *s)
+{
+    DimmState *slot;
+    slot = DIMM(s);
+    slot->mr = NULL;
+    slot->populated = false;
+    return 0;
+}
+
+static void dimm_class_init(ObjectClass *klass, void *data)
+{
+    SysBusDeviceClass *sc = SYS_BUS_DEVICE_CLASS(klass);
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->props = dimm_properties;
+    sc->init = dimm_init;
+    dimm_hotplug = NULL;
+    QTAILQ_INIT(&dimmlist);
+}
+
+static TypeInfo dimm_info = {
+    .name          = "dimm",
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(DimmState),
+    .class_init    = dimm_class_init,
+};
+
+static void dimm_register_types(void)
+{
+    type_register_static(&dimm_info);
+}
+
+type_init(dimm_register_types)
diff --git a/hw/dimm.h b/hw/dimm.h
new file mode 100644
index 0000000..643f319
--- /dev/null
+++ b/hw/dimm.h
@@ -0,0 +1,58 @@
+#ifndef QEMU_DIMM_H
+#define QEMU_DIMM_H
+
+#include "qemu-common.h"
+#include "memory.h"
+#include "sysbus.h"
+#include "qapi-types.h"
+#include "qemu-queue.h"
+#include "cpus.h"
+#define MAX_DIMMS 255
+#define DIMM_BITMAP_BYTES ((MAX_DIMMS + 7) / 8)
+#define DEFAULT_DIMMSIZE (1024*1024*1024)
+
+typedef enum {
+    DIMM_REMOVE_SUCCESS = 0,
+    DIMM_REMOVE_FAIL = 1,
+    DIMM_ADD_SUCCESS = 2,
+    DIMM_ADD_FAIL = 3
+} dimm_hp_result_code;
+
+#define TYPE_DIMM "dimm"
+#define DIMM(obj) \
+    OBJECT_CHECK(DimmState, (obj), TYPE_DIMM)
+#define DIMM_CLASS(klass) \
+    OBJECT_CLASS_CHECK(DimmClass, (klass), TYPE_DIMM)
+#define DIMM_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(DimmClass, (obj), TYPE_DIMM)
+
+typedef struct DimmState {
+    SysBusDevice busdev;
+    uint32_t idx; /* index in memory hotplug register/bitmap */
+    ram_addr_t start; /* starting physical address */
+    ram_addr_t size;
+    uint32_t node; /* numa node proximity */
+    MemoryRegion *mr; /* MemoryRegion for this slot. !NULL only if populated */
+    bool populated; /* 1 means device has been hotplugged. Default is 0. */
+    QTAILQ_ENTRY (DimmState) nextdimm;
+} DimmState;
+
+typedef int (*dimm_hotplug_fn)(DeviceState *qdev, SysBusDevice *dev, int add);
+typedef target_phys_addr_t (*dimm_calcoffset_fn)(uint64_t size);
+
+DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
+        dimm_idx, bool populated);
+void dimm_populate(DimmState *s);
+void dimm_depopulate(DimmState *s);
+int dimm_do(Monitor *mon, const QDict *qdict, bool add);
+DimmState *dimm_find_from_idx(uint32_t idx);
+DimmState *dimm_find_from_name(char *id);
+void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev);
+void dimm_calc_offsets(dimm_calcoffset_fn calcfn);
+void dimm_activate(DimmState *slot);
+void dimm_deactivate(DimmState *slot);
+void dimm_scan_populated(void);
+void dimm_notify(uint32_t idx, uint32_t event);
+
+
+#endif
-- 
1.7.9



* [Qemu-devel] [RFC PATCH v2 06/21] dimm: Implement memory device abstraction
@ 2012-07-11 10:31   ` Vasilis Liaskovitis
  0 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:31 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: gleb, Vasilis Liaskovitis, kevin, avi, anthony, imammedo

Each hotplug-able memory slot is a SysBusDevice. A hot-add operation for a
particular dimm creates a new MemoryRegion of the given physical address
offset, size and node proximity, and attaches it to main system memory as a
sub_region. A hot-remove operation detaches and frees the MemoryRegion from
system memory.

This prototype still lacks proper qdev integration: a separate
hotplug side-channel is used and main system bus hotplug capability is
ignored.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hw/Makefile.objs |    2 +-
 hw/dimm.c        |  234 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/dimm.h        |   58 +++++++++++++
 3 files changed, 293 insertions(+), 1 deletions(-)
 create mode 100644 hw/dimm.c
 create mode 100644 hw/dimm.h

diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 3d77259..e2184bf 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -26,7 +26,7 @@ hw-obj-$(CONFIG_I8254) += i8254_common.o i8254.o
 hw-obj-$(CONFIG_PCSPK) += pcspk.o
 hw-obj-$(CONFIG_PCKBD) += pckbd.o
 hw-obj-$(CONFIG_FDC) += fdc.o
-hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o
+hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o dimm.o
 hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
 hw-obj-$(CONFIG_DMA) += dma.o
 hw-obj-$(CONFIG_I82374) += i82374.o
diff --git a/hw/dimm.c b/hw/dimm.c
new file mode 100644
index 0000000..00c4623
--- /dev/null
+++ b/hw/dimm.c
@@ -0,0 +1,234 @@
+/*
+ * Dimm device for Memory Hotplug
+ *
+ * Copyright ProfitBricks GmbH 2012
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "trace.h"
+#include "qdev.h"
+#include "dimm.h"
+#include <time.h>
+#include "../exec-memory.h"
+#include "qmp-commands.h"
+
+static DeviceState *dimm_hotplug_qdev;
+static dimm_hotplug_fn dimm_hotplug;
+static QTAILQ_HEAD(Dimmlist, DimmState)  dimmlist;
+
+static Property dimm_properties[] = {
+    DEFINE_PROP_END_OF_LIST()
+};
+
+void dimm_populate(DimmState *s)
+{
+    DeviceState *dev= (DeviceState*)s;
+    MemoryRegion *new = NULL;
+
+    new = g_malloc(sizeof(MemoryRegion));
+    memory_region_init_ram(new, dev->id, s->size);
+    vmstate_register_ram_global(new);
+    memory_region_add_subregion(get_system_memory(), s->start, new);
+    s->mr = new;
+    s->populated = true;
+}
+
+
+void dimm_depopulate(DimmState *s)
+{
+    assert(s);
+    if (s->populated) {
+        vmstate_unregister_ram(s->mr, NULL);
+        memory_region_del_subregion(get_system_memory(), s->mr);
+        memory_region_destroy(s->mr);
+        s->populated = false;
+        s->mr = NULL;
+    }
+}
+
+DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
+        dimm_idx, bool populated)
+{
+    DeviceState *dev;
+    DimmState *mdev;
+
+    dev = sysbus_create_simple("dimm", -1, NULL);
+    dev->id = id;
+
+    mdev = DIMM(dev);
+    mdev->idx = dimm_idx;
+    mdev->start = 0;
+    mdev->size = size;
+    mdev->node = node;
+    mdev->populated = populated;
+    QTAILQ_INSERT_TAIL(&dimmlist, mdev, nextdimm);
+    return mdev;
+}
+
+void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev)
+{
+    dimm_hotplug_qdev = qdev;
+    dimm_hotplug = hotplug;
+    dimm_scan_populated();
+}
+
+void dimm_activate(DimmState *slot)
+{
+    dimm_populate(slot);
+    if (dimm_hotplug)
+        dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 1);
+}
+
+void dimm_deactivate(DimmState *slot)
+{
+    if (dimm_hotplug)
+        dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 0);
+}
+
+DimmState *dimm_find_from_name(char *id)
+{
+    Error *err = NULL;
+    DeviceState *qdev;
+    const char *type;
+    qdev = qdev_find_recursive(sysbus_get_default(), id);
+    if (qdev) {
+        type = object_property_get_str(OBJECT(qdev), "type", &err);
+        if (!type) {
+            return NULL;
+        }
+        if (!strcmp(type, "dimm")) {
+            return DIMM(qdev);
+        }
+    }    
+    return NULL;
+}
+
+int dimm_do(Monitor *mon, const QDict *qdict, bool add)
+{
+    DimmState *slot = NULL;
+
+    char *id = (char*) qdict_get_try_str(qdict, "id");
+    if (!id) {
+        fprintf(stderr, "ERROR %s invalid id\n",__FUNCTION__);
+        return 1;
+    }
+
+    slot = dimm_find_from_name(id);
+
+    if (!slot) {
+        fprintf(stderr, "%s no slot %s found\n", __FUNCTION__, id);
+        return 1;
+    }
+
+    if (add) {
+        if (slot->populated) {
+            fprintf(stderr, "ERROR %s slot %s already populated\n",
+                    __FUNCTION__, id);
+            return 1;
+        }
+        dimm_activate(slot);
+    }
+    else {
+        if (!slot->populated) {
+            fprintf(stderr, "ERROR %s slot %s is not populated\n",
+                    __FUNCTION__, id);
+            return 1;
+        }
+        dimm_deactivate(slot);
+    }
+
+    return 0;
+}
+
+DimmState *dimm_find_from_idx(uint32_t idx)
+{
+    DimmState *slot;
+
+    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
+        if (slot->idx == idx) {
+            return slot;
+        }
+    }
+    return NULL;
+}
+
+/* used to calculate physical address offsets for all dimms */
+void dimm_calc_offsets(dimm_calcoffset_fn calcfn)
+{
+    DimmState *slot;
+    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
+        if (!slot->start)
+            slot->start = calcfn(slot->size);
+    }
+}
+
+/* used to populate and activate dimms at boot time */
+void dimm_scan_populated(void)
+{
+    DimmState *slot;
+    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
+        if (slot->populated && !slot->mr) {
+            dimm_activate(slot);
+        }
+    }
+}
+
+void dimm_notify(uint32_t idx, uint32_t event)
+{
+    DimmState *s;
+    s = dimm_find_from_idx(idx);
+    assert(s != NULL);
+
+    switch(event) {
+        case DIMM_REMOVE_SUCCESS:
+            dimm_depopulate(s);
+            break;
+        default:
+            break;
+    }
+}
+
+static int dimm_init(SysBusDevice *s)
+{
+    DimmState *slot;
+    slot = DIMM(s);
+    slot->mr = NULL;
+    slot->populated = false;
+    return 0;
+}
+
+static void dimm_class_init(ObjectClass *klass, void *data)
+{
+    SysBusDeviceClass *sc = SYS_BUS_DEVICE_CLASS(klass);
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->props = dimm_properties;
+    sc->init = dimm_init;
+    dimm_hotplug = NULL;
+    QTAILQ_INIT(&dimmlist);
+}
+
+static TypeInfo dimm_info = {
+    .name          = "dimm",
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(DimmState),
+    .class_init    = dimm_class_init,
+};
+
+static void dimm_register_types(void)
+{
+    type_register_static(&dimm_info);
+}
+
+type_init(dimm_register_types)
diff --git a/hw/dimm.h b/hw/dimm.h
new file mode 100644
index 0000000..643f319
--- /dev/null
+++ b/hw/dimm.h
@@ -0,0 +1,58 @@
+#ifndef QEMU_DIMM_H
+#define QEMU_DIMM_H
+
+#include "qemu-common.h"
+#include "memory.h"
+#include "sysbus.h"
+#include "qapi-types.h"
+#include "qemu-queue.h"
+#include "cpus.h"
+#define MAX_DIMMS 255
+#define DIMM_BITMAP_BYTES ((MAX_DIMMS + 7) / 8)
+#define DEFAULT_DIMMSIZE (1024*1024*1024)
+
+typedef enum {
+    DIMM_REMOVE_SUCCESS = 0,
+    DIMM_REMOVE_FAIL = 1,
+    DIMM_ADD_SUCCESS = 2,
+    DIMM_ADD_FAIL = 3
+} dimm_hp_result_code;
+
+#define TYPE_DIMM "dimm"
+#define DIMM(obj) \
+    OBJECT_CHECK(DimmState, (obj), TYPE_DIMM)
+#define DIMM_CLASS(klass) \
+    OBJECT_CLASS_CHECK(DimmClass, (klass), TYPE_DIMM)
+#define DIMM_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(DimmClass, (obj), TYPE_DIMM)
+
+typedef struct DimmState {
+    SysBusDevice busdev;
+    uint32_t idx; /* index in memory hotplug register/bitmap */
+    ram_addr_t start; /* starting physical address */
+    ram_addr_t size;
+    uint32_t node; /* numa node proximity */
+    MemoryRegion *mr; /* MemoryRegion for this slot. !NULL only if populated */
+    bool populated; /* 1 means device has been hotplugged. Default is 0. */
+    QTAILQ_ENTRY (DimmState) nextdimm;
+} DimmState;
+
+typedef int (*dimm_hotplug_fn)(DeviceState *qdev, SysBusDevice *dev, int add);
+typedef target_phys_addr_t (*dimm_calcoffset_fn)(uint64_t size);
+
+DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
+        dimm_idx, bool populated);
+void dimm_populate(DimmState *s);
+void dimm_depopulate(DimmState *s);
+int dimm_do(Monitor *mon, const QDict *qdict, bool add);
+DimmState *dimm_find_from_idx(uint32_t idx);
+DimmState *dimm_find_from_name(char *id);
+void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev);
+void dimm_calc_offsets(dimm_calcoffset_fn calcfn);
+void dimm_activate(DimmState *slot);
+void dimm_deactivate(DimmState *slot);
+void dimm_scan_populated(void);
+void dimm_notify(uint32_t idx, uint32_t event);
+
+
+#endif
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH v2 07/21] acpi_piix4: Implement memory device hotplug registers
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:31   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:31 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

A 32-byte register is used to present up to 256 hotplug-able memory devices
to BIOS and OSPM, one bit per device. Hot-add and hot-remove functions update
the corresponding bit and trigger an ACPI hotplug event. Only reads are allowed
from these registers.

A hot-remove triggers an ACPI event, but QEMU then needs to wait for OSPM to
eject the device. We use a single-byte register to know when OSPM has called
the _EJ0 method for a particular dimm. A write to this byte will depopulate the
respective dimm. Only writes are allowed to this byte.
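
For reviewers who want to check the bit arithmetic in isolation, here is a
minimal standalone sketch of the status bitmap described above. It is not the
patch code: MemStatus and the mem_status_* helpers are hypothetical names, and
only the constants (MAX_DIMMS, the 32-byte size, MEM_BASE, MEM_EJ_BASE) are
taken from this series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_DIMMS         255
#define DIMM_BITMAP_BYTES ((MAX_DIMMS + 7) / 8)   /* 32 bytes */
#define MEM_BASE          0xaf80                  /* first status byte */
#define MEM_EJ_BASE       0xafa0                  /* ejection byte, right after the bitmap */

/* Hypothetical stand-in for the bitmap kept in the PIIX4 PM state. */
typedef struct {
    uint8_t mems_sts[DIMM_BITMAP_BYTES];
} MemStatus;

static void mem_status_reset(MemStatus *m)
{
    memset(m->mems_sts, 0, sizeof(m->mems_sts));
}

/* Hot-add: set the bit for dimm index idx. */
static void mem_status_set(MemStatus *m, unsigned idx)
{
    assert(idx < MAX_DIMMS);
    m->mems_sts[idx / 8] |= (uint8_t)(1u << (idx % 8));
}

/* Hot-remove request: clear the bit for dimm index idx. */
static void mem_status_clear(MemStatus *m, unsigned idx)
{
    assert(idx < MAX_DIMMS);
    m->mems_sts[idx / 8] &= (uint8_t)~(1u << (idx % 8));
}

/* Value a 1-byte guest read from I/O port `port` would see. */
static uint8_t mem_status_read(const MemStatus *m, uint32_t port)
{
    assert(port >= MEM_BASE && port < MEM_BASE + DIMM_BITMAP_BYTES);
    return m->mems_sts[port - MEM_BASE];
}
```

With these definitions, dimm index 10 lands in byte 1 (port 0xaf81) as bit 2,
and the last valid status port is MEM_BASE + 31 = 0xaf9f, one below
MEM_EJ_BASE.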

v1->v2:
mems_sts address moved from 0xaf20 to 0xaf80 (to accommodate more space for
cpu-hotplugging in the future).
_EJ array is reduced to a single byte.
Add documentation in docs/specs/acpi_hotplug.txt

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 docs/specs/acpi_hotplug.txt |   22 +++++++++++++
 hw/acpi_piix4.c             |   73 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 91 insertions(+), 4 deletions(-)
 create mode 100644 docs/specs/acpi_hotplug.txt

diff --git a/docs/specs/acpi_hotplug.txt b/docs/specs/acpi_hotplug.txt
new file mode 100644
index 0000000..cf86242
--- /dev/null
+++ b/docs/specs/acpi_hotplug.txt
@@ -0,0 +1,22 @@
+QEMU<->ACPI BIOS hotplug interface
+--------------------------------------
+This document describes the interface between QEMU and the ACPI BIOS for non-PCI
+space. For the PCI interface please look at docs/specs/acpi_pci_hotplug.txt
+
+QEMU<->ACPI BIOS memory hotplug interface
+--------------------------------------
+
+Memory Dimm status array (IO port 0xaf80-0xaf9f, 1-byte access):
+---------------------------------------------------------------
+Dimm hot-plug notification pending. One bit per slot.
+
+Read by ACPI BIOS GPE.3 handler to notify OS of memory hot-add or hot-remove
+events.  Read-only.
+
+Memory Dimm ejection success notification (IO port 0xafa0, 1-byte access):
+---------------------------------------------------------------
+Dimm hot-remove _EJ0 notification. Byte value indicates Dimm slot that was
+ejected.
+
+Written by ACPI memory device _EJ0 method to notify qemu of successful
+hot-removal.  Write-only.
diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 0aace60..b988597 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -28,6 +28,8 @@
 #include "range.h"
 #include "ioport.h"
 #include "fw_cfg.h"
+#include "sysbus.h"
+#include "dimm.h"
 
 //#define DEBUG
 
@@ -45,9 +47,15 @@
 #define PCI_DOWN_BASE 0xae04
 #define PCI_EJ_BASE 0xae08
 #define PCI_RMV_BASE 0xae0c
+#define MEM_BASE 0xaf80
+#define MEM_EJ_BASE 0xafa0
 
+#define PIIX4_MEM_HOTPLUG_STATUS 8
 #define PIIX4_PCI_HOTPLUG_STATUS 2
 
+struct gpe_regs {
+    uint8_t mems_sts[DIMM_BITMAP_BYTES];
+};
 struct pci_status {
     uint32_t up; /* deprecated, maintained for migration compatibility */
     uint32_t down;
@@ -69,6 +77,7 @@ typedef struct PIIX4PMState {
     Notifier machine_ready;
 
     /* for pci hotplug */
+    struct gpe_regs gperegs;
     struct pci_status pci0_status;
     uint32_t pci0_hotplug_enable;
     uint32_t pci0_slot_device_present;
@@ -93,8 +102,8 @@ static void pm_update_sci(PIIX4PMState *s)
                    ACPI_BITMASK_POWER_BUTTON_ENABLE |
                    ACPI_BITMASK_GLOBAL_LOCK_ENABLE |
                    ACPI_BITMASK_TIMER_ENABLE)) != 0) ||
-        (((s->ar.gpe.sts[0] & s->ar.gpe.en[0])
-          & PIIX4_PCI_HOTPLUG_STATUS) != 0);
+        (((s->ar.gpe.sts[0] & s->ar.gpe.en[0]) &
+          (PIIX4_PCI_HOTPLUG_STATUS | PIIX4_MEM_HOTPLUG_STATUS)) != 0);
 
     qemu_set_irq(s->irq, sci_level);
     /* schedule a timer interruption if needed */
@@ -499,7 +508,16 @@ type_init(piix4_pm_register_types)
 static uint32_t gpe_readb(void *opaque, uint32_t addr)
 {
     PIIX4PMState *s = opaque;
-    uint32_t val = acpi_gpe_ioport_readb(&s->ar, addr);
+    uint32_t val = 0;
+    struct gpe_regs *g = &s->gperegs;
+
+    switch (addr) {
+        case MEM_BASE ... MEM_BASE + DIMM_BITMAP_BYTES - 1:
+            val = g->mems_sts[addr - MEM_BASE];
+            break;
+        default:
+            val = acpi_gpe_ioport_readb(&s->ar, addr);
+    }
 
     PIIX4_DPRINTF("gpe read %x == %x\n", addr, val);
     return val;
@@ -509,7 +527,13 @@ static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
 {
     PIIX4PMState *s = opaque;
 
-    acpi_gpe_ioport_writeb(&s->ar, addr, val);
+    switch (addr) {
+        case MEM_EJ_BASE:
+            dimm_notify(val, DIMM_REMOVE_SUCCESS);
+            break;
+        default:
+            acpi_gpe_ioport_writeb(&s->ar, addr, val);
+    }
     pm_update_sci(s);
 
     PIIX4_DPRINTF("gpe write %x <== %d\n", addr, val);
@@ -560,9 +584,11 @@ static uint32_t pcirmv_read(void *opaque, uint32_t addr)
 
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
                                 PCIHotplugState state);
+static int piix4_dimm_hotplug(DeviceState *qdev, SysBusDevice *dev, int add);
 
 static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
 {
+    int i = 0;
 
     register_ioport_write(GPE_BASE, GPE_LEN, 1, gpe_writeb, s);
     register_ioport_read(GPE_BASE, GPE_LEN, 1,  gpe_readb, s);
@@ -576,7 +602,15 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
 
     register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
 
+    register_ioport_read(MEM_BASE, DIMM_BITMAP_BYTES, 1,  gpe_readb, s);
+    register_ioport_write(MEM_EJ_BASE, 1, 1,  gpe_writeb, s);
+
+    for(i = 0; i < DIMM_BITMAP_BYTES; i++) {
+        s->gperegs.mems_sts[i] = 0;
+    }
+
     pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
+    dimm_register_hotplug(piix4_dimm_hotplug, &s->dev.qdev);
 }
 
 static void enable_device(PIIX4PMState *s, int slot)
@@ -591,6 +625,37 @@ static void disable_device(PIIX4PMState *s, int slot)
     s->pci0_status.down |= (1U << slot);
 }
 
+static void enable_mem_device(PIIX4PMState *s, int memdevice)
+{
+    struct gpe_regs *g = &s->gperegs;
+    s->ar.gpe.sts[0] |= PIIX4_MEM_HOTPLUG_STATUS;
+    g->mems_sts[memdevice/8] |= (1 << (memdevice%8));
+}
+
+static void disable_mem_device(PIIX4PMState *s, int memdevice)
+{
+    struct gpe_regs *g = &s->gperegs;
+    s->ar.gpe.sts[0] |= PIIX4_MEM_HOTPLUG_STATUS;
+    g->mems_sts[memdevice/8] &= ~(1 << (memdevice%8));
+}
+
+static int piix4_dimm_hotplug(DeviceState *qdev, SysBusDevice *dev, int
+        add)
+{
+    PCIDevice *pci_dev = DO_UPCAST(PCIDevice, qdev, qdev);
+    PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, pci_dev);
+    DimmState *slot = DIMM(dev);
+
+    if (add) {
+        enable_mem_device(s, slot->idx);
+    }
+    else {
+        disable_mem_device(s, slot->idx);
+    }
+    pm_update_sci(s);
+    return 0;
+}
+
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
 				PCIHotplugState state)
 {
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH v2 08/21] pc: calculate dimm physical addresses and adjust memory map
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:31   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:31 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

Dimm physical address offsets are calculated automatically and the memory map
is adjusted accordingly. If a DIMM fits below PCI_HOLE_START (currently
0xe0000000), it is appended directly after existing memory; otherwise its
physical address is placed at or above 4GB.
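
The placement rule can be tried outside QEMU with the following sketch. It
mirrors the three cases described above, but keeps the state in a hypothetical
HpLayout struct instead of globals; hp_place and the field names are
illustrative only and do not appear in the patch.

```c
#include <stdint.h>

#define PCI_HOLE_START 0xe0000000ULL
#define FOUR_GB        0x100000000ULL

/* Hypothetical container for the allocator state. */
typedef struct {
    uint64_t next;      /* next free hotplug address; 0 means not yet initialised */
    uint64_t below_4g;  /* hotplug bytes placed below the PCI hole */
    uint64_t above_4g;  /* hotplug bytes placed at or above 4GB */
} HpLayout;

/* Return the start address for a dimm of `size` bytes and update the layout. */
static uint64_t hp_place(HpLayout *l, uint64_t ram_size, uint64_t size)
{
    uint64_t addr;

    /* First call: start right after the initial RAM. */
    if (l->next == 0) {
        l->next = (ram_size >= PCI_HOLE_START)
                ? FOUR_GB + (ram_size - PCI_HOLE_START)
                : ram_size;
    }

    if (l->next >= FOUR_GB) {
        /* Already above 4GB: keep appending there. */
        addr = l->next;
        l->above_4g += size;
        l->next += size;
    } else if (l->next + size <= PCI_HOLE_START) {
        /* Fits below the PCI hole: append after existing memory. */
        addr = l->next;
        l->below_4g += size;
        l->next += size;
    } else {
        /* Would overlap the PCI hole: restart at 4GB. */
        addr = FOUR_GB;
        l->above_4g += size;
        l->next = FOUR_GB + size;
    }
    return addr;
}
```

For example, with 3GB of initial RAM, a 256MB dimm is placed at 0xc0000000, a
following 512MB dimm no longer fits below the hole and goes to 0x100000000,
and every later dimm is appended above 4GB, even if it would still fit in the
gap left below the hole.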

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hw/pc.c      |   41 +++++++++++++++++++++++++++++++++++++++++
 hw/pc.h      |    6 ++++++
 hw/pc_piix.c |   18 ++++++++++++------
 vl.c         |    1 +
 4 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index c7e9ab3..ef9901a 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -48,6 +48,7 @@
 #include "memory.h"
 #include "exec-memory.h"
 #include "arch_init.h"
+#include "dimm.h"
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
@@ -89,6 +90,9 @@ struct e820_table {
 static struct e820_table e820_table;
 struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
 
+ram_addr_t below_4g_hp_mem_size = 0;
+ram_addr_t above_4g_hp_mem_size = 0;
+extern ram_addr_t ram_hp_offset;
 void gsi_handler(void *opaque, int n, int level)
 {
     GSIState *s = opaque;
@@ -1182,3 +1186,40 @@ void pc_pci_device_init(PCIBus *pci_bus)
         pci_create_simple(pci_bus, -1, "lsi53c895a");
     }
 }
+
+
+/* Function to configure memory offsets of hotpluggable dimms */
+
+target_phys_addr_t pc_set_hp_memory_offset(uint64_t size)
+{
+    target_phys_addr_t ret;
+
+    /* on first call, initialize ram_hp_offset */
+    if (!ram_hp_offset) {
+        if (ram_size >= PCI_HOLE_START ) {
+            ram_hp_offset = 0x100000000LL + (ram_size - PCI_HOLE_START);
+        } else {
+            ram_hp_offset = ram_size;
+        }
+    }
+
+    if (ram_hp_offset >= 0x100000000LL) {
+        ret = ram_hp_offset;
+        above_4g_hp_mem_size += size;
+        ram_hp_offset += size;
+    }
+    /* if dimm fits before pci hole, append it normally */
+    else if (ram_hp_offset + size <= PCI_HOLE_START) {
+        ret = ram_hp_offset;
+        below_4g_hp_mem_size += size;
+        ram_hp_offset += size;
+    }
+    /* otherwise place it above 4GB */
+    else {
+        ret = 0x100000000LL;
+        above_4g_hp_mem_size += size;
+        ram_hp_offset = 0x100000000LL + size;
+    }
+
+    return ret;
+}
diff --git a/hw/pc.h b/hw/pc.h
index 31ccb6f..15bdd7d 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -10,6 +10,7 @@
 #include "memory.h"
 #include "ioapic.h"
 
+#define PCI_HOLE_START 0xe0000000
 /* PC-style peripherals (also used by other machines).  */
 
 /* serial.c */
@@ -218,6 +219,11 @@ static inline bool isa_ne2000_init(ISABus *bus, int base, int irq, NICInfo *nd)
 /* pc_sysfw.c */
 void pc_system_firmware_init(MemoryRegion *rom_memory);
 
+/* memory hotplug */
+target_phys_addr_t pc_set_hp_memory_offset(uint64_t size);
+extern ram_addr_t below_4g_hp_mem_size;
+extern ram_addr_t above_4g_hp_mem_size;
+
 /* e820 types */
 #define E820_RAM        1
 #define E820_RESERVED   2
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 0c0096f..f3f1651 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -43,6 +43,7 @@
 #include "xen.h"
 #include "memory.h"
 #include "exec-memory.h"
+#include "dimm.h"
 #ifdef CONFIG_XEN
 #  include <xen/hvm/hvm_info_table.h>
 #endif
@@ -155,9 +156,9 @@ static void pc_init1(MemoryRegion *system_memory,
         kvmclock_create();
     }
 
-    if (ram_size >= 0xe0000000 ) {
-        above_4g_mem_size = ram_size - 0xe0000000;
-        below_4g_mem_size = 0xe0000000;
+    if (ram_size >= PCI_HOLE_START ) {
+        above_4g_mem_size = ram_size - PCI_HOLE_START;
+        below_4g_mem_size = PCI_HOLE_START;
     } else {
         above_4g_mem_size = 0;
         below_4g_mem_size = ram_size;
@@ -172,6 +173,9 @@ static void pc_init1(MemoryRegion *system_memory,
         rom_memory = system_memory;
     }
 
+    /* adjust memory map for hotplug dimms */
+    dimm_calc_offsets(pc_set_hp_memory_offset);
+
     /* allocate ram and load rom/bios */
     if (!xen_enabled()) {
         fw_cfg = pc_memory_init(system_memory,
@@ -192,9 +196,11 @@ static void pc_init1(MemoryRegion *system_memory,
     if (pci_enabled) {
         pci_bus = i440fx_init(&i440fx_state, &piix3_devfn, &isa_bus, gsi,
                               system_memory, system_io, ram_size,
-                              below_4g_mem_size,
-                              0x100000000ULL - below_4g_mem_size,
-                              0x100000000ULL + above_4g_mem_size,
+                              below_4g_mem_size + below_4g_hp_mem_size,
+                              0x100000000ULL - below_4g_mem_size
+                                - below_4g_hp_mem_size,
+                              0x100000000ULL + above_4g_mem_size
+                                + above_4g_hp_mem_size,
                               (sizeof(target_phys_addr_t) == 4
                                ? 0
                                : ((uint64_t)1 << 62)),
diff --git a/vl.c b/vl.c
index 1329c30..0ff8818 100644
--- a/vl.c
+++ b/vl.c
@@ -176,6 +176,7 @@ DisplayType display_type = DT_DEFAULT;
 int display_remote = 0;
 const char* keyboard_layout = NULL;
 ram_addr_t ram_size;
+ram_addr_t ram_hp_offset;
 const char *mem_path = NULL;
 #ifdef MAP_POPULATE
 int mem_prealloc = 0; /* force preallocation of physical target memory */
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH v2 09/21] pc: Add dimm paravirt SRAT info
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:31   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:31 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

The numa_fw_cfg paravirt interface is extended to include SRAT information for
all hotplug-able dimms. There are 3 words for each hotplug-able memory slot,
denoting start address, size and node proximity. The new info is appended after
the existing numa info, so that the fw_cfg layout does not break. This
information is used by SeaBIOS to build hotplug memory device objects at
runtime. nb_numa_nodes is set to 1 by default (not 0), so that we always pass
SRAT info to SeaBIOS.
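
To make the entry offsets concrete, the sketch below builds a table with the
layout described above (and in docs/specs/fwcfg.txt). It is an illustration,
not the patch code: DimmInfo and the srat_cfg_* helpers are hypothetical
names, each entry is assumed to be a 64-bit word, and byte-order conversion is
ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-dimm record; field names are not taken from the patch. */
typedef struct {
    uint64_t start;  /* physical address offset */
    uint64_t size;   /* size in bytes */
    uint64_t pxm;    /* node proximity */
} DimmInfo;

/* Index of the "#dimms" entry: after #nodes, the cpu entries and the node entries. */
static size_t srat_cfg_ndimms_idx(size_t max_cpus, size_t nb_nodes)
{
    return 1 + max_cpus + nb_nodes;
}

/* Index of the first entry (start address) of the triplet for dimm i. */
static size_t srat_cfg_dimm_idx(size_t max_cpus, size_t nb_nodes, size_t i)
{
    return srat_cfg_ndimms_idx(max_cpus, nb_nodes) + 1 + 3 * i;
}

/* Build the table. The caller frees it. Returns NULL on allocation failure. */
static uint64_t *srat_cfg_build(const uint64_t *cpu_pxm, size_t max_cpus,
                                const uint64_t *node_mem, size_t nb_nodes,
                                const DimmInfo *dimms, size_t nb_dimms)
{
    /* Total length equals the index one past the last dimm triplet. */
    size_t n = srat_cfg_dimm_idx(max_cpus, nb_nodes, nb_dimms);
    uint64_t *cfg;
    size_t i;

    assert(cpu_pxm || max_cpus == 0);
    assert(node_mem || nb_nodes == 0);
    assert(dimms || nb_dimms == 0);

    cfg = calloc(n, sizeof(*cfg));
    if (!cfg) {
        return NULL;
    }

    cfg[0] = nb_nodes;
    for (i = 0; i < max_cpus; i++) {
        cfg[1 + i] = cpu_pxm[i];
    }
    for (i = 0; i < nb_nodes; i++) {
        cfg[1 + max_cpus + i] = node_mem[i];
    }
    cfg[srat_cfg_ndimms_idx(max_cpus, nb_nodes)] = nb_dimms;
    for (i = 0; i < nb_dimms; i++) {
        size_t base = srat_cfg_dimm_idx(max_cpus, nb_nodes, i);
        cfg[base]     = dimms[i].start;
        cfg[base + 1] = dimms[i].size;
        cfg[base + 2] = dimms[i].pxm;
    }
    return cfg;
}
```

With max_cpus = 2 and one numa node, the #dimms entry sits at index 4 and the
triplet for dimm1 starts at index 8. A firmware that only knows the old layout
stops reading after the node entries, which is why appending does not break
it.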

v1->v2:
Dimm SRAT info (#dimms, then one triplet per dimm) is appended at the end of the
existing numa fw_cfg data in order not to break the existing layout
Documentation of the new fwcfg layout is included in docs/specs/fwcfg.txt
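
For readers who want to check the offsets, the layout described above can be
sketched from the consumer side as follows. This is only an illustration, not
code from the patch or from SeaBIOS: HpDimmInfo, dimm_count_index(),
numa_fw_cfg_words() and decode_hp_dimm() are hypothetical names, and the words
are assumed to be in host byte order already (the patch stores them
little-endian via cpu_to_le64()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one decoded hotplug-dimm triplet. */
typedef struct {
    uint64_t start; /* starting physical address */
    uint64_t size;  /* size in bytes */
    uint64_t node;  /* NUMA proximity domain */
} HpDimmInfo;

/* Index of the "#dimms" word. It follows the node count (1 word),
 * max_cpus per-vCPU words and nb_numa_nodes per-node words. */
static size_t dimm_count_index(size_t max_cpus, size_t nb_numa_nodes)
{
    return 1 + max_cpus + nb_numa_nodes;
}

/* Total number of 64-bit words in the blob; this mirrors the size
 * expression passed to g_malloc0() in the patch. */
static size_t numa_fw_cfg_words(size_t max_cpus, size_t nb_numa_nodes,
                                size_t nb_hp_dimms)
{
    return 2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms;
}

/* Decode dimm number idx from the blob.
 * Returns 0 on success, -1 if idx is not below the stored dimm count. */
static int decode_hp_dimm(const uint64_t *blob, size_t max_cpus,
                          size_t idx, HpDimmInfo *out)
{
    size_t count_at;
    const uint64_t *triplet;

    assert(blob != NULL && out != NULL);
    count_at = dimm_count_index(max_cpus, (size_t)blob[0]);
    if (idx >= blob[count_at]) {
        return -1;
    }
    /* Triplets start right after the "#dimms" word. */
    triplet = blob + count_at + 1 + 3 * idx;
    out->start = triplet[0];
    out->size = triplet[1];
    out->node = triplet[2];
    return 0;
}
```

With max_cpus = 2, one NUMA node and two dimms, the blob is 11 words long, the
dimm count sits at index 4, and the triplet for dimm 1 starts at index 8.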

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 docs/specs/fwcfg.txt |   28 ++++++++++++++++++++++++++
 hw/pc.c              |   53 ++++++++++++++++++++++++++++++++++++++++++++++++-
 vl.c                 |    2 +-
 3 files changed, 80 insertions(+), 3 deletions(-)
 create mode 100644 docs/specs/fwcfg.txt

diff --git a/docs/specs/fwcfg.txt b/docs/specs/fwcfg.txt
new file mode 100644
index 0000000..e6fcd8f
--- /dev/null
+++ b/docs/specs/fwcfg.txt
@@ -0,0 +1,28 @@
+QEMU<->BIOS Paravirt Documentation
+--------------------------------------
+
+This document describes paravirt data structures passed from QEMU to BIOS.
+
+fw_cfg SRAT paravirt info
+--------------------
+The SRAT info passed from QEMU to BIOS has the following layout:
+
+-----------------------------------------------------------------------------------------------
+#nodes | cpu0_pxm | cpu1_pxm | ... | cpulast_pxm | node0_mem | node1_mem | ... | nodelast_mem
+
+-----------------------------------------------------------------------------------------------
+#dimms | dimm0_start | dimm0_sz | dimm0_pxm | ... | dimmlast_start | dimmlast_sz | dimmlast_pxm
+
+Entry 0 contains the number of numa nodes (nb_numa_nodes).
+
+Entries 1..max_cpus: The next max_cpus entries describe node proximity for each
+one of the vCPUs in the system.
+
+Entries max_cpus+1..max_cpus+nb_numa_nodes:  The next nb_numa_nodes entries
+describe the memory size for each one of the NUMA nodes in the system.
+
+Entry max_cpus+nb_numa_nodes+1 contains the number of memory dimms (nb_hp_dimms)
+
+The last 3 * nb_hp_dimms entries are organized in triplets: Each triplet contains
+the physical address offset, size (in bytes), and node proximity for the
+respective dimm.
diff --git a/hw/pc.c b/hw/pc.c
index ef9901a..cf651d0 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -598,12 +598,15 @@ int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
     return index;
 }
 
+static void setup_hp_dimms(uint64_t *fw_cfg_slots);
+
 static void *bochs_bios_init(void)
 {
     void *fw_cfg;
     uint8_t *smbios_table;
     size_t smbios_len;
     uint64_t *numa_fw_cfg;
+    uint64_t *hp_dimms_fw_cfg;
     int i, j;
 
     register_ioport_write(0x400, 1, 2, bochs_bios_write, NULL);
@@ -638,8 +641,10 @@ static void *bochs_bios_init(void)
     /* allocate memory for the NUMA channel: one (64bit) word for the number
      * of nodes, one word for each VCPU->node and one word for each node to
      * hold the amount of memory.
+     * Finally one word for the number of hotplug memory slots and three words
+     * for each hotplug memory slot (start address, size and node proximity).
      */
-    numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8);
+    numa_fw_cfg = g_malloc0((2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8);
     numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
     for (i = 0; i < max_cpus; i++) {
         for (j = 0; j < nb_numa_nodes; j++) {
@@ -652,8 +657,15 @@ static void *bochs_bios_init(void)
     for (i = 0; i < nb_numa_nodes; i++) {
         numa_fw_cfg[max_cpus + 1 + i] = cpu_to_le64(node_mem[i]);
     }
+
+    numa_fw_cfg[1 + max_cpus + nb_numa_nodes] = cpu_to_le64(nb_hp_dimms);
+
+    hp_dimms_fw_cfg = numa_fw_cfg + 2 + max_cpus + nb_numa_nodes;
+    if (nb_hp_dimms)
+        setup_hp_dimms(hp_dimms_fw_cfg);
+
     fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, (uint8_t *)numa_fw_cfg,
-                     (1 + max_cpus + nb_numa_nodes) * 8);
+                     (2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8);
 
     return fw_cfg;
 }
@@ -1223,3 +1235,40 @@ target_phys_addr_t pc_set_hp_memory_offset(uint64_t size)
 
     return ret;
 }
+
+static void setup_hp_dimms(uint64_t *fw_cfg_slots)
+{
+    int i = 0;
+    Error *err = NULL;
+    DeviceState *dev;
+    DimmState *slot;
+    const char *type;
+    BusChild *kid;
+    BusState *bus = sysbus_get_default();
+
+    QTAILQ_FOREACH(kid, &bus->children, sibling) {
+        dev = kid->child;
+        type = object_property_get_str(OBJECT(dev), "type", &err);
+        if (err) {
+            error_free(err);
+            fprintf(stderr, "error getting device type\n");
+            exit(1);
+        }
+
+        if (!strcmp(type, "dimm")) {
+            if (!dev->id) {
+                fprintf(stderr, "error getting dimm device id\n");
+                exit(1);
+            }
+            slot = DIMM(dev);
+            /* determine starting physical address for this memory slot */
+            assert(slot->start);
+            fw_cfg_slots[3 * slot->idx] = cpu_to_le64(slot->start);
+            fw_cfg_slots[3 * slot->idx + 1] = cpu_to_le64(slot->size);
+            fw_cfg_slots[3 * slot->idx + 2] = cpu_to_le64(slot->node);
+            i++;
+        }
+    }
+    assert(i == nb_hp_dimms);
+}
+
diff --git a/vl.c b/vl.c
index 0ff8818..37c9798 100644
--- a/vl.c
+++ b/vl.c
@@ -2335,7 +2335,7 @@ int main(int argc, char **argv, char **envp)
         node_cpumask[i] = 0;
     }
 
-    nb_numa_nodes = 0;
+    nb_numa_nodes = 1;
     nb_nics = 0;
 
     autostart= 1;
-- 
1.7.9



* [RFC PATCH v2 10/21] Implement "-dimm" command line option
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:31   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:31 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

Syntax: "-dimm id=name,size=sz,node=pxm,populated=on|off"

The starting physical address of each dimm is calculated automatically from the
top of memory, skipping the PCI hole at [PCI_HOLE_START, 4G).
"populated=on" means the dimm is populated at machine startup. The default is off.
"node" sets the NUMA proximity for this dimm. The default is node zero.

Example:
"-dimm id=dimm0,size=512M,node=0,populated=off"
will define a 512M memory slot belonging to numa node 0.
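
The address assignment described above could look roughly like the sketch
below. It is an illustration under stated assumptions, not the code from this
series: HpCursor, hp_cursor_init() and hp_place_dimm() are hypothetical names,
and the value used for PCI_HOLE_START is assumed for the example only.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_HOLE_START 0xe0000000ULL  /* assumed value, illustration only */
#define PCI_HOLE_END   0x100000000ULL /* 4G */

/* Hypothetical allocator state: next free guest-physical address. */
typedef struct {
    uint64_t next;
} HpCursor;

/* Start handing out addresses at the top of the initial RAM. */
static void hp_cursor_init(HpCursor *c, uint64_t below_4g_mem_size,
                           uint64_t above_4g_mem_size)
{
    c->next = above_4g_mem_size ? PCI_HOLE_END + above_4g_mem_size
                                : below_4g_mem_size;
}

/* Place one dimm of the given size and return its start address.
 * A dimm that would reach into [PCI_HOLE_START, 4G) is moved to 4G. */
static uint64_t hp_place_dimm(HpCursor *c, uint64_t size)
{
    uint64_t start = c->next;

    assert(size > 0);
    if (start < PCI_HOLE_END && start + size > PCI_HOLE_START) {
        start = PCI_HOLE_END;
    }
    c->next = start + size;
    return start;
}
```

Note that this sketch never goes back to fill a gap left below the hole once a
dimm has been moved above 4G; whether the real implementation does so is not
stated here.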

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 qemu-config.c   |   25 +++++++++++++++++++++++++
 qemu-options.hx |    5 +++++
 sysemu.h        |    1 +
 vl.c            |   35 +++++++++++++++++++++++++++++++++++
 4 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/qemu-config.c b/qemu-config.c
index 5c3296b..4abc31b 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -626,6 +626,30 @@ QemuOptsList qemu_boot_opts = {
     },
 };
 
+static QemuOptsList qemu_dimm_opts = {
+    .name = "dimm",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_dimm_opts.head),
+    .desc = {
+        {
+            .name = "id",
+            .type = QEMU_OPT_STRING,
+            .help = "id of this dimm device",
+        },{
+            .name = "size",
+            .type = QEMU_OPT_SIZE,
+            .help = "memory size for this dimm",
+        },{
+            .name = "populated",
+            .type = QEMU_OPT_BOOL,
+            .help = "whether this dimm is populated at startup",
+        },{
+            .name = "node",
+            .type = QEMU_OPT_NUMBER,
+            .help = "NUMA node number (i.e. proximity) for this dimm",
+        },
+        { /* end of list */ }
+    },
+};
 static QemuOptsList *vm_config_groups[32] = {
     &qemu_drive_opts,
     &qemu_chardev_opts,
@@ -641,6 +665,7 @@ static QemuOptsList *vm_config_groups[32] = {
     &qemu_machine_opts,
     &qemu_boot_opts,
     &qemu_iscsi_opts,
+    &qemu_dimm_opts,
     NULL,
 };
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 8b66264..61909f7 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2747,3 +2747,8 @@ HXCOMM This is the last statement. Insert new options before this line!
 STEXI
 @end table
 ETEXI
+
+DEF("dimm", HAS_ARG, QEMU_OPTION_dimm,
+        "-dimm id=dimmid,size=sz,node=nd,populated=on|off\n"
+        "specify memory dimm device with name dimmid, size sz on node nd",
+        QEMU_ARCH_ALL)
diff --git a/sysemu.h b/sysemu.h
index bc2c788..3e21a22 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -136,6 +136,7 @@ extern QEMUClock *rtc_clock;
 extern int nb_numa_nodes;
 extern uint64_t node_mem[MAX_NODES];
 extern uint64_t node_cpumask[MAX_NODES];
+extern int nb_hp_dimms;
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/vl.c b/vl.c
index 0ff8818..efe915e 100644
--- a/vl.c
+++ b/vl.c
@@ -120,6 +120,7 @@ int main(int argc, char **argv)
 #include "hw/xen.h"
 #include "hw/qdev.h"
 #include "hw/loader.h"
+#include "hw/dimm.h"
 #include "bt-host.h"
 #include "net.h"
 #include "net/slirp.h"
@@ -242,6 +243,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order = QTAILQ_HEAD_INITIALIZER(fw_boot_order
 int nb_numa_nodes;
 uint64_t node_mem[MAX_NODES];
 uint64_t node_cpumask[MAX_NODES];
+int nb_hp_dimms;
 
 uint8_t qemu_uuid[16];
 
@@ -518,6 +520,23 @@ static void configure_rtc_date_offset(const char *startdate, int legacy)
         rtc_date_offset = time(NULL) - rtc_start_date;
     }
 }
+static void configure_dimm(QemuOpts *opts)
+{
+    const char *id;
+    uint64_t size, node;
+    bool populated;
+    if (nb_hp_dimms == MAX_DIMMS) {
+        fprintf(stderr, "qemu: maximum number of DIMMs (%d) exceeded\n",
+                MAX_DIMMS);
+        exit(1);
+    }
+    id = qemu_opts_id(opts);
+    size = qemu_opt_get_size(opts, "size", DEFAULT_DIMMSIZE);
+    populated = qemu_opt_get_bool(opts, "populated", 0);
+    node = qemu_opt_get_number(opts, "node", 0);
+    dimm_create((char*)id, size, node, nb_hp_dimms, populated);
+    nb_hp_dimms++;
+}
 
 static void configure_rtc(QemuOpts *opts)
 {
@@ -2273,6 +2292,8 @@ int main(int argc, char **argv, char **envp)
     DisplayChangeListener *dcl;
     int cyls, heads, secs, translation;
     QemuOpts *hda_opts = NULL, *opts, *machine_opts;
+    QemuOpts *dimm_opts[MAX_DIMMS];
+    int nb_dimm_opts = 0;
     QemuOptsList *olist;
     int optind;
     const char *optarg;
@@ -3200,6 +3221,18 @@ int main(int argc, char **argv, char **envp)
             case QEMU_OPTION_qtest_log:
                 qtest_log = optarg;
                 break;
+            case QEMU_OPTION_dimm:
+                if (nb_dimm_opts == MAX_DIMMS) {
+                    fprintf(stderr, "qemu: maximum number of DIMMs (%d) exceeded\n",
+                        MAX_DIMMS);
+                }
+                dimm_opts[nb_dimm_opts] =
+                    qemu_opts_parse(qemu_find_opts("dimm"), optarg, 0);
+                if (!dimm_opts[nb_dimm_opts]) {
+                    exit(1);
+                }
+                nb_dimm_opts++;
+                break;
             default:
                 os_parse_cmd_args(popt->index, optarg);
             }
@@ -3517,6 +3550,8 @@ int main(int argc, char **argv, char **envp)
     }
     qemu_add_globals();
 
+    for (i = 0; i < nb_dimm_opts; i++)
+        configure_dimm(dimm_opts[i]);
     qdev_machine_init();
 
     machine->init(ram_size, boot_devices,
-- 
1.7.9



* [RFC PATCH v2 11/21] Implement dimm_add and dimm_del hmp/qmp commands
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:31   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:31 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

Hot-add hmp syntax: dimm_add dimmid
Hot-remove hmp syntax: dimm_del dimmid

Respective qmp commands are "dimm-add", "dimm-del".
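
As an illustration of the qmp form, a management client could build the request
text roughly as sketched below. format_dimm_cmd() is a hypothetical helper, not
part of this patch, and it assumes the id contains no characters that need JSON
escaping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format a QMP request for "dimm-add" (add != 0) or "dimm-del" (add == 0).
 * Returns the number of characters written, or -1 if buf is too small. */
static int format_dimm_cmd(char *buf, size_t len, int add, const char *id)
{
    int n;

    assert(buf != NULL && id != NULL);
    n = snprintf(buf, len,
                 "{ \"execute\": \"%s\", \"arguments\": { \"id\": \"%s\" } }",
                 add ? "dimm-add" : "dimm-del", id);
    /* snprintf reports the untruncated length, so n >= len means truncation. */
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The resulting string has the same shape as the request lines in the
qmp-commands.hx examples added by this patch.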

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hmp-commands.hx |   32 ++++++++++++++++++++++++++++++++
 monitor.c       |   11 +++++++++++
 monitor.h       |    3 +++
 qmp-commands.hx |   39 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 85 insertions(+), 0 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index f5d9d91..012c150 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -618,6 +618,38 @@ Add device.
 ETEXI
 
     {
+        .name       = "dimm_del",
+        .args_type  = "id:s",
+        .params     = "id",
+        .help       = "hot-remove memory (dimm device)",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_dimm_del,
+    },
+
+STEXI
+@item dimm_del @var{id}
+@findex dimm_del
+
+Hot-remove dimm.
+ETEXI
+
+    {
+        .name       = "dimm_add",
+        .args_type  = "id:s",
+        .params     = "id",
+        .help       = "hot-add memory (dimm device)",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_dimm_add,
+    },
+
+STEXI
+@item dimm_add @var{id}
+@findex dimm_add
+
+Hot-add dimm.
+ETEXI
+
+    {
         .name       = "device_del",
         .args_type  = "id:s",
         .params     = "device",
diff --git a/monitor.c b/monitor.c
index f6107ba..d3d95a6 100644
--- a/monitor.c
+++ b/monitor.c
@@ -67,6 +67,7 @@
 #include "qmp-commands.h"
 #include "hmp.h"
 #include "qemu-thread.h"
+#include "hw/dimm.h"
 
 /* for pic/irq_info */
 #if defined(TARGET_SPARC)
@@ -4813,3 +4814,13 @@ int monitor_read_block_device_key(Monitor *mon, const char *device,
 
     return monitor_read_bdrv_key_start(mon, bs, completion_cb, opaque);
 }
+
+int do_dimm_add(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+    return dimm_do(mon, qdict, true);
+}
+
+int do_dimm_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+    return dimm_do(mon, qdict, false);
+}
diff --git a/monitor.h b/monitor.h
index 5f4de1b..afdd721 100644
--- a/monitor.h
+++ b/monitor.h
@@ -86,4 +86,7 @@ int qmp_qom_set(Monitor *mon, const QDict *qdict, QObject **ret);
 
 int qmp_qom_get(Monitor *mon, const QDict *qdict, QObject **ret);
 
+int do_dimm_add(Monitor *mon, const QDict *qdict, QObject **ret_data);
+int do_dimm_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
+
 #endif /* !MONITOR_H */
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 2e1a38e..7efd628 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2209,3 +2209,42 @@ EQMP
         .args_type  = "implements:s?,abstract:b?",
         .mhandler.cmd_new = qmp_marshal_input_qom_list_types,
     },
+    {
+        .name       = "dimm-add",
+        .args_type  = "id:s",
+        .mhandler.cmd_new = do_dimm_add,
+    },
+SQMP
+dimm-add
+-------------
+
+Hot-add memory DIMM
+
+Hot-plugs the memory DIMM with the given id.
+
+Example:
+
+-> { "execute": "dimm-add", "arguments": { "id": "dimm0" } }
+<- { "return": {} }
+
+EQMP
+
+    {
+        .name       = "dimm-del",
+        .args_type  = "id:s",
+        .mhandler.cmd_new = do_dimm_del,
+    },
+SQMP
+dimm-del
+-------------
+
+Hot-remove memory DIMM
+
+Hot-unplugs the memory DIMM with the given id.
+
+Example:
+
+-> { "execute": "dimm-del", "arguments": { "id": "dimm0" } }
+<- { "return": {} }
+
+EQMP
-- 
1.7.9



* [RFC PATCH v2 12/21] fix live-migration when "populated=on" is missing
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:31   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:31 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

Live migration works after memory hot-add events, as long as the
qemu command line "-dimm" arguments are changed on the destination host
to specify "populated=on" for the dimms that have been hot-added.

If the command line has not been changed, the destination host does not yet
have the corresponding ramblock in its ram_list. In that case, activate the
dimm on the destination during ram_load.

Perhaps several fields of the DimmState struct should be part of a
VMStateDescription to handle migration in a cleaner way.
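
The fallback described above boils down to the decision sketched below. This is
a simplified stand-alone model, not the arch_init.c code: MockDimm, find_dimm()
and accept_ramblock() are hypothetical, and "activating" a dimm is reduced to
setting a flag instead of registering a real ramblock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a dimm whose ramblock may not exist yet. */
typedef struct {
    const char *id;
    uint64_t length;
    int active; /* nonzero once its ramblock is registered */
} MockDimm;

static MockDimm *find_dimm(MockDimm *dimms, size_t n, const char *id)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(dimms[i].id, id) == 0) {
            return &dimms[i];
        }
    }
    return NULL;
}

/* Decide whether an incoming ramblock (id, length) can be accepted.
 * Returns 0 on success, -1 if the migration must be rejected. */
static int accept_ramblock(MockDimm *dimms, size_t n,
                           const char *id, uint64_t length)
{
    MockDimm *d;

    assert(dimms != NULL && id != NULL);
    d = find_dimm(dimms, n, id);
    if (!d) {
        return -1; /* unknown ramblock: cannot accept migration */
    }
    if (!d->active) {
        d->active = 1; /* dimm was hot-added at the source: activate it */
    }
    /* The block must match the length announced by the source. */
    return d->length == length ? 0 : -1;
}
```

In this model a dimm defined with populated=off on the destination is activated
on first sight of its ramblock, while an id that matches no dimm still fails
the migration.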

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 arch_init.c |   23 ++++++++++++++++++++---
 1 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index a9e8b74..5f46b98 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -43,6 +43,7 @@
 #include "hw/smbios.h"
 #include "exec-memory.h"
 #include "hw/pcspk.h"
+#include "hw/dimm.h"
 
 #ifdef TARGET_SPARC
 int graphic_width = 1024;
@@ -452,9 +453,25 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
                     }
 
                     if (!block) {
-                        fprintf(stderr, "Unknown ramblock \"%s\", cannot "
-                                "accept migration\n", id);
-                        return -EINVAL;
+                        /* this can happen if a dimm was hot-added at source host */
+                        DimmState *slot = dimm_find_from_name(id);
+                        if (slot) {
+                            dimm_activate(slot);
+                            /* rescan ram_list, verify ramblock is there now */
+                            QLIST_FOREACH(block, &ram_list.blocks, next) {
+                                if (!strncmp(id, block->idstr, sizeof(id))) {
+                                    if (block->length != length)
+                                        return -EINVAL;
+                                    break;
+                                }
+                            }
+                            assert(block);
+                        }
+                        else {
+                            fprintf(stderr, "Unknown ramblock \"%s\", cannot "
+                                    "accept migration\n", id);
+                            return -EINVAL;
+                        }
                     }
 
                     total_ram_bytes -= length;
-- 
1.7.9


* [RFC PATCH v2 13/21] Implement memory hotplug notification lists
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:31   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:31 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

The guest can respond to ACPI hotplug events, e.g. with the _EJ0 or _OST method.
This patch implements a tail queue to store guest notifications for memory
hot-add and hot-remove requests.

Guest responses to memory hotplug commands can be read on a per-dimm basis
with the new HMP command "info memhp" or the new QMP command "query-memhp".
Examples:
    
(qemu) dimm_add dimm0
(qemu) info memhp
Dimm: dimm0 hot-add success
or
Dimm: dimm0 hot-add failure
    
(qemu) dimm_del dimm0
(qemu) info memhp
Dimm: dimm0 hot-remove success
or
Dimm: dimm0 hot-remove failure

Results are removed from the queue once read.
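
The read-once behaviour can be illustrated with a small standalone queue.
This is a sketch only: it uses a hand-rolled singly linked tail queue instead
of qemu's QTAILQ macros, and the names hp_note, hp_queue_push and
hp_queue_drain are hypothetical.

```c
#include <stdlib.h>

enum hp_result { REMOVE_SUCCESS, REMOVE_FAIL, ADD_SUCCESS, ADD_FAIL };

struct hp_note {
    int dimm_idx;
    enum hp_result result;
    struct hp_note *next;
};

static struct hp_note *q_head;
static struct hp_note **q_tail = &q_head;

/* Append one notification at the tail. Returns 0 on success, -1 on OOM. */
static int hp_queue_push(int dimm_idx, enum hp_result result)
{
    struct hp_note *n = calloc(1, sizeof(*n));

    if (!n) {
        return -1;
    }
    n->dimm_idx = dimm_idx;
    n->result = result;
    *q_tail = n;
    q_tail = &n->next;
    return 0;
}

/*
 * Copy up to max pending notifications into out[], oldest first, and free
 * each queue item as soon as it has been copied. Returns the number copied.
 */
static size_t hp_queue_drain(struct hp_note *out, size_t max)
{
    size_t n = 0;

    while (q_head && n < max) {
        struct hp_note *item = q_head;

        q_head = item->next;
        if (!q_head) {
            q_tail = &q_head;   /* queue is empty again */
        }
        out[n] = *item;
        out[n].next = NULL;
        n++;
        free(item);
    }
    return n;
}
```

A second drain right after the first returns zero items, which matches the
"removed once read" semantics of "info memhp".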

This patch only queues _EJ0 events, which signal hot-remove success.
Queuing of _OST events, which cover the hot-remove failure and
hot-add success/failure cases, also requires the next 2 patches.

These notification items should probably be part of the migration state (not
yet implemented).

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hmp-commands.hx  |    2 +
 hmp.c            |   17 ++++++++++++++++
 hmp.h            |    1 +
 hw/dimm.c        |   55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/dimm.h        |    6 +++++
 monitor.c        |    7 ++++++
 qapi-schema.json |   26 +++++++++++++++++++++++++
 qmp-commands.hx  |   38 +++++++++++++++++++++++++++++++++++++
 8 files changed, 152 insertions(+), 0 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 012c150..3172cde 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1459,6 +1459,8 @@ show device tree
 show qdev device model list
 @item info roms
 show roms
+@item info memhp
+show memory hotplug status
 @end table
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index b9cec1d..ec25d9a 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1000,3 +1000,20 @@ void hmp_netdev_del(Monitor *mon, const QDict *qdict)
     qmp_netdev_del(id, &err);
     hmp_handle_error(mon, &err);
 }
+
+void hmp_info_memhp(Monitor *mon)
+{
+    MemHpInfoList *info;
+    MemHpInfoList *item;
+    MemHpInfo *dimm;
+
+    info = qmp_query_memhp(NULL);
+    for (item = info; item; item = item->next) {
+        dimm = item->value;
+        monitor_printf(mon, "Dimm: %s %s %s\n", dimm->Dimm,
+                dimm->request, dimm->result);
+        dimm->Dimm = NULL;
+    }
+
+    qapi_free_MemHpInfoList(info);
+}
diff --git a/hmp.h b/hmp.h
index 79d138d..971e7c4 100644
--- a/hmp.h
+++ b/hmp.h
@@ -64,5 +64,6 @@ void hmp_device_del(Monitor *mon, const QDict *qdict);
 void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict);
 void hmp_netdev_add(Monitor *mon, const QDict *qdict);
 void hmp_netdev_del(Monitor *mon, const QDict *qdict);
+void hmp_info_memhp(Monitor *mon);
 
 #endif
diff --git a/hw/dimm.c b/hw/dimm.c
index 00c4623..9b32386 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -26,6 +26,7 @@
 static DeviceState *dimm_hotplug_qdev;
 static dimm_hotplug_fn dimm_hotplug;
 static QTAILQ_HEAD(Dimmlist, DimmState)  dimmlist;
+static QTAILQ_HEAD(dimm_hp_result_head, dimm_hp_result)  dimm_hp_result_queue;
 
 static Property dimm_properties[] = {
     DEFINE_PROP_END_OF_LIST()
@@ -189,16 +190,69 @@ void dimm_notify(uint32_t idx, uint32_t event)
     DimmState *s;
     s = dimm_find_from_idx(idx);
     assert(s != NULL);
+    struct dimm_hp_result *result = g_malloc0(sizeof(*result));
 
+    result->s = s;
+    result->ret = event;
     switch(event) {
         case DIMM_REMOVE_SUCCESS:
             dimm_depopulate(s);
+            QTAILQ_INSERT_TAIL(&dimm_hp_result_queue, result, next);
             break;
         default:
+            g_free(result);
             break;
     }
 }
 
+MemHpInfoList *qmp_query_memhp(Error **errp)
+{
+    MemHpInfoList *head = NULL, *cur_item = NULL, *info;
+    struct dimm_hp_result *item, *nextitem;
+
+    QTAILQ_FOREACH_SAFE(item, &dimm_hp_result_queue, next, nextitem) {
+
+        info = g_malloc0(sizeof(*info));
+        info->value = g_malloc0(sizeof(*info->value));
+        info->value->Dimm = g_malloc0(sizeof(char) * 32);
+        info->value->request = g_malloc0(sizeof(char) * 16);
+        info->value->result = g_malloc0(sizeof(char) * 16);
+        switch (item->ret) {
+            case DIMM_REMOVE_SUCCESS:
+                strcpy(info->value->request, "hot-remove");
+                strcpy(info->value->result, "success");
+                break;
+            case DIMM_REMOVE_FAIL:
+                strcpy(info->value->request, "hot-remove");
+                strcpy(info->value->result, "failure");
+                break;
+            case DIMM_ADD_SUCCESS:
+                strcpy(info->value->request, "hot-add");
+                strcpy(info->value->result, "success");
+                break;
+            case DIMM_ADD_FAIL:
+                strcpy(info->value->request, "hot-add");
+                strcpy(info->value->result, "failure");
+                break;
+            default:
+                break;    
+        }
+        strcpy(info->value->Dimm, item->s->busdev.qdev.id);
+        /* XXX: waiting for the qapi to support GSList */
+        if (!cur_item) {
+            head = cur_item = info;
+        } else {
+            cur_item->next = info;
+            cur_item = info;
+        }
+
+        /* hotplug notification copied to qmp list, delete original item */
+        QTAILQ_REMOVE(&dimm_hp_result_queue, item, next);
+        g_free(item);
+    }
+
+    return head;
+}
 static int dimm_init(SysBusDevice *s)
 {
     DimmState *slot;
@@ -217,6 +271,7 @@ static void dimm_class_init(ObjectClass *klass, void *data)
     sc->init = dimm_init;
     dimm_hotplug = NULL;
     QTAILQ_INIT(&dimmlist);
+    QTAILQ_INIT(&dimm_hp_result_queue);
 }
 
 static TypeInfo dimm_info = {
diff --git a/hw/dimm.h b/hw/dimm.h
index 643f319..3e55ed3 100644
--- a/hw/dimm.h
+++ b/hw/dimm.h
@@ -37,6 +37,12 @@ typedef struct DimmState {
     QTAILQ_ENTRY (DimmState) nextdimm;
 } DimmState;
 
+struct dimm_hp_result {
+    DimmState *s;
+    dimm_hp_result_code ret;
+    QTAILQ_ENTRY (dimm_hp_result) next;
+};
+
 typedef int (*dimm_hotplug_fn)(DeviceState *qdev, SysBusDevice *dev, int add);
 typedef target_phys_addr_t (*dimm_calcoffset_fn)(uint64_t size);
 
diff --git a/monitor.c b/monitor.c
index d3d95a6..4a14e26 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2732,6 +2732,13 @@ static mon_cmd_t info_cmds[] = {
         .mhandler.info = do_trace_print_events,
     },
     {
+        .name       = "memhp",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show memory hotplug status",
+        .mhandler.info = hmp_info_memhp,
+    },
+    {
         .name       = NULL,
     },
 };
diff --git a/qapi-schema.json b/qapi-schema.json
index 3b6e346..049f6f9 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1862,3 +1862,29 @@
 # Since: 0.14.0
 ##
 { 'command': 'netdev_del', 'data': {'id': 'str'} }
+
+##
+# @MemHpInfo:
+#
+# Information about status of a memory hotplug command
+#
+# @Dimm: the Dimm associated with the result
+#
+# @result: the result of the hotplug command
+#
+# Since: 1.1.3
+#
+##
+{ 'type': 'MemHpInfo',
+  'data': {'Dimm': 'str', 'request': 'str', 'result': 'str'} }
+
+##
+# @query-memhp:
+#
+# Returns a list of information about pending hotplug commands
+#
+# Returns: a list of @MemHpInfo
+#
+# Since: 1.1.3
+##
+{ 'command': 'query-memhp', 'returns': ['MemHpInfo'] }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7efd628..cd1d5f0 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2248,3 +2248,41 @@ Example:
 <- { "return": {} }
 
 EQMP
+
+    {
+        .name       = "query-memhp",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_query_memhp
+    },
+SQMP
+query-memhp
+----------
+
+Show memory hotplug command notifications.
+
+Return a json-array. Each DIMM that has a pending notification is represented
+by a json-object, which contains:
+
+- "Dimm": Dimm name (json-str)
+- "request": type of hotplug request: hot-add or hot-remove (json-str)
+- "result": result of the hotplug request for this Dimm: success or failure (json-str)
+
+Example:
+
+-> { "execute": "query-memhp" }
+<- {
+      "return":[
+         {
+            "result": "failure",
+            "request": "hot-remove",
+            "Dimm": "dimm10"
+         },
+         {
+            "result": "success",
+            "request": "hot-add",
+            "Dimm": "dimm3"
+         }
+      ]
+   }
+
+EQMP
-- 
1.7.9


* [RFC PATCH v2 14/21][SeaBIOS] acpi_dsdt: Support _OST dimm method
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:31   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:31 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

Add support for the _OST method. _OST writes the dimm id to the I/O byte that
corresponds to the outcome, signalling success or failure of a hot-add or
hot-remove to qemu.
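
The dispatch done by the MOST method can be modelled in C as below. This is a
sketch, not part of the patch: the function name ost_port and the macro names
are illustrative, the port values are the ones used in the DSDT hunk, and the
reading of source event 0x1 as an insertion and 0x3 as an ejection request
follows the usual ACPI _OST definitions, which is an assumption stated here
rather than something the patch spells out.

```c
#include <stdint.h>

/* I/O ports written by the MEF, MIS and MIF fields in the DSDT. */
#define MEM_OST_REMOVE_FAIL 0xafa1
#define MEM_OST_ADD_SUCCESS 0xafa2
#define MEM_OST_ADD_FAIL    0xafa3

/*
 * Map an _OST (source event, status) pair to the notification port.
 * Only the low byte of each argument is examined, as in the ASL code.
 * Returns 0 when the combination is not reported to qemu.
 */
static uint16_t ost_port(uint32_t source_event, uint32_t status)
{
    switch (source_event & 0xff) {
    case 0x3:   /* assumed: ejection request */
        return (status & 0xff) == 0x1 ? MEM_OST_REMOVE_FAIL : 0;
    case 0x1:   /* assumed: device check, i.e. insertion */
        switch (status & 0xff) {
        case 0x0:
            return MEM_OST_ADD_SUCCESS;
        case 0x1:
            return MEM_OST_ADD_FAIL;
        default:
            return 0;
        }
    default:
        return 0;
    }
}
```

Hot-remove success has no entry here because it is reported through the
existing _EJ0 path at port 0xafa0, not through _OST.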
 
Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 src/acpi-dsdt.dsl |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 src/ssdt-mem.dsl  |    4 ++++
 2 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 5d3e92b..1c253ca 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -762,6 +762,28 @@ DefinitionBlock (
             MPE, 8
         }
         
+
+        /* Memory hot-remove notify failure byte */
+        OperationRegion(MEEF, SystemIO, 0xafa1, 1)
+        Field (MEEF, ByteAcc, NoLock, Preserve)
+        {
+            MEF, 8
+        }
+
+        /* Memory hot-add notify success byte */
+        OperationRegion(MPIS, SystemIO, 0xafa2, 1)
+        Field (MPIS, ByteAcc, NoLock, Preserve)
+        {
+            MIS, 8
+        }
+
+        /* Memory hot-add notify failure byte */
+        OperationRegion(MPIF, SystemIO, 0xafa3, 1)
+        Field (MPIF, ByteAcc, NoLock, Preserve)
+        {
+            MIF, 8
+        }
+
         Method(MESC, 0) {
             // Local5 = active memdevice bitmap
             Store (MES, Local5)
@@ -802,6 +824,30 @@ DefinitionBlock (
             Store(Arg0, MPE)
             Sleep(200)
         }
+        Method (MOST, 3, Serialized) {
+            // _OST method - OS status indication
+            Switch (And(Arg0, 0xFF)) {
+                Case(0x3)
+                {
+                    Switch(And(Arg1, 0xFF)) {
+                        Case(0x1) {
+                            Store(Arg2, MEF)
+                        }
+                    }
+                }
+                Case(0x1)
+                {
+                    Switch(And(Arg1, 0xFF)) {
+                        Case(0x0) {
+                            Store(Arg2, MIS)
+                        }
+                        Case(0x1) {
+                            Store(Arg2, MIF)
+                        }
+                    }
+                }
+            }
+        }
     }
 
 
diff --git a/src/ssdt-mem.dsl b/src/ssdt-mem.dsl
index ee322f0..041d301 100644
--- a/src/ssdt-mem.dsl
+++ b/src/ssdt-mem.dsl
@@ -38,6 +38,7 @@ DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1)
 
         External(CMST, MethodObj)
         External(MPEJ, MethodObj)
+        External(MOST, MethodObj)
 
         Name(_CRS, ResourceTemplate() {
             QwordMemory(
@@ -60,6 +61,9 @@ DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1)
         Method (_EJ0, 1, NotSerialized) {
             MPEJ(ID, Arg0)
         }
+        Method (_OST, 3) {
+            MOST(Arg0, Arg1, ID)
+        }
     }
 }    
 
-- 
1.7.9


* [RFC PATCH v2 15/21] acpi_piix4: _OST dimm support
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:32   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:32 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

This allows qemu to receive notifications from the guest OS on success or
failure of a memory hotplug request. The guest OS needs to implement the _OST
functionality for this to work (linux-next: http://lkml.org/lkml/2012/6/25/321).
Also document the new _OST registers in docs/specs/acpi_hotplug.txt.
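
The interaction between the notification ports and the per-slot "pending"
flag can be sketched as a small state model. This is a simplified
illustration, not the code in hw/dimm.c: slot_state, event_for_port,
request_hotplug and complete_hotplug are hypothetical names, and the enum
values are placeholders for the real result codes.

```c
#include <stdbool.h>
#include <stdint.h>

enum dimm_event {
    EV_NONE,
    EV_REMOVE_SUCCESS,
    EV_REMOVE_FAIL,
    EV_ADD_SUCCESS,
    EV_ADD_FAIL
};

struct slot_state {
    bool populated;
    bool pending;
};

/* Translate a written I/O port into the event it reports. */
static enum dimm_event event_for_port(uint32_t addr)
{
    switch (addr) {
    case 0xafa0: return EV_REMOVE_SUCCESS;  /* _EJ0 */
    case 0xafa1: return EV_REMOVE_FAIL;     /* _OST */
    case 0xafa2: return EV_ADD_SUCCESS;     /* _OST */
    case 0xafa3: return EV_ADD_FAIL;        /* _OST */
    default:     return EV_NONE;
    }
}

/* A monitor request populates the slot on add and marks it pending. */
static void request_hotplug(struct slot_state *s, bool add)
{
    if (add) {
        s->populated = true;
    }
    s->pending = true;
}

/*
 * A guest notification clears the pending flag; only a successful removal
 * also depopulates the slot. Returns false for ports that carry no event.
 */
static bool complete_hotplug(struct slot_state *s, enum dimm_event ev)
{
    if (ev == EV_NONE) {
        return false;
    }
    if (ev == EV_REMOVE_SUCCESS) {
        s->populated = false;
    }
    s->pending = false;
    return true;
}
```

As in the patch, a failed hot-add leaves the slot populated in this model;
whether that is the desired behaviour is a design question for the series.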

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 docs/specs/acpi_hotplug.txt |   24 ++++++++++++++++++++++++
 hw/acpi_piix4.c             |   15 +++++++++++++++
 hw/dimm.c                   |   18 ++++++++++++++++++
 hw/dimm.h                   |    1 +
 4 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/docs/specs/acpi_hotplug.txt b/docs/specs/acpi_hotplug.txt
index cf86242..2f6fd5f 100644
--- a/docs/specs/acpi_hotplug.txt
+++ b/docs/specs/acpi_hotplug.txt
@@ -20,3 +20,27 @@ ejected.
 
 Written by ACPI memory device _EJ0 method to notify qemu of successfull
 hot-removal.  Write-only.
+
+Memory Dimm ejection failure notification (IO port 0xafa1, 1-byte access):
+---------------------------------------------------------------
+Dimm hot-remove _OST failure notification. Byte value indicates Dimm slot for
+which ejection failed.
+
+Written by ACPI memory device _OST method to notify qemu of failed
+hot-removal.  Write-only.
+
+Memory Dimm insertion success notification (IO port 0xafa2, 1-byte access):
+---------------------------------------------------------------
+Dimm hot-add _OST success notification. Byte value indicates Dimm slot for which
+insertion succeeded.
+
+Written by ACPI memory device _OST method to notify qemu of successful
+hot-add.  Write-only.
+
+Memory Dimm insertion failure notification (IO port 0xafa3, 1-byte access):
+---------------------------------------------------------------
+Dimm hot-add _OST failure notification. Byte value indicates Dimm slot for which
+insertion failed.
+
+Written by ACPI memory device _OST method to notify qemu of failed
+hot-add.  Write-only.
diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index b988597..d8e2c22 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -49,6 +49,9 @@
 #define PCI_RMV_BASE 0xae0c
 #define MEM_BASE 0xaf80
 #define MEM_EJ_BASE 0xafa0
+#define MEM_OST_REMOVE_FAIL 0xafa1
+#define MEM_OST_ADD_SUCCESS 0xafa2
+#define MEM_OST_ADD_FAIL 0xafa3
 
 #define PIIX4_MEM_HOTPLUG_STATUS 8
 #define PIIX4_PCI_HOTPLUG_STATUS 2
@@ -531,6 +534,15 @@ static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
         case MEM_EJ_BASE:
             dimm_notify(val, DIMM_REMOVE_SUCCESS);
             break;
+        case MEM_OST_REMOVE_FAIL:
+            dimm_notify(val, DIMM_REMOVE_FAIL);
+            break;
+        case MEM_OST_ADD_SUCCESS:
+            dimm_notify(val, DIMM_ADD_SUCCESS);
+            break;
+        case MEM_OST_ADD_FAIL:
+            dimm_notify(val, DIMM_ADD_FAIL);
+            break;
         default:
             acpi_gpe_ioport_writeb(&s->ar, addr, val);
     }
@@ -604,6 +616,9 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
 
     register_ioport_read(MEM_BASE, DIMM_BITMAP_BYTES, 1,  gpe_readb, s);
     register_ioport_write(MEM_EJ_BASE, 1, 1,  gpe_writeb, s);
+    register_ioport_write(MEM_OST_REMOVE_FAIL, 1, 1,  gpe_writeb, s);
+    register_ioport_write(MEM_OST_ADD_SUCCESS, 1, 1,  gpe_writeb, s);
+    register_ioport_write(MEM_OST_ADD_FAIL, 1, 1,  gpe_writeb, s);
 
     for(i = 0; i < DIMM_BITMAP_BYTES; i++) {
         s->gperegs.mems_sts[i] = 0;
diff --git a/hw/dimm.c b/hw/dimm.c
index 9b32386..ba104cc 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -89,12 +89,14 @@ void dimm_activate(DimmState *slot)
     dimm_populate(slot);
     if (dimm_hotplug)
         dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 1);
+    slot->pending = true;
 }
 
 void dimm_deactivate(DimmState *slot)
 {
     if (dimm_hotplug)
         dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 0);
+    slot->pending = true;
 }
 
 DimmState *dimm_find_from_name(char *id)
@@ -138,6 +140,10 @@ int dimm_do(Monitor *mon, const QDict *qdict, bool add)
                     __FUNCTION__, id);
             return 1;
         }
+        if (slot->pending) {
+            fprintf(stderr, "warning: %s slot %s hot-operation pending\n",
+                    __FUNCTION__, id);
+        }
         dimm_activate(slot);
     }
     else {
@@ -146,6 +152,10 @@ int dimm_do(Monitor *mon, const QDict *qdict, bool add)
                     __FUNCTION__, id);
             return 1;
         }
+        if (slot->pending) {
+            fprintf(stderr, "warning: %s slot %s hot-operation pending\n",
+                    __FUNCTION__, id);
+        }
         dimm_deactivate(slot);
     }
 
@@ -198,6 +208,13 @@ void dimm_notify(uint32_t idx, uint32_t event)
         case DIMM_REMOVE_SUCCESS:
             dimm_depopulate(s);
             QTAILQ_INSERT_TAIL(&dimm_hp_result_queue, result, next);
+            s->pending = false;
+            break;
+        case DIMM_REMOVE_FAIL:
+        case DIMM_ADD_SUCCESS:
+        case DIMM_ADD_FAIL:
+            QTAILQ_INSERT_TAIL(&dimm_hp_result_queue, result, next);
+            s->pending = false;
             break;
         default:
             g_free(result);
@@ -259,6 +276,7 @@ static int dimm_init(SysBusDevice *s)
     slot = DIMM(s);
     slot->mr = NULL;
     slot->populated = false;
+    slot->pending = false;
     return 0;
 }
 
diff --git a/hw/dimm.h b/hw/dimm.h
index 3e55ed3..0fa6137 100644
--- a/hw/dimm.h
+++ b/hw/dimm.h
@@ -35,6 +35,7 @@ typedef struct DimmState {
     MemoryRegion *mr; /* MemoryRegion for this slot. !NULL only if populated */
     bool populated; /* 1 means device has been hotplugged. Default is 0. */
     QTAILQ_ENTRY (DimmState) nextdimm;
+    bool pending; /* true means a hot operation is pending for this dimm */
 } DimmState;
 
 struct dimm_hp_result {
-- 
1.7.9
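
For reference, the port-to-event mapping that this patch adds to gpe_writeb()
can be sketched as a standalone C fragment. The port numbers are taken from the
patch; the enum values and the helper name ost_port_to_event are assumptions
made for illustration and are not QEMU's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* I/O ports from the patch (MEM_EJ_BASE predates this patch). */
#define MEM_EJ_BASE          0xafa0
#define MEM_OST_REMOVE_FAIL  0xafa1
#define MEM_OST_ADD_SUCCESS  0xafa2
#define MEM_OST_ADD_FAIL     0xafa3

/* Hypothetical event codes; the real ones are defined in hw/dimm.h. */
enum dimm_event {
    EV_NONE = -1,
    EV_REMOVE_SUCCESS,
    EV_REMOVE_FAIL,
    EV_ADD_SUCCESS,
    EV_ADD_FAIL,
};

/* Map a written port to the event that would be handed to dimm_notify().
 * The byte written to the port is the dimm slot index. */
static enum dimm_event ost_port_to_event(uint32_t addr)
{
    switch (addr) {
    case MEM_EJ_BASE:         return EV_REMOVE_SUCCESS;
    case MEM_OST_REMOVE_FAIL: return EV_REMOVE_FAIL;
    case MEM_OST_ADD_SUCCESS: return EV_ADD_SUCCESS;
    case MEM_OST_ADD_FAIL:    return EV_ADD_FAIL;
    default:                  return EV_NONE; /* left to the generic GPE handler */
    }
}
```

In the patch itself every one of these events also clears the slot's "pending"
flag, which dimm_activate() and dimm_deactivate() set when the operation starts.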



* [RFC PATCH v2 16/21] acpi_piix4: Update dimm state on VM reboot
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:32   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:32 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

In case of hot-remove or hot-add failure, the dimm bitmaps in qemu and SeaBIOS
are inconsistent with the true state of the DIMM devices. The "populated" field
of the DimmState reflects the true state of the device. This inconsistency means
that a failed operation cannot be retried.

This patch updates the bit array to the true state of the dimms on VM reboot.
This allows retry of failed hot-add or hot-remove operations after a reboot.

Retrying a failed hot operation is not yet possible before reboot (the following
patch removes this limitation for guests with _OST ACPI support).

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hw/acpi_piix4.c |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)
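
The resync described above can be sketched in isolation. This is a minimal
standalone version, assuming a flat slot array; MAX_DIMMS_SKETCH, struct slot
and sync_bitmap are hypothetical names, not the QEMU structures. Unlike the
patch, which zeroes each bitmap byte when it reaches that byte's first slot,
the sketch clears the whole bitmap up front.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_DIMMS_SKETCH 16  /* hypothetical stand-in for MAX_DIMMS */

struct slot {
    bool populated;  /* true state of the device */
    bool pending;    /* a hot operation is still in flight */
};

/* Rebuild the status bitmap from each slot's populated flag and drop any
 * pending operation, mirroring what the reset path does in this patch. */
static void sync_bitmap(struct slot *slots, size_t n, uint8_t *bitmap)
{
    assert(n <= MAX_DIMMS_SKETCH);
    memset(bitmap, 0, MAX_DIMMS_SKETCH / 8);
    for (size_t i = 0; i < n; i++) {
        if (slots[i].populated) {
            bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
        }
        slots[i].pending = false;
    }
}
```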

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index d8e2c22..ebc5de7 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -91,6 +91,7 @@ typedef struct PIIX4PMState {
 } PIIX4PMState;
 
 static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s);
+static void piix4_dimm_state_sync(PIIX4PMState *s);
 
 #define ACPI_ENABLE 0xf1
 #define ACPI_DISABLE 0xf0
@@ -369,6 +370,7 @@ static void piix4_reset(void *opaque)
         /* Mark SMM as already inited (until KVM supports SMM). */
         pci_conf[0x5B] = 0x02;
     }
+    piix4_dimm_state_sync(s);
     piix4_update_hotplug(s);
 }
 
@@ -671,6 +673,29 @@ static int piix4_dimm_hotplug(DeviceState *qdev, SysBusDevice *dev, int
     return 0;
 }
 
+void piix4_dimm_state_sync(PIIX4PMState *s)
+{
+    struct gpe_regs *g = &s->gperegs;
+    DimmState *slot = NULL;
+    uint32_t i, temp = 1;
+
+    for(i = 0; i < MAX_DIMMS; i++) {
+        slot = dimm_find_from_idx(i);
+        if (!slot)
+            break;
+        if (i % 8 == 0) {
+            temp = 1;
+            g->mems_sts[i / 8] = 0;
+        }
+        else
+            temp = temp << 1;
+        if (slot->populated) {
+            g->mems_sts[i / 8] |= temp;
+        }
+        slot->pending = false;
+    }
+}
+
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
 				PCIHotplugState state)
 {
-- 
1.7.9



* [RFC PATCH v2 17/21][SeaBIOS] acpi_dsdt: Revert internal dimm state on _OST failure
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:32   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:32 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

This reverts the bitmap state in the case of a failed hot operation, in order
to allow retry of failed hot operations.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 src/acpi-dsdt.dsl |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 1c253ca..0d37bbc 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -832,6 +832,8 @@ DefinitionBlock (
                     Switch(And(Arg1, 0xFF)) {
                         Case(0x1) {
                             Store(Arg2, MEF)
+                            // Revert MEON flag for this memory device to one
+                            Store(One, Index(MEON, Arg2))
                         }
                     }
                 }
@@ -843,6 +845,8 @@ DefinitionBlock (
                         }
                         Case(0x1) {
                             Store(Arg2, MIF)
+                            // Revert MEON flag for this memory device to zero
+                            Store(Zero, Index(MEON, Arg2))
                         }
                     }
                 }
-- 
1.7.9



* [RFC PATCH v2 18/21] acpi_piix4: Update dimm bitmap state on hot-remove fail
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:32   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:32 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

This allows failed hot operations to be retried at any time. This only
works for guests that use _OST notification. Other guests cannot retry failed
hot operations on the same devices until after reboot.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hw/acpi_piix4.c |   20 +++++++++++++++++++-
 hw/dimm.c       |   16 +++++++++++++++-
 hw/dimm.h       |    2 +-
 3 files changed, 35 insertions(+), 3 deletions(-)
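
The revert itself is a single-bit update of the status bitmap. A minimal
sketch, assuming a plain byte array in place of gperegs.mems_sts (revert_bit
is a hypothetical helper name, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Undo the bitmap change made by a hot operation that the guest reported
 * as failed via _OST.
 *   add != 0: a hot-add failed, so clear the slot's bit.
 *   add == 0: a hot-remove failed, so set the slot's bit again. */
static void revert_bit(uint8_t *bitmap, unsigned idx, int add)
{
    uint8_t mask = (uint8_t)(1u << (idx % 8));

    if (add) {
        bitmap[idx / 8] &= (uint8_t)~mask;
    } else {
        bitmap[idx / 8] |= mask;
    }
}
```

With the bit back in its pre-operation state, a later dimm_add or dimm_del on
the same slot produces a fresh state change for the guest to act on.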

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index ebc5de7..db631cc 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -599,6 +599,7 @@ static uint32_t pcirmv_read(void *opaque, uint32_t addr)
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
                                 PCIHotplugState state);
 static int piix4_dimm_hotplug(DeviceState *qdev, SysBusDevice *dev, int add);
+static int piix4_dimm_revert(DeviceState *qdev, SysBusDevice *dev, int add);
 
 static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
 {
@@ -627,7 +628,7 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
     }
 
     pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
-    dimm_register_hotplug(piix4_dimm_hotplug, &s->dev.qdev);
+    dimm_register_hotplug(piix4_dimm_hotplug, piix4_dimm_revert, &s->dev.qdev);
 }
 
 static void enable_device(PIIX4PMState *s, int slot)
@@ -696,6 +697,23 @@ void piix4_dimm_state_sync(PIIX4PMState *s)
     }
 }
 
+static int piix4_dimm_revert(DeviceState *qdev, SysBusDevice *dev, int add)
+{
+    PCIDevice *pci_dev = DO_UPCAST(PCIDevice, qdev, qdev);
+    PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, pci_dev);
+    struct gpe_regs *g = &s->gperegs;
+    DimmState *slot = DIMM(dev);
+    int idx = slot->idx;
+
+    if (add) {
+        g->mems_sts[idx/8] &= ~(1 << (idx%8));
+    }
+    else {
+        g->mems_sts[idx/8] |= (1 << (idx%8));
+    }
+    return 0;
+}
+
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
 				PCIHotplugState state)
 {
diff --git a/hw/dimm.c b/hw/dimm.c
index ba104cc..2115567 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -25,6 +25,7 @@
 
 static DeviceState *dimm_hotplug_qdev;
 static dimm_hotplug_fn dimm_hotplug;
+static dimm_hotplug_fn dimm_revert;
 static QTAILQ_HEAD(Dimmlist, DimmState)  dimmlist;
 static QTAILQ_HEAD(dimm_hp_result_head, dimm_hp_result)  dimm_hp_result_queue;
 
@@ -77,10 +78,12 @@ DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
     return mdev;
 }
 
-void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev)
+void dimm_register_hotplug(dimm_hotplug_fn hotplug, dimm_hotplug_fn revert,
+        DeviceState *qdev)
 {
     dimm_hotplug_qdev = qdev;
     dimm_hotplug = hotplug;
+    dimm_revert = revert;
     dimm_scan_populated();
 }
 
@@ -211,10 +214,20 @@ void dimm_notify(uint32_t idx, uint32_t event)
             s->pending = false;
             break;
         case DIMM_REMOVE_FAIL:
+            QTAILQ_INSERT_TAIL(&dimm_hp_result_queue, result, next);
+            s->pending = false;
+            if (dimm_revert)
+                dimm_revert(dimm_hotplug_qdev, (SysBusDevice*)s, 0);
+            break;
         case DIMM_ADD_SUCCESS:
+            QTAILQ_INSERT_TAIL(&dimm_hp_result_queue, result, next);
+            s->pending = false;
+            break;
         case DIMM_ADD_FAIL:
             QTAILQ_INSERT_TAIL(&dimm_hp_result_queue, result, next);
             s->pending = false;
+            if (dimm_revert)
+                dimm_revert(dimm_hotplug_qdev, (SysBusDevice*)s, 1);
             break;
         default:
             g_free(result);
@@ -288,6 +301,7 @@ static void dimm_class_init(ObjectClass *klass, void *data)
     dc->props = dimm_properties;
     sc->init = dimm_init;
     dimm_hotplug = NULL;
+    dimm_revert = NULL;
     QTAILQ_INIT(&dimmlist);
     QTAILQ_INIT(&dimm_hp_result_queue);
 }
diff --git a/hw/dimm.h b/hw/dimm.h
index 0fa6137..b563e3f 100644
--- a/hw/dimm.h
+++ b/hw/dimm.h
@@ -54,7 +54,7 @@ void dimm_depopulate(DimmState *s);
 int dimm_do(Monitor *mon, const QDict *qdict, bool add);
 DimmState *dimm_find_from_idx(uint32_t idx);
 DimmState *dimm_find_from_name(char *id);
-void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev);
+void dimm_register_hotplug(dimm_hotplug_fn hotplug, dimm_hotplug_fn revert, DeviceState *qdev);
 void dimm_calc_offsets(dimm_calcoffset_fn calcfn);
 void dimm_activate(DimmState *slot);
 void dimm_deactivate(DimmState *slot);
-- 
1.7.9



* [RFC PATCH v2 19/21] Implement "info memtotal" and "query-memtotal"
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:32   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:32 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

Returns total memory of guest in bytes, including hotplugged memory.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hmp-commands.hx  |    2 ++
 hmp.c            |    7 +++++++
 hmp.h            |    1 +
 hw/dimm.c        |   15 +++++++++++++++
 monitor.c        |    7 +++++++
 qapi-schema.json |   12 ++++++++++++
 qmp-commands.hx  |   20 ++++++++++++++++++++
 7 files changed, 64 insertions(+), 0 deletions(-)
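
The computation is initial RAM plus the size of every populated dimm. A
standalone sketch, assuming a flat array instead of the QTAILQ dimm list
(struct dimm, mem_total and format_mem_total are hypothetical names). The
formatting helper uses PRIu64 so that the 64-bit total prints correctly on
hosts where unsigned long is 32 bits.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct dimm {
    uint64_t size;   /* bytes */
    bool populated;  /* only populated dimms count towards the total */
};

/* Initial RAM plus all currently populated dimms, in bytes. */
static uint64_t mem_total(uint64_t initial_ram, const struct dimm *dimms,
                          size_t n)
{
    uint64_t total = initial_ram;

    for (size_t i = 0; i < n; i++) {
        if (dimms[i].populated) {
            total += dimms[i].size;
        }
    }
    return total;
}

/* Format the total the way an "info memtotal" style command might. */
static int format_mem_total(char *buf, size_t len, uint64_t total)
{
    return snprintf(buf, len, "MemTotal: %" PRIu64 "\n", total);
}
```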

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 3172cde..016062e 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1461,6 +1461,8 @@ show qdev device model list
 show roms
 @item info memhp
 show memhp
+@item info memtotal
+show memtotal
 @end table
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index ec25d9a..8f89c7d 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1017,3 +1017,10 @@ void hmp_info_memhp(Monitor *mon)
 
     qapi_free_MemHpInfoList(info);
 }
+
+void hmp_info_memtotal(Monitor *mon)
+{
+    uint64_t ram_total;
+    ram_total = (uint64_t)qmp_query_memtotal(NULL);
+    monitor_printf(mon, "MemTotal: %" PRIu64 "\n", ram_total);
+}
diff --git a/hmp.h b/hmp.h
index 971e7c4..d6e715e 100644
--- a/hmp.h
+++ b/hmp.h
@@ -65,5 +65,6 @@ void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict);
 void hmp_netdev_add(Monitor *mon, const QDict *qdict);
 void hmp_netdev_del(Monitor *mon, const QDict *qdict);
 void hmp_info_memhp(Monitor *mon);
+void hmp_info_memtotal(Monitor *mon);
 
 #endif
diff --git a/hw/dimm.c b/hw/dimm.c
index 6e324d3..b544173 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -28,6 +28,7 @@ static dimm_hotplug_fn dimm_hotplug;
 static dimm_hotplug_fn dimm_revert;
 static QTAILQ_HEAD(Dimmlist, DimmState)  dimmlist;
 static QTAILQ_HEAD(dimm_hp_result_head, dimm_hp_result)  dimm_hp_result_queue;
+extern ram_addr_t ram_size;
 
 static Property dimm_properties[] = {
     DEFINE_PROP_END_OF_LIST()
@@ -292,6 +293,20 @@ MemHpInfoList *qmp_query_memhp(Error **errp)
 
     return head;
 }
+
+int64_t qmp_query_memtotal(Error **errp)
+{
+    DimmState *slot;
+    uint64_t info = ram_size;
+
+    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
+        if (slot->populated) {
+            info += slot->size;
+        }
+    }
+    return (int64_t)info;
+}
+
 static int dimm_init(SysBusDevice *s)
 {
     DimmState *slot;
diff --git a/monitor.c b/monitor.c
index 4a14e26..1dd646c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2739,6 +2739,13 @@ static mon_cmd_t info_cmds[] = {
         .mhandler.info = hmp_info_memhp,
     },
     {
+        .name       = "memtotal",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show total memory size",
+        .mhandler.info = hmp_info_memtotal,
+    },
+    {
         .name       = NULL,
     },
 };
diff --git a/qapi-schema.json b/qapi-schema.json
index 049f6f9..5bbf2c0 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1888,3 +1888,15 @@
 # Since: 1.1.3
 ##
 { 'command': 'query-memhp', 'returns': ['MemHpInfo'] }
+
+##
+# @query-memtotal:
+#
+# Returns total memory in bytes, including hotplugged dimms
+#
+# Returns: total guest memory in bytes, as an int
+#
+# Since: 1.2
+##
+{ 'command': 'query-memtotal', 'returns': 'int' }
+
diff --git a/qmp-commands.hx b/qmp-commands.hx
index cd1d5f0..6c71696 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2286,3 +2286,23 @@ Example:
    }
 
 EQMP
+
+    {
+        .name       = "query-memtotal",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_query_memtotal
+    },
+SQMP
+query-memtotal
+----------
+
+Return total memory in bytes, including hotplugged dimms
+
+Example:
+
+-> { "execute": "query-memtotal" }
+<- {
+      "return": 1073741824
+   }
+
+EQMP
-- 
1.7.9



* [Qemu-devel] [RFC PATCH v2 19/21] Implement "info memtotal" and "query-memtotal"
@ 2012-07-11 10:32   ` Vasilis Liaskovitis
  0 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:32 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: gleb, Vasilis Liaskovitis, kevin, avi, anthony, imammedo

Returns total memory of guest in bytes, including hotplugged memory.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hmp-commands.hx  |    2 ++
 hmp.c            |    7 +++++++
 hmp.h            |    1 +
 hw/dimm.c        |   15 +++++++++++++++
 monitor.c        |    7 +++++++
 qapi-schema.json |   12 ++++++++++++
 qmp-commands.hx  |   20 ++++++++++++++++++++
 7 files changed, 64 insertions(+), 0 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 3172cde..016062e 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1461,6 +1461,8 @@ show qdev device model list
 show roms
 @item info memhp
 show memhp
+@item info memtotal
+show memtotal
 @end table
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index ec25d9a..8f89c7d 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1017,3 +1017,10 @@ void hmp_info_memhp(Monitor *mon)
 
     qapi_free_MemHpInfoList(info);
 }
+
+void hmp_info_memtotal(Monitor *mon)
+{
+    uint64_t ram_total;
+    ram_total = (uint64_t)qmp_query_memtotal(NULL);
+    monitor_printf(mon, "MemTotal: %" PRIu64 "\n", ram_total);
+}
diff --git a/hmp.h b/hmp.h
index 971e7c4..d6e715e 100644
--- a/hmp.h
+++ b/hmp.h
@@ -65,5 +65,6 @@ void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict);
 void hmp_netdev_add(Monitor *mon, const QDict *qdict);
 void hmp_netdev_del(Monitor *mon, const QDict *qdict);
 void hmp_info_memhp(Monitor *mon);
+void hmp_info_memtotal(Monitor *mon);
 
 #endif
diff --git a/hw/dimm.c b/hw/dimm.c
index 6e324d3..b544173 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -28,6 +28,7 @@ static dimm_hotplug_fn dimm_hotplug;
 static dimm_hotplug_fn dimm_revert;
 static QTAILQ_HEAD(Dimmlist, DimmState)  dimmlist;
 static QTAILQ_HEAD(dimm_hp_result_head, dimm_hp_result)  dimm_hp_result_queue;
+extern ram_addr_t ram_size;
 
 static Property dimm_properties[] = {
     DEFINE_PROP_END_OF_LIST()
@@ -292,6 +293,20 @@ MemHpInfoList *qmp_query_memhp(Error **errp)
 
     return head;
 }
+
+int64_t qmp_query_memtotal(Error **errp)
+{
+    DimmState *slot;
+    uint64_t info = ram_size;
+
+    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
+        if (slot->populated) {
+            info += slot->size;
+        }
+    }
+    return (int64_t)info;
+}
+
 static int dimm_init(SysBusDevice *s)
 {
     DimmState *slot;
diff --git a/monitor.c b/monitor.c
index 4a14e26..1dd646c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2739,6 +2739,13 @@ static mon_cmd_t info_cmds[] = {
         .mhandler.info = hmp_info_memhp,
     },
     {
+        .name       = "memtotal",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show total memory size",
+        .mhandler.info = hmp_info_memtotal,
+    },
+    {
         .name       = NULL,
     },
 };
diff --git a/qapi-schema.json b/qapi-schema.json
index 049f6f9..5bbf2c0 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1888,3 +1888,15 @@
 # Since: 1.1.3
 ##
 { 'command': 'query-memhp', 'returns': ['MemHpInfo'] }
+
+##
+# @query-memtotal:
+#
+# Returns total memory in bytes, including hotplugged dimms
+#
+# Returns: total guest memory in bytes, as an int
+#
+# Since: 1.2
+##
+{ 'command': 'query-memtotal', 'returns': 'int' }
+
diff --git a/qmp-commands.hx b/qmp-commands.hx
index cd1d5f0..6c71696 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2286,3 +2286,23 @@ Example:
    }
 
 EQMP
+
+    {
+        .name       = "query-memtotal",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_query_memtotal
+    },
+SQMP
+query-memtotal
+----------
+
+Return total memory in bytes, including hotplugged dimms
+
+Example:
+
+-> { "execute": "query-memtotal" }
+<- {
+      "return": 1073741824
+   }
+
+EQMP
-- 
1.7.9


* [RFC PATCH v2 20/21] Implement -dimms, -dimmspop command line options
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:32   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:32 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

Implement batch dimm creation command line options. These help avoid bloating
the command line when a large number of dimms is defined.

syntax: -dimms pfx=poolid,size=sz,num=n
Will create n dimms with ids poolid0, ..., poolid(n-1). Each dimm has a
size of sz.

Implement -dimmspop option to populate dimms at bootup
syntax: -dimmspop pfx=poolid,num=n
This will populate n dimms with ids poolid0, ..., poolid(n-1).
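
The id scheme above (pool prefix followed by a zero-based index) can be
sketched in standalone C as follows. pool_dimm_id() is a hypothetical helper,
not part of this patch; it uses a bounded snprintf() so that a long prefix
cannot overflow the id buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the id of dimm number `index` in pool `pfx` into buf.
 * Returns 0 on success, -1 if the id does not fit in len bytes.
 */
static int pool_dimm_id(char *buf, size_t len, const char *pfx, int index)
{
    int n;

    assert(buf != NULL && pfx != NULL && len > 0);
    n = snprintf(buf, len, "%s%d", pfx, index);
    /* snprintf returns the length it wanted to write; reject truncation */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With pfx=pool and num=3, the helper would produce pool0, pool1 and pool2,
which are the ids a later "-dimmspop pfx=pool,num=3" looks up.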

(live-migration could break here without patch 12/21: -dimmspop
needs to be reworked to support populating of individual dimms with
same prefix, and not only a range of dimms starting from 0)

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hw/dimm.c       |    9 ++++++
 hw/dimm.h       |    2 +-
 qemu-config.c   |   45 ++++++++++++++++++++++++++++
 qemu-options.hx |   10 ++++++
 vl.c            |   86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 150 insertions(+), 2 deletions(-)

diff --git a/hw/dimm.c b/hw/dimm.c
index 2115567..6e324d3 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -187,6 +187,15 @@ void dimm_calc_offsets(dimm_calcoffset_fn calcfn)
     }
 }
 
+int dimm_set_populated(DimmState *s)
+{
+    if (s) {
+        s->populated = true;
+        return 0;
+    }
+    return -1;
+}
+
 /* used to populate and activate dimms at boot time */
 void dimm_scan_populated(void)
 {
diff --git a/hw/dimm.h b/hw/dimm.h
index b563e3f..0fdf59b 100644
--- a/hw/dimm.h
+++ b/hw/dimm.h
@@ -60,6 +60,6 @@ void dimm_activate(DimmState *slot);
 void dimm_deactivate(DimmState *slot);
 void dimm_scan_populated(void);
 void dimm_notify(uint32_t idx, uint32_t event);
-
+int dimm_set_populated(DimmState *s);
 
 #endif
diff --git a/qemu-config.c b/qemu-config.c
index 4abc31b..7f63186 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -650,6 +650,49 @@ static QemuOptsList qemu_dimm_opts = {
         { /* end of list */ }
     },
 };
+
+static QemuOptsList qemu_dimms_opts = {
+    .name = "dimms",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_dimms_opts.head),
+    .desc = {
+        {
+            .name = "pfx",
+            .type = QEMU_OPT_STRING,
+            .help = "prefix of ids for these dimm devices",
+        },{
+            .name = "size",
+            .type = QEMU_OPT_SIZE,
+            .help = "memory size for these dimms",
+        },{
+            .name = "num",
+            .type = QEMU_OPT_NUMBER,
+            .help = "number of dimm devices in this pool",
+        },{
+            .name = "node",
+            .type = QEMU_OPT_NUMBER,
+            .help = "NUMA node number (i.e. proximity) for these dimms",
+        },
+        { /* end of list */ }
+    },
+};
+
+static QemuOptsList qemu_dimmspop_opts = {
+    .name = "dimmspop",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_dimmspop_opts.head),
+    .desc = {
+        {
+            .name = "pfx",
+            .type = QEMU_OPT_STRING,
+            .help = "pool prefix for this dimm device",
+        },{
+            .name = "num",
+            .type = QEMU_OPT_NUMBER,
+            .help = "number of dimm devices to populate",
+        },
+        { /* end of list */ }
+    },
+};
+
 static QemuOptsList *vm_config_groups[32] = {
     &qemu_drive_opts,
     &qemu_chardev_opts,
@@ -666,6 +709,8 @@ static QemuOptsList *vm_config_groups[32] = {
     &qemu_boot_opts,
     &qemu_iscsi_opts,
     &qemu_dimm_opts,
+    &qemu_dimms_opts,
+    &qemu_dimmspop_opts,
     NULL,
 };
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 61909f7..0a9326e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2752,3 +2752,13 @@ DEF("dimm", HAS_ARG, QEMU_OPTION_dimm,
         "-dimm id=dimmid,size=sz,node=nd,populated=on|off\n"
         "specify memory dimm device with name dimmid, size sz on node nd",
         QEMU_ARCH_ALL)
+
+DEF("dimms", HAS_ARG, QEMU_OPTION_dimms,
+        "-dimms pfx=id,size=sz,num=n,node=nd\n"
+        "specify pool of num memory dimm devices of size sz each on node nd",
+        QEMU_ARCH_ALL)
+
+DEF("dimmspop", HAS_ARG, QEMU_OPTION_dimmspop,
+        "-dimmspop pfx=id,num=n\n"
+        "populate n dimms of pool id (dimms with ids id0,...,idn-1) at system startup",
+        QEMU_ARCH_ALL)
diff --git a/vl.c b/vl.c
index efe915e..37752be 100644
--- a/vl.c
+++ b/vl.c
@@ -538,6 +538,65 @@ static void configure_dimm(QemuOpts *opts)
     nb_hp_dimms++;
 }
 
+static void configure_dimms(QemuOpts *opts)
+{
+    const char *value, *pfx, *id;
+    uint64_t size, node;
+    int num, dimm;
+    char buf[32];
+
+    id = qemu_opts_id(opts);
+    value = qemu_opt_get(opts, "pfx");
+    if (!value) {
+        fprintf(stderr, "qemu: invalid prefix for dimm pool '%s'\n", id);
+        exit(1);
+    }
+    pfx = value;
+
+    size = qemu_opt_get_size(opts, "size", DEFAULT_DIMMSIZE);
+    num = qemu_opt_get_number(opts, "num", 1);
+    node = qemu_opt_get_number(opts, "node", 0);
+
+    for (dimm = 0; dimm < num; dimm++) {
+        if (nb_hp_dimms == MAX_DIMMS) {
+            fprintf(stderr, "qemu: maximum number of DIMMs (%d) exceeded\n",
+                    MAX_DIMMS);
+            exit(1);
+        }
+        snprintf(buf, sizeof(buf), "%s%d", pfx, dimm);
+        dimm_create(g_strdup(buf), size, node, nb_hp_dimms, false);
+        nb_hp_dimms++;
+    }
+}
+
+/* populate dimms at startup */ 
+static void configure_dimmspop(QemuOpts *opts)
+{
+    const char *value, *pfx, *id;
+    int num, dimm;
+    char buf[32];
+
+    id = qemu_opts_id(opts);
+    value = qemu_opt_get(opts, "pfx");
+    if (!value) {
+        fprintf(stderr, "qemu: invalid prefix for dimm pool '%s'\n", id);
+        exit(1);
+    }
+    pfx = value;
+    value = qemu_opt_get(opts, "num");
+    if (!value) {
+        fprintf(stderr, "qemu: number not defined for dimm pool '%s'\n", pfx);
+        exit(1);
+    }
+    else num = atoi(value);
+    for (dimm = 0; dimm < num; dimm++) {
+        snprintf(buf, sizeof(buf), "%s%d", pfx, dimm);
+        if (dimm_set_populated(dimm_find_from_name(buf)) < 0) {
+            fprintf(stderr, "qemu: dimm %s not defined for dimm pool '%s'\n",
+                    buf, pfx);
+        }
+    }
+}
 static void configure_rtc(QemuOpts *opts)
 {
     const char *value;
@@ -2293,7 +2352,9 @@ int main(int argc, char **argv, char **envp)
     int cyls, heads, secs, translation;
     QemuOpts *hda_opts = NULL, *opts, *machine_opts;
     QemuOpts *dimm_opts[MAX_DIMMS];
-    int nb_dimm_opts = 0;
+    QemuOpts *dimms_opts[MAX_DIMMS];
+    QemuOpts *dimmspop_opts[MAX_DIMMS];
+    int nb_dimm_opts = 0, nb_dimms_opts = 0, nb_dimmspop_opts = 0;
     QemuOptsList *olist;
     int optind;
     const char *optarg;
@@ -3233,6 +3294,22 @@ int main(int argc, char **argv, char **envp)
                 }
                 nb_dimm_opts++;
                 break;
+            case QEMU_OPTION_dimms:
+                dimms_opts[nb_dimms_opts] =
+                    qemu_opts_parse(qemu_find_opts("dimms"), optarg, 0);
+                if (!dimms_opts[nb_dimms_opts]) {
+                    exit(1);
+                }
+                nb_dimms_opts++;
+                break;
+            case QEMU_OPTION_dimmspop:
+                dimmspop_opts[nb_dimmspop_opts] =
+                    qemu_opts_parse(qemu_find_opts("dimmspop"), optarg, 0);
+                if (!dimmspop_opts[nb_dimmspop_opts]) {
+                    exit(1);
+                }
+                nb_dimmspop_opts++;
+                break;
             default:
                 os_parse_cmd_args(popt->index, optarg);
             }
@@ -3552,6 +3629,13 @@ int main(int argc, char **argv, char **envp)
 
     for (i = 0; i < nb_dimm_opts; i++)
         configure_dimm(dimm_opts[i]);
+
+    for (i = 0; i < nb_dimms_opts; i++)
+        configure_dimms(dimms_opts[i]);
+
+    for (i = 0; i < nb_dimmspop_opts; i++)
+        configure_dimmspop(dimmspop_opts[i]);
+
     qdev_machine_init();
 
     machine->init(ram_size, boot_devices,
-- 
1.7.9



* [RFC PATCH v2 21/21] Implement mem_increase, mem_decrease hmp/qmp commands
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:32   ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:32 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: avi, anthony, gleb, imammedo, kevin, wency, Vasilis Liaskovitis

This implements batch monitor operations for hot-add and hot-remove. These are
probably better suited for a higher-level management layer, but are useful for
testing. Let me know if there is interest for such commands upstream.

syntax: mem_increase poolid num
will hotplug num dimms from pool poolid. This starts from the lowest unpopulated
physical memory (dimm) and tries to cover any existing physical holes.

syntax: mem_decrease poolid num
will hot-unplug num dimms from pool poolid. This starts from the highest populated
physical memory (dimm).

Respective qmp commands are "mem-increase", "mem-decrease".
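
The selection policy described above can be sketched in standalone C. This is
only an illustration under stated assumptions: struct slot_sketch, enum
pick_mode and pick_next() are hypothetical names, and the slot index is
assumed to follow physical address order. Slots with an operation still in
flight (pending) are skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for DimmState; only the fields used here. */
struct slot_sketch {
    unsigned idx;       /* position in the physical layout */
    bool populated;     /* dimm is currently plugged in */
    bool pending;       /* hot-add/remove still in flight */
};

enum pick_mode { PICK_MIN_UNPOPULATED, PICK_MAX_POPULATED };

/* Return the next slot to operate on, or NULL if the pool is exhausted. */
static struct slot_sketch *pick_next(struct slot_sketch *slots, size_t n,
                                     enum pick_mode mode)
{
    struct slot_sketch *best = NULL;
    size_t i;

    assert(slots != NULL || n == 0);
    for (i = 0; i < n; i++) {
        struct slot_sketch *s = &slots[i];

        if (s->pending) {
            continue;
        }
        if (mode == PICK_MIN_UNPOPULATED) {
            /* hot-add: fill the lowest hole first */
            if (!s->populated && (!best || s->idx < best->idx)) {
                best = s;
            }
        } else {
            /* hot-remove: take the highest populated slot first */
            if (s->populated && (!best || s->idx > best->idx)) {
                best = s;
            }
        }
    }
    return best;
}
```

Calling pick_next() num times, activating or deactivating the returned slot
each time, gives the batch behaviour of mem_increase and mem_decrease; a NULL
return means fewer than num dimms were available in the pool.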

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hmp-commands.hx |   31 ++++++++++++++++
 hw/dimm.c       |  104 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/dimm.h       |    7 ++++
 monitor.c       |   10 +++++
 monitor.h       |    2 +
 qmp-commands.hx |   40 +++++++++++++++++++++
 6 files changed, 194 insertions(+), 0 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 016062e..e0c1cf4 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -648,6 +648,37 @@ STEXI
 
 Hot-add dimm.
 ETEXI
+    {
+        .name       = "mem_increase",
+        .args_type  = "pfx:s,num:s",
+        .params     = "pfx num",
+        .help       = "hot-plug num dimms of memory pool pfx",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_dimm_add_range,
+    },
+
+STEXI
+@item mem_increase @var{pfx} @var{num}
+@findex mem_increase
+
+Hotplug dimms.
+ETEXI
+
+    {
+        .name       = "mem_decrease",
+        .args_type  = "pfx:s,num:s",
+        .params     = "pfx num",
+        .help       = "hot-unplug num dimms of memory pool pfx",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_dimm_del_range,
+    },
+
+STEXI
+@item mem_decrease @var{pfx} @var{num}
+@findex mem_decrease
+
+Hot-unplug dimms.
+ETEXI
 
     {
         .name       = "device_del",
diff --git a/hw/dimm.c b/hw/dimm.c
index b544173..48542ba 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -166,6 +166,110 @@ int dimm_do(Monitor *mon, const QDict *qdict, bool add)
     return 0;
 }
 
+/* Find for dimm_do_range operation
+    DIMM_MIN_UNPOPULATED: used for finding next DIMM to hotplug
+    DIMM_MAX_POPULATED: used for finding next DIMM for hot-unplug
+ */
+
+DimmState *dimm_find_next(char *pfx, uint32_t mode)
+{
+    DeviceState *dev;
+    DimmState *slot, *ret;
+    const char *type;
+    uint32_t idx;
+
+    Error *err = NULL;
+    BusChild *kid;
+    BusState *bus = sysbus_get_default();
+    ret = NULL;
+
+    if (mode == DIMM_MIN_UNPOPULATED)
+        idx =  MAX_DIMMS;
+    else if (mode == DIMM_MAX_POPULATED)
+        idx = 0;
+    else
+        return NULL;
+
+    QTAILQ_FOREACH(kid, &bus->children, sibling) {
+        dev = kid->child;
+        type = object_property_get_str(OBJECT(dev), "type", &err);
+        if (err) {
+            error_free(err);
+            fprintf(stderr, "error getting device type\n");
+            exit(1);
+        }
+
+        if (!strcmp(type, "dimm")) {
+            slot = DIMM(dev);
+            if (!strncmp(dev->id, pfx, strlen(pfx)) && strcmp(dev->id, pfx)) {
+                if (mode == DIMM_MIN_UNPOPULATED &&
+                        (slot->populated == false) &&
+                        (slot->pending == false) &&
+                        (idx > slot->idx)) {
+                    idx = slot->idx;
+                    ret = slot;
+                }
+                else if (mode == DIMM_MAX_POPULATED &&
+                        (slot->populated == true) &&
+                        (slot->pending == false) &&
+                        (idx <= slot->idx)) {
+                    idx = slot->idx;
+                    ret = slot;
+                }
+            }
+        }
+    }
+    return ret;
+}
+
+int dimm_do_range(Monitor *mon, const QDict *qdict, bool add)
+{
+    DimmState *slot = NULL;
+    uint32_t mode;
+    uint32_t idx;
+    int num, ndimms;
+
+    char *pfx = (char*) qdict_get_try_str(qdict, "pfx");
+    if (!pfx) {
+        fprintf(stderr, "ERROR %s invalid pfx\n",__FUNCTION__);
+        return 1;
+    }
+
+    char *value = (char*) qdict_get_try_str(qdict, "num");
+    if (!value) {
+        fprintf(stderr, "ERROR %s invalid num\n",__FUNCTION__);
+        return 1;
+    }
+    num = atoi(value);
+
+    if (add)
+        mode = DIMM_MIN_UNPOPULATED;
+    else
+        mode = DIMM_MAX_POPULATED;
+
+    ndimms = 0;
+    while (ndimms < num) {
+        slot = dimm_find_next(pfx, mode);
+        if (slot == NULL) {
+            fprintf(stderr, "%s no further slot found for pool %s\n",
+                    __FUNCTION__, pfx);
+            fprintf(stderr, "%s operated on %d / %d requested dimms\n",
+                    __FUNCTION__, ndimms, num);
+            return 1;
+        }
+
+        if (add) {
+            dimm_activate(slot);
+        }
+        else {
+            dimm_deactivate(slot);
+        }
+        ndimms++;
+        idx++;
+    }
+
+    return 0;
+}
 DimmState *dimm_find_from_idx(uint32_t idx)
 {
     DimmState *slot;
diff --git a/hw/dimm.h b/hw/dimm.h
index 0fdf59b..7c456fa 100644
--- a/hw/dimm.h
+++ b/hw/dimm.h
@@ -11,6 +11,11 @@
 #define DIMM_BITMAP_BYTES (MAX_DIMMS + 7) / 8
 #define DEFAULT_DIMMSIZE 1024*1024*1024
 
+enum {
+    DIMM_MIN_UNPOPULATED= 0,
+    DIMM_MAX_POPULATED = 1
+};
+
 typedef enum {
     DIMM_REMOVE_SUCCESS = 0,
     DIMM_REMOVE_FAIL = 1,
@@ -61,5 +66,7 @@ void dimm_deactivate(DimmState *slot);
 void dimm_scan_populated(void);
 void dimm_notify(uint32_t idx, uint32_t event);
 int dimm_set_populated(DimmState *s);
+DimmState *dimm_find_next(char *pfx, uint32_t mode);
+int dimm_do_range(Monitor *mon, const QDict *qdict, bool add);
 
 #endif
diff --git a/monitor.c b/monitor.c
index 1dd646c..2e0ce1f 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4838,3 +4838,13 @@ int do_dimm_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
 {
     return dimm_do(mon, qdict, false);
 }
+
+int do_dimm_add_range(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+    return dimm_do_range(mon, qdict, true);
+}
+
+int do_dimm_del_range(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+    return dimm_do_range(mon, qdict, false);
+}
diff --git a/monitor.h b/monitor.h
index afdd721..8224301 100644
--- a/monitor.h
+++ b/monitor.h
@@ -88,5 +88,7 @@ int qmp_qom_get(Monitor *mon, const QDict *qdict, QObject **ret);
 
 int do_dimm_add(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_dimm_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
+int do_dimm_add_range(Monitor *mon, const QDict *qdict, QObject **ret_data);
+int do_dimm_del_range(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif /* !MONITOR_H */
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 6c71696..c3f74ea 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2306,3 +2306,43 @@ Example:
    }
 
 EQMP
+
+    {
+        .name       = "mem-increase",
+        .args_type  = "pfx:s,num:s",
+        .mhandler.cmd_new = do_dimm_add_range,
+    },
+SQMP
+mem-increase
+-------------
+
+Hotplug memory DIMMs from memory pool
+
+Will hotplug num memory DIMMs from pool with name pfx.
+
+Example:
+
+-> { "execute": "mem-increase", "arguments": { "pfx" : "pool", "num": "10" } }
+<- { "return": {} }
+
+EQMP
+
+    {
+        .name       = "mem-decrease",
+        .args_type  = "pfx:s,num:s",
+        .mhandler.cmd_new = do_dimm_del_range,
+    },
+SQMP
+mem-decrease
+-------------
+
+Hot-unplug memory DIMMs from memory pool
+
+Will hot-unplug num memory DIMMs from pool with name pfx.
+
+Example:
+
+-> { "execute": "mem-decrease", "arguments": { "pfx" : "pool", "num": "10" } }
+<- { "return": {} }
+
+EQMP
-- 
1.7.9



* [Qemu-devel] [RFC PATCH v2 21/21] Implement mem_increase, mem_decrease hmp/qmp commands
@ 2012-07-11 10:32   ` Vasilis Liaskovitis
  0 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 10:32 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios
  Cc: gleb, Vasilis Liaskovitis, kevin, avi, anthony, imammedo

This implements batch monitor operations for hot-add and hot-remove. These are
probably better suited for a higher-level management layer, but are useful for
testing. Let me know if there is interest for such commands upstream.

syntax: mem_increase poolid num
will hotplug num dimms from pool poolid. This starts from the lowest unpopulated
physical memory (dimm) and tries to cover any existing physical holes.

syntax: mem_decrease poolid num
will hot-unplug num dimms from pool poolid. This starts from the highest populated
physical memory (dimm).

Respective qmp commands are "mem-increase", "mem-decrease".

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hmp-commands.hx |   31 ++++++++++++++++
 hw/dimm.c       |  104 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/dimm.h       |    7 ++++
 monitor.c       |   10 +++++
 monitor.h       |    2 +
 qmp-commands.hx |   40 +++++++++++++++++++++
 6 files changed, 194 insertions(+), 0 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 016062e..e0c1cf4 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -648,6 +648,37 @@ STEXI
 
 Hot-add dimm.
 ETEXI
+    {
+        .name       = "mem_increase",
+        .args_type  = "pfx:s,num:s",
+        .params     = "pfx num",
+        .help       = "hot-plug num dimms of memory pool pfx",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_dimm_add_range,
+    },
+
+STEXI
+@item mem_increase @var{pfx} @var{num}
+@findex mem_increase
+
+Hotplug dimms.
+ETEXI
+
+    {
+        .name       = "mem_decrease",
+        .args_type  = "pfx:s,num:s",
+        .params     = "pfx num",
+        .help       = "hot-unplug num dimms of memory pool pfx",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_dimm_del_range,
+    },
+
+STEXI
+@item mem_decrease @var{pfx} @var{num}
+@findex mem_decrease
+
+Hot-unplug dimms.
+ETEXI
 
     {
         .name       = "device_del",
diff --git a/hw/dimm.c b/hw/dimm.c
index b544173..48542ba 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -166,6 +166,110 @@ int dimm_do(Monitor *mon, const QDict *qdict, bool add)
     return 0;
 }
 
+/* Find for dimm_do_range operation
+    DIMM_MIN_UNPOPULATED: used for finding next DIMM to hotplug
+    DIMM_MAX_POPULATED: used for finding next DIMM for hot-unplug
+ */
+
+DimmState *dimm_find_next(char *pfx, uint32_t mode)
+{
+    DeviceState *dev;
+    DimmState *slot, *ret;
+    const char *type;
+    uint32_t idx;
+
+    Error *err = NULL;
+    BusChild *kid;
+    BusState *bus = sysbus_get_default();
+    ret = NULL;
+
+    if (mode == DIMM_MIN_UNPOPULATED)
+        idx =  MAX_DIMMS;
+    else if (mode == DIMM_MAX_POPULATED)
+        idx = 0;
+    else
+        return NULL;
+
+    QTAILQ_FOREACH(kid, &bus->children, sibling) {
+        dev = kid->child;
+        type = object_property_get_str(OBJECT(dev), "type", &err);
+        if (err) {
+            error_free(err);
+            fprintf(stderr, "error getting device type\n");
+            exit(1);
+        }
+
+        if (!strcmp(type, "dimm")) {
+            slot = DIMM(dev);
+            if (!strncmp(dev->id, pfx, strlen(pfx)) && strcmp(dev->id, pfx)) {
+                if (mode == DIMM_MIN_UNPOPULATED &&
+                        (slot->populated == false) &&
+                        (slot->pending == false) &&
+                        (idx > slot->idx)) {
+                    idx = slot->idx;
+                    ret = slot;
+                }
+                else if (mode == DIMM_MAX_POPULATED &&
+                        (slot->populated == true) &&
+                        (slot->pending == false) &&
+                        (idx <= slot->idx)) {
+                    idx = slot->idx;
+                    ret = slot;
+                }
+            }
+        }
+    }
+    return ret;
+}
+
+int dimm_do_range(Monitor *mon, const QDict *qdict, bool add)
+{
+    DimmState *slot = NULL;
+    uint32_t mode;
+    uint32_t idx = 0;
+    int num, ndimms;
+
+    char *pfx = (char*) qdict_get_try_str(qdict, "pfx");
+    if (!pfx) {
+        fprintf(stderr, "ERROR %s invalid pfx\n",__FUNCTION__);
+        return 1;
+    }
+
+    char *value = (char*) qdict_get_try_str(qdict, "num");
+    if (!value) {
+        fprintf(stderr, "ERROR %s invalid num\n",__FUNCTION__);
+        return 1;
+    }
+    num = atoi(value);
+
+    if (add)
+        mode = DIMM_MIN_UNPOPULATED;
+    else
+        mode = DIMM_MAX_POPULATED;
+
+    ndimms = 0;
+    while (ndimms < num) {
+        slot = dimm_find_next(pfx, mode);
+        if (slot == NULL) {
+            fprintf(stderr, "%s no further slot found for pool %s\n",
+                    __FUNCTION__, pfx);
+            fprintf(stderr, "%s operated on %d / %d requested dimms\n",
+                    __FUNCTION__, ndimms, num);
+            return 1;
+        }
+
+        if (add) {
+            dimm_activate(slot);
+        }
+        else {
+            dimm_deactivate(slot);
+        }
+        ndimms++;
+        idx++;
+    }
+
+    return 0;
+}
 DimmState *dimm_find_from_idx(uint32_t idx)
 {
     DimmState *slot;
diff --git a/hw/dimm.h b/hw/dimm.h
index 0fdf59b..7c456fa 100644
--- a/hw/dimm.h
+++ b/hw/dimm.h
@@ -11,6 +11,11 @@
 #define DIMM_BITMAP_BYTES (MAX_DIMMS + 7) / 8
 #define DEFAULT_DIMMSIZE 1024*1024*1024
 
+enum {
+    DIMM_MIN_UNPOPULATED= 0,
+    DIMM_MAX_POPULATED = 1
+};
+
 typedef enum {
     DIMM_REMOVE_SUCCESS = 0,
     DIMM_REMOVE_FAIL = 1,
@@ -61,5 +66,7 @@ void dimm_deactivate(DimmState *slot);
 void dimm_scan_populated(void);
 void dimm_notify(uint32_t idx, uint32_t event);
 int dimm_set_populated(DimmState *s);
+DimmState *dimm_find_next(char *pfx, uint32_t mode);
+int dimm_do_range(Monitor *mon, const QDict *qdict, bool add);
 
 #endif
diff --git a/monitor.c b/monitor.c
index 1dd646c..2e0ce1f 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4838,3 +4838,13 @@ int do_dimm_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
 {
     return dimm_do(mon, qdict, false);
 }
+
+int do_dimm_add_range(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+    return dimm_do_range(mon, qdict, true);
+}
+
+int do_dimm_del_range(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+    return dimm_do_range(mon, qdict, false);
+}
diff --git a/monitor.h b/monitor.h
index afdd721..8224301 100644
--- a/monitor.h
+++ b/monitor.h
@@ -88,5 +88,7 @@ int qmp_qom_get(Monitor *mon, const QDict *qdict, QObject **ret);
 
 int do_dimm_add(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_dimm_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
+int do_dimm_add_range(Monitor *mon, const QDict *qdict, QObject **ret_data);
+int do_dimm_del_range(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif /* !MONITOR_H */
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 6c71696..c3f74ea 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2306,3 +2306,43 @@ Example:
    }
 
 EQMP
+
+    {
+        .name       = "mem-increase",
+        .args_type  = "pfx:s,num:s",
+        .mhandler.cmd_new = do_dimm_add_range,
+    },
+SQMP
+mem-increase
+-------------
+
+Hotplug memory DIMMs from memory pool
+
+Will hotplug num memory DIMMs from pool with name pfx.
+
+Example:
+
+-> { "execute": "mem-increase", "arguments": { "pfx" : "pool", "num": "10" } }
+<- { "return": {} }
+
+EQMP
+
+    {
+        .name       = "mem-decrease",
+        .args_type  = "pfx:s,num:s",
+        .mhandler.cmd_new = do_dimm_del_range,
+    },
+SQMP
+mem-decrease
+-------------
+
+Hot-unplug memory DIMMs from memory pool
+
+Will hot-unplug num memory DIMMs from pool with name pfx.
+
+Example:
+
+-> { "execute": "mem-decrease", "arguments": { "pfx" : "pool", "num": "10" } }
+<- { "return": {} }
+
+EQMP
-- 
1.7.9
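
Editorial sketch of the selection rule that dimm_find_next() implements in the patch above: hot-add takes the lowest-index unpopulated slot of the pool, hot-unplug takes the highest-index populated one, and slots with a pending operation are skipped. This is a standalone illustration, not the patch's code. `struct dimm_sketch`, `in_pool()` and `find_next()` are hypothetical names, and the pool test here is a strict prefix match, whereas the patch uses strstr(), which also matches the prefix anywhere inside the id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for DimmState; only the fields the search reads. */
struct dimm_sketch {
    const char *id;     /* e.g. "pool3" */
    uint32_t idx;       /* slot index */
    bool populated;
    bool pending;
};

enum { SKETCH_MIN_UNPOPULATED = 0, SKETCH_MAX_POPULATED = 1 };

/* True when id starts with pfx and has at least one character after it. */
static bool in_pool(const char *id, const char *pfx)
{
    size_t n = strlen(pfx);
    return strncmp(id, pfx, n) == 0 && id[n] != '\0';
}

/*
 * Return the lowest-index idle unpopulated slot (hot-add) or the
 * highest-index idle populated slot (hot-unplug), or NULL if none.
 */
static struct dimm_sketch *find_next(struct dimm_sketch *d, size_t n,
                                     const char *pfx, int mode)
{
    struct dimm_sketch *ret = NULL;
    size_t i;

    assert(mode == SKETCH_MIN_UNPOPULATED || mode == SKETCH_MAX_POPULATED);
    for (i = 0; i < n; i++) {
        if (!in_pool(d[i].id, pfx) || d[i].pending)
            continue;
        if (mode == SKETCH_MIN_UNPOPULATED) {
            if (!d[i].populated && (!ret || d[i].idx < ret->idx))
                ret = &d[i];
        } else {
            if (d[i].populated && (!ret || d[i].idx > ret->idx))
                ret = &d[i];
        }
    }
    return ret;
}
```

A caller in the style of dimm_do_range() would invoke find_next() once per requested dimm and stop at the first NULL, reporting how many slots it managed to operate on.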

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH v2 04/21][SeaBIOS] acpi: generate hotplug memory devices
  2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 10:48     ` Wen Congyang
  -1 siblings, 0 replies; 86+ messages in thread
From: Wen Congyang @ 2012-07-11 10:48 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: kvm, gleb, seabios, qemu-devel, kevin, avi, anthony, imammedo

At 07/11/2012 06:31 PM, Vasilis Liaskovitis Wrote:
> The memory device generation is guided by qemu paravirt info. Seabios
> first uses the info to setup SRAT entries for the hotplug-able memory slots.
> Afterwards, build_memssdt uses the created SRAT entries to generate
> appropriate memory device objects. One memory device (and corresponding SRAT
> entry) is generated for each hotplug-able qemu memslot. Currently no SSDT
> memory device is created for initial system memory.
> 
> We only support up to 255 DIMMs for now (PackageOp used for the MEON array can
> only describe an array of at most 255 elements. VarPackageOp would be needed to
> support more than 255 devices)
> 
> v1->v2:
> Seabios reads mems_sts from qemu to build e820_map
> SSDT size and some offsets are calculated with extraction macros.
> 
> Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> ---
>  src/acpi.c |  158 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 files changed, 152 insertions(+), 6 deletions(-)
> 
> diff --git a/src/acpi.c b/src/acpi.c
> index 55e4607..c83e8c7 100644
> --- a/src/acpi.c
> +++ b/src/acpi.c
> @@ -510,6 +510,127 @@ build_ssdt(void)
>      return ssdt;
>  }
>  
> +#include "ssdt-mem.hex"
> +
> +/* 0x5B 0x82 DeviceOp PkgLength NameString DimmID */
> +#define MEM_BASE 0xaf80
> +#define SD_MEM (ssdm_mem_aml + *ssdt_mem_start)
> +#define SD_MEMSIZEOF (*ssdt_mem_end - *ssdt_mem_start)
> +#define SD_OFFSET_MEMHEX (*ssdt_mem_name - *ssdt_mem_start + 2)
> +#define SD_OFFSET_MEMID (*ssdt_mem_id - *ssdt_mem_start)
> +#define SD_OFFSET_PXMID 31
> +#define SD_OFFSET_MEMSTART 55
> +#define SD_OFFSET_MEMEND   63
> +#define SD_OFFSET_MEMSIZE  79
> +
> +u64 nb_hp_memslots = 0;
> +struct srat_memory_affinity *mem;
> +
> +static void build_memdev(u8 *ssdt_ptr, int i, u64 mem_base, u64 mem_len, u8 node)
> +{
> +    memcpy(ssdt_ptr, SD_MEM, SD_MEMSIZEOF);
> +    ssdt_ptr[SD_OFFSET_MEMHEX] = getHex(i >> 4);
> +    ssdt_ptr[SD_OFFSET_MEMHEX+1] = getHex(i);
> +    ssdt_ptr[SD_OFFSET_MEMID] = i;
> +    ssdt_ptr[SD_OFFSET_PXMID] = node;
> +    *(u64*)(ssdt_ptr + SD_OFFSET_MEMSTART) = mem_base;
> +    *(u64*)(ssdt_ptr + SD_OFFSET_MEMEND) = mem_base + mem_len;
> +    *(u64*)(ssdt_ptr + SD_OFFSET_MEMSIZE) = mem_len;
> +}
> +
> +static void*
> +build_memssdt(void)
> +{
> +    u64 mem_base;
> +    u64 mem_len;
> +    u8  node;
> +    int i;
> +    struct srat_memory_affinity *entry = mem;
> +    u64 nb_memdevs = nb_hp_memslots;
> +    u8  memslot_status, enabled;
> +
> +    int length = ((1+3+4)
> +                  + (nb_memdevs * SD_MEMSIZEOF)
> +                  + (1+2+5+(12*nb_memdevs))
> +                  + (6+2+1+(1*nb_memdevs)));
> +    u8 *ssdt = malloc_high(sizeof(struct acpi_table_header) + length);
> +    if (! ssdt) {
> +        warn_noalloc();
> +        return NULL;
> +    }
> +    u8 *ssdt_ptr = ssdt + sizeof(struct acpi_table_header);
> +
> +    // build Scope(_SB_) header
> +    *(ssdt_ptr++) = 0x10; // ScopeOp
> +    ssdt_ptr = encodeLen(ssdt_ptr, length-1, 3);
> +    *(ssdt_ptr++) = '_';
> +    *(ssdt_ptr++) = 'S';
> +    *(ssdt_ptr++) = 'B';
> +    *(ssdt_ptr++) = '_';
> +
> +    for (i = 0; i < nb_memdevs; i++) {
> +        mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low);
> +        mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
> +        node = entry->proximity[0];
> +        build_memdev(ssdt_ptr, i, mem_base, mem_len, node);
> +        ssdt_ptr += SD_MEMSIZEOF;
> +        entry++;
> +    }
> +
> +    // build "Method(MTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(MP00, Arg1)} ...}"
> +    *(ssdt_ptr++) = 0x14; // MethodOp
> +    ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*nb_memdevs), 2);
> +    *(ssdt_ptr++) = 'M';
> +    *(ssdt_ptr++) = 'T';
> +    *(ssdt_ptr++) = 'F';
> +    *(ssdt_ptr++) = 'Y';
> +    *(ssdt_ptr++) = 0x02;
> +    for (i=0; i<nb_memdevs; i++) {
> +        *(ssdt_ptr++) = 0xA0; // IfOp
> +        ssdt_ptr = encodeLen(ssdt_ptr, 11, 1);
> +        *(ssdt_ptr++) = 0x93; // LEqualOp
> +        *(ssdt_ptr++) = 0x68; // Arg0Op
> +        *(ssdt_ptr++) = 0x0A; // BytePrefix
> +        *(ssdt_ptr++) = i;
> +        *(ssdt_ptr++) = 0x86; // NotifyOp
> +        *(ssdt_ptr++) = 'M';
> +        *(ssdt_ptr++) = 'P';
> +        *(ssdt_ptr++) = getHex(i >> 4);
> +        *(ssdt_ptr++) = getHex(i);
> +        *(ssdt_ptr++) = 0x69; // Arg1Op
> +    }
> +
> +    // build "Name(MEON, Package() { One, One, ..., Zero, Zero, ... })"
> +    *(ssdt_ptr++) = 0x08; // NameOp
> +    *(ssdt_ptr++) = 'M';
> +    *(ssdt_ptr++) = 'E';
> +    *(ssdt_ptr++) = 'O';
> +    *(ssdt_ptr++) = 'N';
> +    *(ssdt_ptr++) = 0x12; // PackageOp
> +    ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*nb_memdevs), 2);
> +    *(ssdt_ptr++) = nb_memdevs;
> +
> +    entry = mem;
> +    memslot_status = 0;
> +
> +    for (i = 0; i < nb_memdevs; i++) {
> +        enabled = 0;
> +        if (i % 8 == 0)
> +            memslot_status = inb(MEM_BASE + i/8);
> +        enabled = memslot_status & 1;
> +        mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low);
> +        mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
> +        *(ssdt_ptr++) = enabled ? 0x01 : 0x00;
> +        if (enabled)
> +            add_e820(mem_base, mem_len, E820_RAM);

add_e820() is declared in memmap.h. You should include this header file,
otherwise, seabios cannot be built.

Thanks
Wen Congyang
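
Editorial sketch for readers following the quoted loop: it reads one status byte per eight memory slots and peels off the low bit for each slot in turn. The standalone version below shows the same unpacking on an in-memory array. `unpack_enabled()` and the `status[]` array are hypothetical; in the patch the bytes come from inb(MEM_BASE + i/8), and the bit order (slot i in bit i % 8, least significant first) is taken from that loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack one "enabled" flag per slot from a packed status bitmap:
 * slot i lives in byte i / 8, bit i % 8 (least significant bit first).
 * status[] stands in for the bytes the quoted loop reads with
 * inb(MEM_BASE + i/8); the array form is an assumption of this sketch.
 * Returns the number of enabled slots.
 */
static size_t unpack_enabled(const uint8_t *status, size_t nslots,
                             uint8_t *enabled)
{
    size_t i, count = 0;
    uint8_t cur = 0;

    assert(status && enabled);
    for (i = 0; i < nslots; i++) {
        if (i % 8 == 0)
            cur = status[i / 8];   /* fetch the next status byte */
        enabled[i] = cur & 1;      /* low bit is the current slot */
        count += enabled[i];
        cur >>= 1;                 /* move the next slot into bit 0 */
    }
    return count;
}
```

In the patch, each slot found enabled this way is also the point where add_e820() is called for that slot's range.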

> +        memslot_status = memslot_status >> 1;
> +        entry++;
> +    }
> +    build_header((void*)ssdt, SSDT_SIGNATURE, ssdt_ptr - ssdt, 1);
> +
> +    return ssdt;
> +}
> +
>  #include "ssdt-pcihp.hex"
>  
>  #define PCI_RMV_BASE 0xae0c
> @@ -618,9 +739,6 @@ build_srat(void)
>  {
>      int nb_numa_nodes = qemu_cfg_get_numa_nodes();
>  
> -    if (nb_numa_nodes == 0)
> -        return NULL;
> -
>      u64 *numadata = malloc_tmphigh(sizeof(u64) * (MaxCountCPUs + nb_numa_nodes));
>      if (!numadata) {
>          warn_noalloc();
> @@ -629,10 +747,11 @@ build_srat(void)
>  
>      qemu_cfg_get_numa_data(numadata, MaxCountCPUs + nb_numa_nodes);
>  
> +    qemu_cfg_get_numa_data(&nb_hp_memslots, 1);
>      struct system_resource_affinity_table *srat;
>      int srat_size = sizeof(*srat) +
>          sizeof(struct srat_processor_affinity) * MaxCountCPUs +
> -        sizeof(struct srat_memory_affinity) * (nb_numa_nodes + 2);
> +        sizeof(struct srat_memory_affinity) * (nb_numa_nodes + nb_hp_memslots + 2);
>  
>      srat = malloc_high(srat_size);
>      if (!srat) {
> @@ -667,7 +786,7 @@ build_srat(void)
>       * from 640k-1M and possibly another one from 3.5G-4G.
>       */
>      struct srat_memory_affinity *numamem = (void*)core;
> -    int slots = 0;
> +    int slots = 0, node;
>      u64 mem_len, mem_base, next_base = 0;
>  
>      acpi_build_srat_memory(numamem, 0, 640*1024, 0, 1);
> @@ -694,10 +813,36 @@ build_srat(void)
>              next_base += (1ULL << 32) - RamSize;
>          }
>          acpi_build_srat_memory(numamem, mem_base, mem_len, i-1, 1);
> +
>          numamem++;
>          slots++;
> +
> +    }
> +    mem = (void*)numamem;
> +
> +    if (nb_hp_memslots) {
> +        u64 *hpmemdata = malloc_tmphigh(sizeof(u64) * (3 * nb_hp_memslots));
> +        if (!hpmemdata) {
> +            warn_noalloc();
> +            free(hpmemdata);
> +            free(numadata);
> +            return NULL;
> +        }
> +
> +        qemu_cfg_get_numa_data(hpmemdata, 3 * nb_hp_memslots);
> +
> +        for (i = 1; i < nb_hp_memslots + 1; ++i) {
> +            mem_base = *hpmemdata++;
> +            mem_len = *hpmemdata++;
> +            node = *hpmemdata++;
> +            acpi_build_srat_memory(numamem, mem_base, mem_len, node, 1);
> +            numamem++;
> +            slots++;
> +        }
> +        free(hpmemdata);
>      }
> -    for (; slots < nb_numa_nodes + 2; slots++) {
> +
> +    for (; slots < nb_numa_nodes + nb_hp_memslots + 2; slots++) {
>          acpi_build_srat_memory(numamem, 0, 0, 0, 0);
>          numamem++;
>      }
> @@ -748,6 +893,7 @@ acpi_bios_init(void)
>      ACPI_INIT_TABLE(build_madt());
>      ACPI_INIT_TABLE(build_hpet());
>      ACPI_INIT_TABLE(build_srat());
> +    ACPI_INIT_TABLE(build_memssdt());
>      ACPI_INIT_TABLE(build_pcihp());
>  
>      u16 i, external_tables = qemu_cfg_acpi_additional_tables();

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH v2 05/21][SeaBIOS] pciinit: Fix pcimem_start value
  2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 11:56     ` Gerd Hoffmann
  -1 siblings, 0 replies; 86+ messages in thread
From: Gerd Hoffmann @ 2012-07-11 11:56 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: kvm, gleb, seabios, qemu-devel, kevin, avi, anthony, imammedo

On 07/11/12 12:31, Vasilis Liaskovitis wrote:
> In order to hotplug memory between RamSize and BUILD_PCIMEM_START, the pci
> window needs to start at BUILD_PCIMEM_START (0xe0000000).
> Otherwise, the guest cannot online new dimms at those ranges due to pci_root
> window conflicts. (workaround for linux guest is booting with pci=nocrs)

>  static void pci_bios_map_devices(struct pci_bus *busses)
>  {
> -    pcimem_start = RamSize;
> +    pcimem_start = BUILD_PCIMEM_START;

It isn't that simple.  For the 32bit pci window it will work, but it will
leave address space unused instead of assigning it to the 32bit pci
window.  For the 64bit pci window it will not work.

You have to walk the dimms and figure out what the highest used address is,
for both below-4g and above-4g.  Then fill two variables with it and make
the pci init code use that instead of RamSize and RamSizeOver4G.
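
Editorial sketch of the walk described above, as a standalone illustration rather than SeaBIOS code. `struct dimm_range` and `highest_used()` are hypothetical names. The two starting limits are assumed to be the end of boot RAM below 4G and the end of boot RAM above 4G, and a dimm range is assumed not to straddle the 4G boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_4G (1ULL << 32)

/* Hypothetical description of one dimm's physical range. */
struct dimm_range {
    uint64_t base;
    uint64_t len;
};

/*
 * Starting from the end of boot RAM below and above 4G, raise each
 * limit to the end of any dimm that extends past it. A dimm is
 * classified by its base address; ranges are assumed not to straddle 4G.
 */
static void highest_used(const struct dimm_range *d, size_t n,
                         uint64_t ram_end_low, uint64_t ram_end_high,
                         uint64_t *top_low, uint64_t *top_high)
{
    size_t i;

    assert(top_low && top_high);
    *top_low = ram_end_low;
    *top_high = ram_end_high;
    for (i = 0; i < n; i++) {
        uint64_t end = d[i].base + d[i].len;
        if (d[i].base < SKETCH_4G) {
            if (end > *top_low)
                *top_low = end;
        } else if (end > *top_high) {
            *top_high = end;
        }
    }
}
```

The two outputs would then be the values the pci init code consults when placing the 32bit and 64bit windows.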

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH v2 20/21] Implement -dimms, -dimmspop command line options
  2012-07-11 10:32   ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 14:55     ` Avi Kivity
  -1 siblings, 0 replies; 86+ messages in thread
From: Avi Kivity @ 2012-07-11 14:55 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: qemu-devel, kvm, seabios, anthony, gleb, imammedo, kevin, wency

On 07/11/2012 01:32 PM, Vasilis Liaskovitis wrote:
> Implement batch dimm creation command line options. These could be useful for
> not bloating the command line with a large number of dimms.

IMO this is unneeded.  With a management tool there is no problem
generating a long command line; from the command line -dimm will be a
rarely used option.
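
Editorial sketch of the point above: a management tool can expand a pool into individual -dimm options in a few lines. The option syntax ("-dimm id=name,size=sz,node=pxm,populated=on|off") is the one given in the cover letter; `gen_dimm_args()` is a hypothetical helper, and the pool0..poolN-1 naming follows the quoted -dimms description.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append one "-dimm" option per pool member to buf, using the option
 * syntax from the cover letter. Returns the number of characters
 * written, or -1 if buf is too small.
 */
static int gen_dimm_args(char *buf, size_t cap, const char *pfx,
                         const char *size, int node, int num)
{
    size_t used = 0;
    int i;

    assert(buf && pfx && size && num >= 0);
    if (cap == 0)
        return -1;
    buf[0] = '\0';
    for (i = 0; i < num; i++) {
        /* Separate options with a single space after the first one. */
        int n = snprintf(buf + used, cap - used,
                         "%s-dimm id=%s%d,size=%s,node=%d,populated=off",
                         i ? " " : "", pfx, i, size, node);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Generating the options individually also leaves the tool free to mark any single dimm as populated, which the range-based options cannot express.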

> 
> syntax: -dimms pfx=poolid,size=sz,num=n
> Will create numdimms dimms with ids poolid0, ..., poolidn-1. Each dimm has a
> size of sz.
>     
> Implement -dimmpop option to populate dimms at bootup
> syntax: -dimmpop pfx=poolid,num=n
> This will populate n dimms with ids poolid0, ..., poolidn-1.
> 
> (live-migration could break here without patch 12/21: -dimmspop
> needs to be reworked to support populating of individual dimms with
> same prefix, and not only a range of dimms starting from 0)
> 

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 13/21] Implement memory hotplug notification lists
  2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 14:59     ` Eric Blake
  -1 siblings, 0 replies; 86+ messages in thread
From: Eric Blake @ 2012-07-11 14:59 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: qemu-devel, kvm, seabios, gleb, kevin, avi, anthony, imammedo

[-- Attachment #1: Type: text/plain, Size: 1691 bytes --]

On 07/11/2012 04:31 AM, Vasilis Liaskovitis wrote:
> Guest can respond to ACPI hotplug events e.g. with _EJ or _OST method.
> This patch implements a tail queue to store guest notifications for memory
> hot-add and hot-remove requests.
> 
> Guest responses for memory hotplug command on a per-dimm basis can be detected
> with the new hmp command "info memhp" or the new qmp command "query-memhp"
> Examples:
>     

> +++ b/qapi-schema.json
> @@ -1862,3 +1862,29 @@
>  # Since: 0.14.0
>  ##
>  { 'command': 'netdev_del', 'data': {'id': 'str'} }
> +
> +##
> +# @MemHpInfo:
> +#
> +# Information about status of a memory hotplug command
> +#
> +# @Dimm: the Dimm associated with the result
> +#
> +# @result: the result of the hotplug command
> +#
> +# Since: 1.1.3

Should probably be 1.2, not 1.1.3.

> +#
> +##
> +{ 'type': 'MemHpInfo',
> +  'data': {'Dimm': 'str', 'request': 'str', 'result': 'str'} }

Why the upper case?  Wouldn't 'dimm' be more consistent?

> +
> +##
> +# @query-memhp:

Why are we abbreviating?  It might be better to name the QMP command
query-memory-hotplug

> +#
> +# Returns a list of information about pending hotplug commands
> +#
> +# Returns: a list of @MemhpInfo
> +#
> +# Since: 1.1.3

Likewise for 1.2.

> +
> +- "Dimm": Dimm name (json-str)
> +- "request": type of hot request: hot-add or hot-remove  (json-str)
> +- "result": result of the hotplug request for this Dimm success or failure (json-str)

This may need tweaks (such as s/Dimm/dimm/) based on resolution of above
comments.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH v2 19/21] Implement "info memtotal" and "query-memtotal"
  2012-07-11 10:32   ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-11 15:14     ` Eric Blake
  -1 siblings, 0 replies; 86+ messages in thread
From: Eric Blake @ 2012-07-11 15:14 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: kvm, gleb, seabios, qemu-devel, kevin, avi, anthony, imammedo

[-- Attachment #1: Type: text/plain, Size: 872 bytes --]

On 07/11/2012 04:32 AM, Vasilis Liaskovitis wrote:
> Returns total memory of guest in bytes, including hotplugged memory.
> 
> Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>

Should this instead be merged with query-balloon output, so that we have
a single command that shows all aspects of memory usage (both balloon
and hotplug at once)?


> @@ -1888,3 +1888,15 @@
>  # Since: 1.1.3
>  ##
>  { 'command': 'query-memhp', 'returns': ['MemHpInfo'] }
> +
> +##
> +# @query-memtotal:

A more generic name might be 'query-memory', especially if we merge
balloon and hotplug information into one command.

> +#
> +# Returns total memory in bytes, including hotplugged dimms
> +#
> +# Returns: a l

truncated

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH v2 04/21][SeaBIOS] acpi: generate hotplug memory devices
  2012-07-11 10:48     ` [Qemu-devel] " Wen Congyang
@ 2012-07-11 16:39       ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 16:39 UTC (permalink / raw)
  To: Wen Congyang
  Cc: qemu-devel, kvm, seabios, avi, anthony, gleb, imammedo, kevin

Hi,

On Wed, Jul 11, 2012 at 06:48:38PM +0800, Wen Congyang wrote:
> > +        if (enabled)
> > +            add_e820(mem_base, mem_len, E820_RAM);
> 
> add_e820() is declared in memmap.h. You should include this header file,
> otherwise, seabios cannot be built.

Thanks. You had the same comment on v1, but I forgot to address it. I will
update.

- Vasilis
> 
> Thanks
> Wen Congyang
> 
> > +        memslot_status = memslot_status >> 1;
> > +        entry++;
> > +    }
> > +    build_header((void*)ssdt, SSDT_SIGNATURE, ssdt_ptr - ssdt, 1);
> > +
> > +    return ssdt;
> > +}
> > +
> >  #include "ssdt-pcihp.hex"
> >  
> >  #define PCI_RMV_BASE 0xae0c
> > @@ -618,9 +739,6 @@ build_srat(void)
> >  {
> >      int nb_numa_nodes = qemu_cfg_get_numa_nodes();
> >  
> > -    if (nb_numa_nodes == 0)
> > -        return NULL;
> > -
> >      u64 *numadata = malloc_tmphigh(sizeof(u64) * (MaxCountCPUs + nb_numa_nodes));
> >      if (!numadata) {
> >          warn_noalloc();
> > @@ -629,10 +747,11 @@ build_srat(void)
> >  
> >      qemu_cfg_get_numa_data(numadata, MaxCountCPUs + nb_numa_nodes);
> >  
> > +    qemu_cfg_get_numa_data(&nb_hp_memslots, 1);
> >      struct system_resource_affinity_table *srat;
> >      int srat_size = sizeof(*srat) +
> >          sizeof(struct srat_processor_affinity) * MaxCountCPUs +
> > -        sizeof(struct srat_memory_affinity) * (nb_numa_nodes + 2);
> > +        sizeof(struct srat_memory_affinity) * (nb_numa_nodes + nb_hp_memslots + 2);
> >  
> >      srat = malloc_high(srat_size);
> >      if (!srat) {
> > @@ -667,7 +786,7 @@ build_srat(void)
> >       * from 640k-1M and possibly another one from 3.5G-4G.
> >       */
> >      struct srat_memory_affinity *numamem = (void*)core;
> > -    int slots = 0;
> > +    int slots = 0, node;
> >      u64 mem_len, mem_base, next_base = 0;
> >  
> >      acpi_build_srat_memory(numamem, 0, 640*1024, 0, 1);
> > @@ -694,10 +813,36 @@ build_srat(void)
> >              next_base += (1ULL << 32) - RamSize;
> >          }
> >          acpi_build_srat_memory(numamem, mem_base, mem_len, i-1, 1);
> > +
> >          numamem++;
> >          slots++;
> > +
> > +    }
> > +    mem = (void*)numamem;
> > +
> > +    if (nb_hp_memslots) {
> > +        u64 *hpmemdata = malloc_tmphigh(sizeof(u64) * (3 * nb_hp_memslots));
> > +        if (!hpmemdata) {
> > +            warn_noalloc();
> > +            free(hpmemdata);
> > +            free(numadata);
> > +            return NULL;
> > +        }
> > +
> > +        qemu_cfg_get_numa_data(hpmemdata, 3 * nb_hp_memslots);
> > +
> > +        for (i = 1; i < nb_hp_memslots + 1; ++i) {
> > +            mem_base = *hpmemdata++;
> > +            mem_len = *hpmemdata++;
> > +            node = *hpmemdata++;
> > +            acpi_build_srat_memory(numamem, mem_base, mem_len, node, 1);
> > +            numamem++;
> > +            slots++;
> > +        }
> > +        free(hpmemdata);
> >      }
> > -    for (; slots < nb_numa_nodes + 2; slots++) {
> > +
> > +    for (; slots < nb_numa_nodes + nb_hp_memslots + 2; slots++) {
> >          acpi_build_srat_memory(numamem, 0, 0, 0, 0);
> >          numamem++;
> >      }
> > @@ -748,6 +893,7 @@ acpi_bios_init(void)
> >      ACPI_INIT_TABLE(build_madt());
> >      ACPI_INIT_TABLE(build_hpet());
> >      ACPI_INIT_TABLE(build_srat());
> > +    ACPI_INIT_TABLE(build_memssdt());
> >      ACPI_INIT_TABLE(build_pcihp());
> >  
> >      u16 i, external_tables = qemu_cfg_acpi_additional_tables();
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH v2 05/21][SeaBIOS] pciinit: Fix pcimem_start value
  2012-07-11 11:56     ` [Qemu-devel] " Gerd Hoffmann
@ 2012-07-11 16:45       ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 16:45 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: qemu-devel, kvm, seabios, avi, anthony, gleb, imammedo, kevin, wency

Hi,

On Wed, Jul 11, 2012 at 01:56:19PM +0200, Gerd Hoffmann wrote:
> On 07/11/12 12:31, Vasilis Liaskovitis wrote:
> > In order to hotplug memory between RamSize and BUILD_PCIMEM_START, the pci
> > window needs to start at BUILD_PCIMEM_START (0xe0000000).
> > Otherwise, the guest cannot online new dimms at those ranges due to pci_root
> > window conflicts. (workaround for linux guest is booting with pci=nocrs)
> 
> >  static void pci_bios_map_devices(struct pci_bus *busses)
> >  {
> > -    pcimem_start = RamSize;
> > +    pcimem_start = BUILD_PCIMEM_START;
> 
> It isn't that simple.  For the 32bit pci window it will work, but will
> leave address space unused instead of assigning it to the 32bit pci
> window.  For the 64bit pci window it will not work.
> 
> You have to walk the dimms and figure out what the highest used address is,
> for both below-4g and above-4g.  Then fill two variables with it and make
> the pci init code use that instead of RamSize and RamSizeOver4G.

I see. I already have these values computed in qemu-kvm, so I can pass
them in a paravirt struct, or infer them from the dimm/srat paravirt info that I
already pass to seabios.

If I understand correctly, we would like the pcimem windows to use the maximum
possible address space (constrained by the exact dimms/ranges which are defined)
instead of leaving unused space.

thanks,

- Vasilis

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 13/21] Implement memory hotplug notification lists
  2012-07-11 14:59     ` Eric Blake
@ 2012-07-11 16:47       ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 16:47 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, kvm, seabios, gleb, kevin, avi, anthony, imammedo

Hi,

On Wed, Jul 11, 2012 at 08:59:03AM -0600, Eric Blake wrote:
> On 07/11/2012 04:31 AM, Vasilis Liaskovitis wrote:
> > Guest can respond to ACPI hotplug events e.g. with _EJ or _OST method.
> > This patch implements a tail queue to store guest notifications for memory
> > hot-add and hot-remove requests.
> > 
> > Guest responses for memory hotplug command on a per-dimm basis can be detected
> > with the new hmp command "info memhp" or the new qmp command "query-memhp"
> > Examples:
> >     
> 
> > +++ b/qapi-schema.json
> > @@ -1862,3 +1862,29 @@
> >  # Since: 0.14.0
> >  ##
> >  { 'command': 'netdev_del', 'data': {'id': 'str'} }
> > +
> > +##
> > +# @MemHpInfo:
> > +#
> > +# Information about status of a memory hotplug command
> > +#
> > +# @Dimm: the Dimm associated with the result
> > +#
> > +# @result: the result of the hotplug command
> > +#
> > +# Since: 1.1.3
> 
> Should probably be 1.2, not 1.1.3.
>

right

> > +#
> > +##
> > +{ 'type': 'MemHpInfo',
> > +  'data': {'Dimm': 'str', 'request': 'str', 'result': 'str'} }
> 
> Why the upper case?  Wouldn't 'dimm' be more consistent?
>

I will change to "dimm"

> > +
> > +##
> > +# @query-memhp:
> 
> Why are we abbreviating?  It might be better to name the QMP command
> query-memory-hotplug
> 

agreed, memhp is a bit cryptic. I will change to your suggestion 

> > +#
> > +# Returns a list of information about pending hotplug commands
> > +#
> > +# Returns: a list of @MemhpInfo
> > +#
> > +# Since: 1.1.3
> 
> Likewise for 1.2.

right

> 
> > +
> > +- "Dimm": Dimm name (json-str)
> > +- "request": type of hot request: hot-add or hot-remove  (json-str)
> > +- "result": result of the hotplug request for this Dimm success or failure (json-str)
> 
> This may need tweaks (such as s/Dimm/dimm/) based on resolution of above
> comments.
ok, it will be "dimm"

thanks,

- Vasilis


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 19/21] Implement "info memtotal" and "query-memtotal"
  2012-07-11 15:14     ` [Qemu-devel] " Eric Blake
@ 2012-07-11 16:55       ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 16:55 UTC (permalink / raw)
  To: Eric Blake; +Cc: kvm, seabios, qemu-devel, avi

Hi,

On Wed, Jul 11, 2012 at 09:14:29AM -0600, Eric Blake wrote:
> On 07/11/2012 04:32 AM, Vasilis Liaskovitis wrote:
> > Returns total memory of guest in bytes, including hotplugged memory.
> > 
> > Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> 
> Should this instead be merged with query-balloon output, so that we have
> a single command that shows all aspects of memory usage (both balloon
> and hotplug at once)?
> 
> 
> > @@ -1888,3 +1888,15 @@
> >  # Since: 1.1.3
> >  ##
> >  { 'command': 'query-memhp', 'returns': ['MemHpInfo'] }
> > +
> > +##
> > +# @query-memtotal:
> 
> A more generic name might be 'query-memory', especially if we merge
> balloon and hotplug information into one command.

"query-memory" sounds reasonable to me.

"query-balloon" should also be updated to show the correct memory. 
Do you foresee any issues with merging them?  The "query-memory" command
should work independently of the balloon driver.
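
To make the intended number concrete, here is a minimal sketch of the sum the command would report: initial RAM plus the size of every populated dimm. It is only an illustration, not code from the series; `dimm_state` and `guest_total_memory()` are hypothetical names, and ballooning is deliberately left out of the calculation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-dimm state; not a structure from the series. */
struct dimm_state {
    uint64_t size;   /* dimm size in bytes */
    bool populated;  /* true once the dimm has been plugged in */
};

/*
 * Total guest memory in bytes: boot-time RAM plus every populated dimm.
 * Unpopulated dimms are defined but contribute nothing yet.
 */
uint64_t guest_total_memory(uint64_t initial_ram,
                            const struct dimm_state *dimms, size_t n)
{
    assert(n == 0 || dimms != NULL);

    uint64_t total = initial_ram;
    for (size_t i = 0; i < n; i++) {
        if (dimms[i].populated)
            total += dimms[i].size;
    }
    return total;
}
```

With 1G of initial RAM and three dimms of 512M (populated), 512M (unpopulated) and 1G (populated), the sketch yields 2.5G.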

> > +#
> > +# Returns total memory in bytes, including hotplugged dimms
> > +#
> > +# Returns: a l
> 
> truncated

sorry about that.

thanks,

- Vasilis

> 
> -- 
> Eric Blake   eblake@redhat.com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 
> 
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH v2 20/21] Implement -dimms, -dimmspop command line options
  2012-07-11 14:55     ` [Qemu-devel] " Avi Kivity
@ 2012-07-11 16:57       ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-11 16:57 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, gleb, seabios, qemu-devel, kevin, anthony, imammedo

Hi,

On Wed, Jul 11, 2012 at 05:55:25PM +0300, Avi Kivity wrote:
> On 07/11/2012 01:32 PM, Vasilis Liaskovitis wrote:
> > Implement batch dimm creation command line options. These could be useful for
> > not bloating the command line with a large number of dimms.
> 
> IMO this is unneeded.  With a management tool there is no problem
> generating a long command line; from the command line -dimm will be a
> rarely used option.

ok, I thought so. I guess this patch and the next are unwanted, unless there is a
strong opinion for using them coming from others.

thanks,

- Vasilis

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH v2 05/21][SeaBIOS] pciinit: Fix pcimem_start value
  2012-07-11 16:45       ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-12  7:22         ` Gerd Hoffmann
  -1 siblings, 0 replies; 86+ messages in thread
From: Gerd Hoffmann @ 2012-07-12  7:22 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: kvm, gleb, seabios, qemu-devel, kevin, avi, anthony, imammedo

On 07/11/12 18:45, Vasilis Liaskovitis wrote:
> Hi,
> 
> On Wed, Jul 11, 2012 at 01:56:19PM +0200, Gerd Hoffmann wrote:
>> On 07/11/12 12:31, Vasilis Liaskovitis wrote:
>>> In order to hotplug memory between RamSize and BUILD_PCIMEM_START, the pci
>>> window needs to start at BUILD_PCIMEM_START (0xe0000000).
>>> Otherwise, the guest cannot online new dimms at those ranges due to pci_root
>>> window conflicts. (workaround for linux guest is booting with pci=nocrs)
>>
>>>  static void pci_bios_map_devices(struct pci_bus *busses)
>>>  {
>>> -    pcimem_start = RamSize;
>>> +    pcimem_start = BUILD_PCIMEM_START;
>>
>> It isn't that simple.  For the 32bit pci window it will work, but will
>> leave address space unused instead of assigning it to the 32bit pci
>> window.  For the 64bit pci window it will not work.
>>
>> You have to walk the dimms and figure out what the highest used address is,
>> for both below-4g and above-4g.  Then fill two variables with it and make
>> the pci init code use that instead of RamSize and RamSizeOver4G.
> 
> I see. I already have these values computed in qemu-kvm, so I can pass
> them in a paravirt struct, or infer them from the dimm/srat paravirt info that I
> already pass to seabios.

I'd suggest inferring them from the dimm info, to limit the amount of
information which needs to be passed from qemu to seabios.

> If I understand correctly, we would like the pcimem windows to use the maximum
> possible address space (constrained by the exact dimms/ranges which are defined)
> instead of leaving unused space.

Yes, for the 32bit pci window.

The 64bit pci window is mapped above all memory, and it must likewise
consider defined+unfilled dimms so the start address doesn't collide
with memory hot-plugged above 4G later on.
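
As a rough illustration of the walk described above (a sketch only, not code from this series or from SeaBIOS): the `dimm_range` struct and `dimm_limits()` helper are hypothetical names, and the sketch assumes that no dimm straddles the 4G boundary, which matches the series placing dimms around the pci hole. It tracks the highest end address over all defined dimms, populated or not, separately below and above 4G.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_4G (1ULL << 32)

/* Hypothetical description of one dimm range; not a SeaBIOS structure. */
struct dimm_range {
    uint64_t base;  /* guest-physical start address */
    uint64_t len;   /* size in bytes */
};

/*
 * Walk all defined dimms and raise the two limits to the highest end
 * address found below 4G and above 4G respectively.  On entry the
 * limits hold the end of boot-time RAM in each region.
 */
void dimm_limits(const struct dimm_range *d, size_t n,
                 uint64_t *end_below_4g, uint64_t *end_above_4g)
{
    assert(end_below_4g != NULL && end_above_4g != NULL);
    assert(n == 0 || d != NULL);

    for (size_t i = 0; i < n; i++) {
        uint64_t end = d[i].base + d[i].len;

        if (d[i].base < ADDR_4G) {
            if (end > *end_below_4g)
                *end_below_4g = end;
        } else if (end > *end_above_4g) {
            *end_above_4g = end;
        }
    }
}
```

Seeded with the end of boot RAM in each region, the two results would be candidate start points for the 32bit and 64bit pci windows in place of the values derived from RamSize and RamSizeOver4G.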

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH v2 05/21][SeaBIOS] pciinit: Fix pcimem_start value
  2012-07-12  7:22         ` [Qemu-devel] " Gerd Hoffmann
@ 2012-07-12  9:09           ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-12  9:09 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: qemu-devel, kvm, seabios, avi, anthony, gleb, imammedo, kevin, wency

On Thu, Jul 12, 2012 at 09:22:14AM +0200, Gerd Hoffmann wrote:
> On 07/11/12 18:45, Vasilis Liaskovitis wrote:
> > Hi,
> > 
> > On Wed, Jul 11, 2012 at 01:56:19PM +0200, Gerd Hoffmann wrote:
> >> On 07/11/12 12:31, Vasilis Liaskovitis wrote:
> >>> In order to hotplug memory between RamSize and BUILD_PCIMEM_START, the pci
> >>> window needs to start at BUILD_PCIMEM_START (0xe0000000).
> >>> Otherwise, the guest cannot online new dimms at those ranges due to pci_root
> >>> window conflicts. (workaround for linux guest is booting with pci=nocrs)
> >>
> >>>  static void pci_bios_map_devices(struct pci_bus *busses)
> >>>  {
> >>> -    pcimem_start = RamSize;
> >>> +    pcimem_start = BUILD_PCIMEM_START;
> >>
> >> It isn't that simple.  For the 32bit pci window it will work, but will
> >> leave address space unused instead of assigning it to the 32bit pci
> >> window.  For the 64bit pci window it will not work.
> >>
> >> You have to walk the dimms and figure out what the highest used address is,
> >> for both below-4g and above-4g.  Then fill two variables with it and make
> >> the pci init code use that instead of RamSize and RamSizeOver4G.
> > 
> > I see. I already have these values computed in qemu-kvm, so I can pass
> > them in a paravirt struct, or infer them from the dimm/srat paravirt info that I
> > already pass to seabios.
> 
> I'd suggest inferring them from the dimm info, to limit the amount of
> information which needs to be passed from qemu to seabios.

OK. Currently, dimm info is processed in bios_init_tables(), which is called after
pci_setup(). I'll see if I can do the processing earlier.

> 
> > If I understand correctly, we would like the pcimem windows to use the maximum
> > possible address space (constrained by the exact dimms/ranges which are defined)
> > instead of leaving unused space.
> 
> Yes, for the 32bit pci window.
> 
> The 64bit pci window is mapped above all memory, and it must likewise
> consider defined+unfilled dimms so the start address doesn't collide
> with memory hot-plugged above 4G later on.

yes, understood.

thanks,

- Vasilis

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH v2 09/21] pc: Add dimm paravirt SRAT info
  2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-12 19:48     ` Blue Swirl
  -1 siblings, 0 replies; 86+ messages in thread
From: Blue Swirl @ 2012-07-12 19:48 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: kvm, gleb, seabios, qemu-devel, kevin, avi, anthony, imammedo

On Wed, Jul 11, 2012 at 10:31 AM, Vasilis Liaskovitis
<vasilis.liaskovitis@profitbricks.com> wrote:
> The numa_fw_cfg paravirt interface is extended to include SRAT information for
> all hotplug-able dimms. There are 3 words for each hotplug-able memory slot,
> denoting start address, size and node proximity. The new info is appended after
> existing numa info, so that the fw_cfg layout does not break.  This information
> is used by Seabios to build hotplug memory device objects at runtime.
> nb_numa_nodes is set to 1 by default (not 0), so that we always pass srat info
> to SeaBIOS.
>
> v1->v2:
> Dimm SRAT info (#dimms) is appended at end of existing numa fw_cfg in order not
> to break existing layout
> Documentation of the new fwcfg layout is included in docs/specs/fwcfg.txt
>
> Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> ---
>  docs/specs/fwcfg.txt |   28 ++++++++++++++++++++++++++
>  hw/pc.c              |   53 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  vl.c                 |    2 +-
>  3 files changed, 80 insertions(+), 3 deletions(-)
>  create mode 100644 docs/specs/fwcfg.txt
>
> diff --git a/docs/specs/fwcfg.txt b/docs/specs/fwcfg.txt
> new file mode 100644
> index 0000000..e6fcd8f
> --- /dev/null
> +++ b/docs/specs/fwcfg.txt
> @@ -0,0 +1,28 @@
> +QEMU<->BIOS Paravirt Documentation
> +--------------------------------------
> +
> +This document describes paravirt data structures passed from QEMU to BIOS.
> +
> +fw_cfg SRAT paravirt info
> +--------------------
> +The SRAT info passed from QEMU to BIOS has the following layout:
> +
> +-----------------------------------------------------------------------------------------------
> +#nodes | cpu0_pxm | cpu1_pxm | ... | cpulast_pxm | node0_mem | node1_mem | ... | nodelast_mem
> +
> +-----------------------------------------------------------------------------------------------
> +#dimms | dimm0_start | dimm0_sz | dimm0_pxm | ... | dimmlast_start | dimmlast_sz | dimmlast_pxm
> +
> +Entry 0 contains the number of numa nodes (nb_numa_nodes).
> +
> +Entries 1..max_cpus: The next max_cpus entries describe node proximity for each
> +one of the vCPUs in the system.
> +
> +Entries max_cpus+1..max_cpus+nb_numa_nodes+1:  The next nb_numa_nodes entries
> +describe the memory size for each one of the NUMA nodes in the system.
> +
> +Entry max_cpus+nb_numa_nodes+1 contains the number of memory dimms (nb_hp_dimms)
> +
> +The last 3 * nb_hp_dimms entries are organized in triplets: Each triplet contains
> +the physical address offset, size (in bytes), and node proximity for the
> +respective dimm.

The size and endianness of the entries are not specified; you are using
LE 64-bit values for each item.
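
As a sketch of what a reader of this blob has to assume, given the layout
documented above and the cpu_to_le64() stores in the hw/pc.c hunk below:
every entry is 8 bytes wide and little-endian, the #dimms entry sits at index
1 + max_cpus + nb_numa_nodes, and the dimm triplets follow it. The helper
names (fwcfg_entry, fwcfg_put, fwcfg_nb_dimms_index, fwcfg_dimm_index) are
made up for this example and are not existing QEMU or SeaBIOS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one little-endian 64-bit entry, independent of host byte order. */
static uint64_t fwcfg_entry(const uint8_t *blob, size_t index)
{
    const uint8_t *p = blob + index * 8;
    uint64_t v = 0;
    int i;

    for (i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Encode one little-endian 64-bit entry (what cpu_to_le64 + store amounts to). */
static void fwcfg_put(uint8_t *blob, size_t index, uint64_t v)
{
    uint8_t *p = blob + index * 8;
    int i;

    for (i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Index of the #dimms entry: #nodes word, then per-cpu and per-node words. */
static size_t fwcfg_nb_dimms_index(size_t max_cpus, size_t nb_numa_nodes)
{
    return 1 + max_cpus + nb_numa_nodes;
}

/* Index of the first word (start address) of the triplet for a given dimm. */
static size_t fwcfg_dimm_index(size_t max_cpus, size_t nb_numa_nodes,
                               size_t dimm)
{
    return fwcfg_nb_dimms_index(max_cpus, nb_numa_nodes) + 1 + 3 * dimm;
}
```

With max_cpus = 2 and nb_numa_nodes = 1, for example, the #dimms word is
entry 4 and dimm0's start, size and pxm are entries 5, 6 and 7.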

> diff --git a/hw/pc.c b/hw/pc.c
> index ef9901a..cf651d0 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -598,12 +598,15 @@ int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
>      return index;
>  }
>
> +static void setup_hp_dimms(uint64_t *fw_cfg_slots);
> +
>  static void *bochs_bios_init(void)
>  {
>      void *fw_cfg;
>      uint8_t *smbios_table;
>      size_t smbios_len;
>      uint64_t *numa_fw_cfg;
> +    uint64_t *hp_dimms_fw_cfg;
>      int i, j;
>
>      register_ioport_write(0x400, 1, 2, bochs_bios_write, NULL);
> @@ -638,8 +641,10 @@ static void *bochs_bios_init(void)
>      /* allocate memory for the NUMA channel: one (64bit) word for the number
>       * of nodes, one word for each VCPU->node and one word for each node to
>       * hold the amount of memory.
> +     * Finally one word for the number of hotplug memory slots and three words
> +     * for each hotplug memory slot (start address, size and node proximity).
>       */
> -    numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8);
> +    numa_fw_cfg = g_malloc0((2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8);
>      numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
>      for (i = 0; i < max_cpus; i++) {
>          for (j = 0; j < nb_numa_nodes; j++) {
> @@ -652,8 +657,15 @@ static void *bochs_bios_init(void)
>      for (i = 0; i < nb_numa_nodes; i++) {
>          numa_fw_cfg[max_cpus + 1 + i] = cpu_to_le64(node_mem[i]);
>      }
> +
> +    numa_fw_cfg[1 + max_cpus + nb_numa_nodes] = cpu_to_le64(nb_hp_dimms);
> +
> +    hp_dimms_fw_cfg = numa_fw_cfg + 2 + max_cpus + nb_numa_nodes;
> +    if (nb_hp_dimms)
> +        setup_hp_dimms(hp_dimms_fw_cfg);

Braces.

> +
>      fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, (uint8_t *)numa_fw_cfg,
> -                     (1 + max_cpus + nb_numa_nodes) * 8);
> +                     (2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8);
>
>      return fw_cfg;
>  }
> @@ -1223,3 +1235,40 @@ target_phys_addr_t pc_set_hp_memory_offset(uint64_t size)
>
>      return ret;
>  }
> +
> +static void setup_hp_dimms(uint64_t *fw_cfg_slots)
> +{
> +    int i = 0;
> +    Error *err = NULL;
> +    DeviceState *dev;
> +    DimmState *slot;
> +    const char *type;
> +    BusChild *kid;
> +    BusState *bus = sysbus_get_default();
> +
> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
> +        dev = kid->child;
> +        type = object_property_get_str(OBJECT(dev), "type", &err);
> +        if (err) {
> +            error_free(err);
> +            fprintf(stderr, "error getting device type\n");
> +            exit(1);
> +        }
> +
> +        if (!strcmp(type, "dimm")) {
> +            if (!dev->id) {
> +                fprintf(stderr, "error getting dimm device id\n");
> +                exit(1);
> +            }
> +            slot = DIMM(dev);
> +            /* determine starting physical address for this memory slot */
> +            assert(slot->start);
> +            fw_cfg_slots[3 * slot->idx] = cpu_to_le64(slot->start);
> +            fw_cfg_slots[3 * slot->idx + 1] = cpu_to_le64(slot->size);
> +            fw_cfg_slots[3 * slot->idx + 2] = cpu_to_le64(slot->node);
> +            i++;
> +        }
> +    }
> +    assert(i == nb_hp_dimms);
> +}
> +
> diff --git a/vl.c b/vl.c
> index 0ff8818..37c9798 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2335,7 +2335,7 @@ int main(int argc, char **argv, char **envp)
>          node_cpumask[i] = 0;
>      }
>
> -    nb_numa_nodes = 0;
> +    nb_numa_nodes = 1;
>      nb_nics = 0;
>
>      autostart= 1;
> --
> 1.7.9
>
>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH v2 06/21] dimm: Implement memory device abstraction
  2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-12 19:55     ` Blue Swirl
  -1 siblings, 0 replies; 86+ messages in thread
From: Blue Swirl @ 2012-07-12 19:55 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: kvm, gleb, seabios, qemu-devel, kevin, avi, anthony, imammedo

On Wed, Jul 11, 2012 at 10:31 AM, Vasilis Liaskovitis
<vasilis.liaskovitis@profitbricks.com> wrote:
> Each hotplug-able memory slot is a SysBusDevice. A hot-add operation for a
> particular dimm creates a new MemoryRegion of the given physical address
> offset, size and node proximity, and attaches it to main system memory as a
> sub_region. A hot-remove operation detaches and frees the MemoryRegion from
> system memory.
>
> This prototype still lacks proper qdev integration: a separate
> hotplug side-channel is used and main system bus hotplug capability is
> ignored.
>
> Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> ---
>  hw/Makefile.objs |    2 +-
>  hw/dimm.c        |  234 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/dimm.h        |   58 +++++++++++++
>  3 files changed, 293 insertions(+), 1 deletions(-)
>  create mode 100644 hw/dimm.c
>  create mode 100644 hw/dimm.h
>
> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> index 3d77259..e2184bf 100644
> --- a/hw/Makefile.objs
> +++ b/hw/Makefile.objs
> @@ -26,7 +26,7 @@ hw-obj-$(CONFIG_I8254) += i8254_common.o i8254.o
>  hw-obj-$(CONFIG_PCSPK) += pcspk.o
>  hw-obj-$(CONFIG_PCKBD) += pckbd.o
>  hw-obj-$(CONFIG_FDC) += fdc.o
> -hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o
> +hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o dimm.o
>  hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
>  hw-obj-$(CONFIG_DMA) += dma.o
>  hw-obj-$(CONFIG_I82374) += i82374.o
> diff --git a/hw/dimm.c b/hw/dimm.c
> new file mode 100644
> index 0000000..00c4623
> --- /dev/null
> +++ b/hw/dimm.c
> @@ -0,0 +1,234 @@
> +/*
> + * Dimm device for Memory Hotplug
> + *
> + * Copyright ProfitBricks GmbH 2012
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
> + */
> +
> +#include "trace.h"
> +#include "qdev.h"
> +#include "dimm.h"
> +#include <time.h>
> +#include "../exec-memory.h"
> +#include "qmp-commands.h"
> +
> +static DeviceState *dimm_hotplug_qdev;
> +static dimm_hotplug_fn dimm_hotplug;
> +static QTAILQ_HEAD(Dimmlist, DimmState)  dimmlist;

Using global state does not look right. It should always be possible
to pass around structures to avoid it.
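
A minimal sketch of what passing a structure around could look like, outside
of any QEMU types: the slot list and the hotplug callback live in one context
object that each function receives explicitly. All names here (struct
dimm_bus, struct dimm_slot, dimm_bus_*) are hypothetical and only illustrate
the shape, they are not a proposal for the actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct dimm_slot {
    uint32_t idx;            /* index in the hotplug bitmap */
    int populated;           /* non-zero once hot-added */
    struct dimm_slot *next;  /* singly linked list of slots */
};

typedef int (*dimm_hotplug_cb)(void *opaque, struct dimm_slot *slot, int add);

/* Everything that was file-scope state now lives in one object. */
struct dimm_bus {
    struct dimm_slot *slots;
    dimm_hotplug_cb hotplug;  /* may be NULL until a handler registers */
    void *hotplug_opaque;
};

static void dimm_bus_add_slot(struct dimm_bus *bus, struct dimm_slot *slot)
{
    slot->next = bus->slots;
    bus->slots = slot;
}

static struct dimm_slot *dimm_bus_find(struct dimm_bus *bus, uint32_t idx)
{
    struct dimm_slot *s;

    for (s = bus->slots; s != NULL; s = s->next) {
        if (s->idx == idx) {
            return s;
        }
    }
    return NULL;
}

/* Returns 0 on success, -1 if the slot is unknown or already populated. */
static int dimm_bus_activate(struct dimm_bus *bus, uint32_t idx)
{
    struct dimm_slot *slot = dimm_bus_find(bus, idx);

    if (slot == NULL || slot->populated) {
        return -1;
    }
    slot->populated = 1;
    if (bus->hotplug) {
        return bus->hotplug(bus->hotplug_opaque, slot, 1);
    }
    return 0;
}
```

Whoever owns the machine state would hold the struct dimm_bus and hand it to
the monitor command and ACPI code, so nothing needs to be initialized from
class_init.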

> +
> +static Property dimm_properties[] = {
> +    DEFINE_PROP_END_OF_LIST()
> +};
> +
> +void dimm_populate(DimmState *s)

All functions are global and exported, but there do not seem to be any
users. Please make static all the ones you can.

> +{
> +    DeviceState *dev= (DeviceState*)s;
> +    MemoryRegion *new = NULL;
> +
> +    new = g_malloc(sizeof(MemoryRegion));
> +    memory_region_init_ram(new, dev->id, s->size);
> +    vmstate_register_ram_global(new);
> +    memory_region_add_subregion(get_system_memory(), s->start, new);
> +    s->mr = new;
> +    s->populated = true;
> +}
> +
> +
> +void dimm_depopulate(DimmState *s)
> +{
> +    assert(s);
> +    if (s->populated) {
> +        vmstate_unregister_ram(s->mr, NULL);
> +        memory_region_del_subregion(get_system_memory(), s->mr);
> +        memory_region_destroy(s->mr);
> +        s->populated = false;
> +        s->mr = NULL;
> +    }
> +}
> +
> +DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
> +        dimm_idx, bool populated)
> +{
> +    DeviceState *dev;
> +    DimmState *mdev;
> +
> +    dev = sysbus_create_simple("dimm", -1, NULL);
> +    dev->id = id;
> +
> +    mdev = DIMM(dev);
> +    mdev->idx = dimm_idx;
> +    mdev->start = 0;
> +    mdev->size = size;
> +    mdev->node = node;
> +    mdev->populated = populated;
> +    QTAILQ_INSERT_TAIL(&dimmlist, mdev, nextdimm);
> +    return mdev;
> +}
> +
> +void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev)
> +{
> +    dimm_hotplug_qdev = qdev;
> +    dimm_hotplug = hotplug;
> +    dimm_scan_populated();
> +}
> +
> +void dimm_activate(DimmState *slot)
> +{
> +    dimm_populate(slot);
> +    if (dimm_hotplug)
> +        dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 1);

Why the cast?

Also braces; please check your patches with checkpatch.pl.
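
To illustrate the cast question with stand-in types (not the QEMU
definitions): when the parent object is embedded as the first member, taking
the address of that member gives the same pointer without a C cast, and the
compiler still checks the types. struct sys_bus_device, struct dimm_state and
dimm_to_sysbus() are simplified, hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the parent device type. */
struct sys_bus_device {
    int num_irq;
};

/* Stand-in for the dimm state, embedding its parent as first member. */
struct dimm_state {
    struct sys_bus_device busdev;
    uint32_t idx;
};

/* Upcast by naming the embedded member instead of casting the pointer. */
static struct sys_bus_device *dimm_to_sysbus(struct dimm_state *s)
{
    return &s->busdev;
}
```

If busdev were ever moved away from the first position, this would keep
working, while a plain pointer cast would silently point at the wrong data.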

> +}
> +
> +void dimm_deactivate(DimmState *slot)
> +{
> +    if (dimm_hotplug)
> +        dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 0);
> +}
> +
> +DimmState *dimm_find_from_name(char *id)

const char *id?

> +{
> +    Error *err = NULL;
> +    DeviceState *qdev;
> +    const char *type;
> +    qdev = qdev_find_recursive(sysbus_get_default(), id);
> +    if (qdev) {
> +        type = object_property_get_str(OBJECT(qdev), "type", &err);
> +        if (!type) {
> +            return NULL;
> +        }
> +        if (!strcmp(type, "dimm")) {
> +            return DIMM(qdev);
> +        }
> +    }
> +    return NULL;
> +}
> +
> +int dimm_do(Monitor *mon, const QDict *qdict, bool add)
> +{
> +    DimmState *slot = NULL;
> +
> +    char *id = (char*) qdict_get_try_str(qdict, "id");

Why this cast?

> +    if (!id) {
> +        fprintf(stderr, "ERROR %s invalid id\n",__FUNCTION__);
> +        return 1;
> +    }
> +
> +    slot = dimm_find_from_name(id);
> +
> +    if (!slot) {
> +        fprintf(stderr, "%s no slot %s found\n", __FUNCTION__, id);
> +        return 1;
> +    }
> +
> +    if (add) {
> +        if (slot->populated) {
> +            fprintf(stderr, "ERROR %s slot %s already populated\n",
> +                    __FUNCTION__, id);
> +            return 1;
> +        }
> +        dimm_activate(slot);
> +    }
> +    else {
> +        if (!slot->populated) {
> +            fprintf(stderr, "ERROR %s slot %s is not populated\n",
> +                    __FUNCTION__, id);
> +            return 1;
> +        }
> +        dimm_deactivate(slot);
> +    }
> +
> +    return 0;
> +}
> +
> +DimmState *dimm_find_from_idx(uint32_t idx)
> +{
> +    DimmState *slot;
> +
> +    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
> +        if (slot->idx == idx) {
> +            return slot;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +/* used to calculate physical address offsets for all dimms */
> +void dimm_calc_offsets(dimm_calcoffset_fn calcfn)
> +{
> +    DimmState *slot;
> +    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
> +        if (!slot->start)
> +            slot->start = calcfn(slot->size);
> +    }
> +}
> +
> +/* used to populate and activate dimms at boot time */
> +void dimm_scan_populated(void)
> +{
> +    DimmState *slot;
> +    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
> +        if (slot->populated && !slot->mr) {
> +            dimm_activate(slot);
> +        }
> +    }
> +}
> +
> +void dimm_notify(uint32_t idx, uint32_t event)
> +{
> +    DimmState *s;
> +    s = dimm_find_from_idx(idx);
> +    assert(s != NULL);
> +
> +    switch(event) {
> +        case DIMM_REMOVE_SUCCESS:
> +            dimm_depopulate(s);
> +            break;
> +        default:
> +            break;
> +    }
> +}
> +
> +static int dimm_init(SysBusDevice *s)
> +{
> +    DimmState *slot;
> +    slot = DIMM(s);
> +    slot->mr = NULL;
> +    slot->populated = false;
> +    return 0;
> +}
> +
> +static void dimm_class_init(ObjectClass *klass, void *data)
> +{
> +    SysBusDeviceClass *sc = SYS_BUS_DEVICE_CLASS(klass);
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->props = dimm_properties;
> +    sc->init = dimm_init;
> +    dimm_hotplug = NULL;
> +    QTAILQ_INIT(&dimmlist);
> +}
> +
> +static TypeInfo dimm_info = {
> +    .name          = "dimm",
> +    .parent        = TYPE_SYS_BUS_DEVICE,
> +    .instance_size = sizeof(DimmState),
> +    .class_init    = dimm_class_init,
> +};
> +
> +static void dimm_register_types(void)
> +{
> +    type_register_static(&dimm_info);
> +}
> +
> +type_init(dimm_register_types)
> diff --git a/hw/dimm.h b/hw/dimm.h
> new file mode 100644
> index 0000000..643f319
> --- /dev/null
> +++ b/hw/dimm.h
> @@ -0,0 +1,58 @@
> +#ifndef QEMU_DIMM_H
> +#define QEMU_DIMM_H

Should be HW_DIMM_H.

> +
> +#include "qemu-common.h"
> +#include "memory.h"
> +#include "sysbus.h"
> +#include "qapi-types.h"
> +#include "qemu-queue.h"
> +#include "cpus.h"
> +#define MAX_DIMMS 255
> +#define DIMM_BITMAP_BYTES (MAX_DIMMS + 7) / 8
> +#define DEFAULT_DIMMSIZE 1024*1024*1024
> +
> +typedef enum {
> +    DIMM_REMOVE_SUCCESS = 0,
> +    DIMM_REMOVE_FAIL = 1,
> +    DIMM_ADD_SUCCESS = 2,
> +    DIMM_ADD_FAIL = 3
> +} dimm_hp_result_code;
> +
> +#define TYPE_DIMM "dimm"
> +#define DIMM(obj) \
> +    OBJECT_CHECK(DimmState, (obj), TYPE_DIMM)
> +#define DIMM_CLASS(klass) \
> +    OBJECT_CLASS_CHECK(DimmClass, (obj), TYPE_DIMM)
> +#define DIMM_GET_CLASS(obj) \
> +    OBJECT_GET_CLASS(DimmClass, (obj), TYPE_DIMM)
> +
> +typedef struct DimmState {
> +    SysBusDevice busdev;
> +    uint32_t idx; /* index in memory hotplug register/bitmap */
> +    ram_addr_t start; /* starting physical address */
> +    ram_addr_t size;
> +    uint32_t node; /* numa node proximity */
> +    MemoryRegion *mr; /* MemoryRegion for this slot. !NULL only if populated */
> +    bool populated; /* 1 means device has been hotplugged. Default is 0. */
> +    QTAILQ_ENTRY (DimmState) nextdimm;
> +} DimmState;
> +
> +typedef int (*dimm_hotplug_fn)(DeviceState *qdev, SysBusDevice *dev, int add);
> +typedef target_phys_addr_t (*dimm_calcoffset_fn)(uint64_t size);
> +
> +DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
> +        dimm_idx, bool populated);
> +void dimm_populate(DimmState *s);
> +void dimm_depopulate(DimmState *s);
> +int dimm_do(Monitor *mon, const QDict *qdict, bool add);
> +DimmState *dimm_find_from_idx(uint32_t idx);
> +DimmState *dimm_find_from_name(char *id);
> +void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev);
> +void dimm_calc_offsets(dimm_calcoffset_fn calcfn);
> +void dimm_activate(DimmState *slot);
> +void dimm_deactivate(DimmState *slot);
> +void dimm_scan_populated(void);
> +void dimm_notify(uint32_t idx, uint32_t event);
> +
> +
> +#endif
> --
> 1.7.9
>
>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 06/21] dimm: Implement memory device abstraction
@ 2012-07-12 19:55     ` Blue Swirl
  0 siblings, 0 replies; 86+ messages in thread
From: Blue Swirl @ 2012-07-12 19:55 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: kvm, gleb, seabios, qemu-devel, kevin, avi, anthony, imammedo

On Wed, Jul 11, 2012 at 10:31 AM, Vasilis Liaskovitis
<vasilis.liaskovitis@profitbricks.com> wrote:
> Each hotplug-able memory slot is a SysBusDevice. A hot-add operation for a
> particular dimm creates a new MemoryRegion of the given physical address
> offset, size and node proximity, and attaches it to main system memory as a
> sub_region. A hot-remove operation detaches and frees the MemoryRegion from
> system memory.
>
> This prototype still lacks proper qdev integration: a separate
> hotplug side-channel is used and main system bus hotplug capability is
> ignored.
>
> Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> ---
>  hw/Makefile.objs |    2 +-
>  hw/dimm.c        |  234 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/dimm.h        |   58 +++++++++++++
>  3 files changed, 293 insertions(+), 1 deletions(-)
>  create mode 100644 hw/dimm.c
>  create mode 100644 hw/dimm.h
>
> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> index 3d77259..e2184bf 100644
> --- a/hw/Makefile.objs
> +++ b/hw/Makefile.objs
> @@ -26,7 +26,7 @@ hw-obj-$(CONFIG_I8254) += i8254_common.o i8254.o
>  hw-obj-$(CONFIG_PCSPK) += pcspk.o
>  hw-obj-$(CONFIG_PCKBD) += pckbd.o
>  hw-obj-$(CONFIG_FDC) += fdc.o
> -hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o
> +hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o dimm.o
>  hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
>  hw-obj-$(CONFIG_DMA) += dma.o
>  hw-obj-$(CONFIG_I82374) += i82374.o
> diff --git a/hw/dimm.c b/hw/dimm.c
> new file mode 100644
> index 0000000..00c4623
> --- /dev/null
> +++ b/hw/dimm.c
> @@ -0,0 +1,234 @@
> +/*
> + * Dimm device for Memory Hotplug
> + *
> + * Copyright ProfitBricks GmbH 2012
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
> + */
> +
> +#include "trace.h"
> +#include "qdev.h"
> +#include "dimm.h"
> +#include <time.h>
> +#include "../exec-memory.h"
> +#include "qmp-commands.h"
> +
> +static DeviceState *dimm_hotplug_qdev;
> +static dimm_hotplug_fn dimm_hotplug;
> +static QTAILQ_HEAD(Dimmlist, DimmState)  dimmlist;

Using global state does not look right. It should always be possible
to pass around structures to avoid it.

> +
> +static Property dimm_properties[] = {
> +    DEFINE_PROP_END_OF_LIST()
> +};
> +
> +void dimm_populate(DimmState *s)

All functions are global and exported, but there do not seem to be any
users. Please make static all the ones you can.

> +{
> +    DeviceState *dev= (DeviceState*)s;
> +    MemoryRegion *new = NULL;
> +
> +    new = g_malloc(sizeof(MemoryRegion));
> +    memory_region_init_ram(new, dev->id, s->size);
> +    vmstate_register_ram_global(new);
> +    memory_region_add_subregion(get_system_memory(), s->start, new);
> +    s->mr = new;
> +    s->populated = true;
> +}
> +
> +
> +void dimm_depopulate(DimmState *s)
> +{
> +    assert(s);
> +    if (s->populated) {
> +        vmstate_unregister_ram(s->mr, NULL);
> +        memory_region_del_subregion(get_system_memory(), s->mr);
> +        memory_region_destroy(s->mr);
> +        s->populated = false;
> +        s->mr = NULL;
> +    }
> +}
> +
> +DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
> +        dimm_idx, bool populated)
> +{
> +    DeviceState *dev;
> +    DimmState *mdev;
> +
> +    dev = sysbus_create_simple("dimm", -1, NULL);
> +    dev->id = id;
> +
> +    mdev = DIMM(dev);
> +    mdev->idx = dimm_idx;
> +    mdev->start = 0;
> +    mdev->size = size;
> +    mdev->node = node;
> +    mdev->populated = populated;
> +    QTAILQ_INSERT_TAIL(&dimmlist, mdev, nextdimm);
> +    return mdev;
> +}
> +
> +void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev)
> +{
> +    dimm_hotplug_qdev = qdev;
> +    dimm_hotplug = hotplug;
> +    dimm_scan_populated();
> +}
> +
> +void dimm_activate(DimmState *slot)
> +{
> +    dimm_populate(slot);
> +    if (dimm_hotplug)
> +        dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 1);

Why the cast?

Also braces; please check your patches with checkpatch.pl.

> +}
> +
> +void dimm_deactivate(DimmState *slot)
> +{
> +    if (dimm_hotplug)
> +        dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 0);
> +}
> +
> +DimmState *dimm_find_from_name(char *id)

const char *id?

> +{
> +    Error *err = NULL;
> +    DeviceState *qdev;
> +    const char *type;
> +    qdev = qdev_find_recursive(sysbus_get_default(), id);
> +    if (qdev) {
> +        type = object_property_get_str(OBJECT(qdev), "type", &err);
> +        if (!type) {
> +            return NULL;
> +        }
> +        if (!strcmp(type, "dimm")) {
> +            return DIMM(qdev);
> +        }
> +    }
> +    return NULL;
> +}
> +
> +int dimm_do(Monitor *mon, const QDict *qdict, bool add)
> +{
> +    DimmState *slot = NULL;
> +
> +    char *id = (char*) qdict_get_try_str(qdict, "id");

Why this cast?

> +    if (!id) {
> +        fprintf(stderr, "ERROR %s invalid id\n",__FUNCTION__);
> +        return 1;
> +    }
> +
> +    slot = dimm_find_from_name(id);
> +
> +    if (!slot) {
> +        fprintf(stderr, "%s no slot %s found\n", __FUNCTION__, id);
> +        return 1;
> +    }
> +
> +    if (add) {
> +        if (slot->populated) {
> +            fprintf(stderr, "ERROR %s slot %s already populated\n",
> +                    __FUNCTION__, id);
> +            return 1;
> +        }
> +        dimm_activate(slot);
> +    }
> +    else {
> +        if (!slot->populated) {
> +            fprintf(stderr, "ERROR %s slot %s is not populated\n",
> +                    __FUNCTION__, id);
> +            return 1;
> +        }
> +        dimm_deactivate(slot);
> +    }
> +
> +    return 0;
> +}
> +
> +DimmState *dimm_find_from_idx(uint32_t idx)
> +{
> +    DimmState *slot;
> +
> +    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
> +        if (slot->idx == idx) {
> +            return slot;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +/* used to calculate physical address offsets for all dimms */
> +void dimm_calc_offsets(dimm_calcoffset_fn calcfn)
> +{
> +    DimmState *slot;
> +    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
> +        if (!slot->start)
> +            slot->start = calcfn(slot->size);
> +    }
> +}
> +
> +/* used to populate and activate dimms at boot time */
> +void dimm_scan_populated(void)
> +{
> +    DimmState *slot;
> +    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
> +        if (slot->populated && !slot->mr) {
> +            dimm_activate(slot);
> +        }
> +    }
> +}
> +
> +void dimm_notify(uint32_t idx, uint32_t event)
> +{
> +    DimmState *s;
> +    s = dimm_find_from_idx(idx);
> +    assert(s != NULL);
> +
> +    switch(event) {
> +        case DIMM_REMOVE_SUCCESS:
> +            dimm_depopulate(s);
> +            break;
> +        default:
> +            break;
> +    }
> +}
> +
> +static int dimm_init(SysBusDevice *s)
> +{
> +    DimmState *slot;
> +    slot = DIMM(s);
> +    slot->mr = NULL;
> +    slot->populated = false;
> +    return 0;
> +}
> +
> +static void dimm_class_init(ObjectClass *klass, void *data)
> +{
> +    SysBusDeviceClass *sc = SYS_BUS_DEVICE_CLASS(klass);
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->props = dimm_properties;
> +    sc->init = dimm_init;
> +    dimm_hotplug = NULL;
> +    QTAILQ_INIT(&dimmlist);
> +}
> +
> +static TypeInfo dimm_info = {
> +    .name          = "dimm",
> +    .parent        = TYPE_SYS_BUS_DEVICE,
> +    .instance_size = sizeof(DimmState),
> +    .class_init    = dimm_class_init,
> +};
> +
> +static void dimm_register_types(void)
> +{
> +    type_register_static(&dimm_info);
> +}
> +
> +type_init(dimm_register_types)
> diff --git a/hw/dimm.h b/hw/dimm.h
> new file mode 100644
> index 0000000..643f319
> --- /dev/null
> +++ b/hw/dimm.h
> @@ -0,0 +1,58 @@
> +#ifndef QEMU_DIMM_H
> +#define QEMU_DIMM_H

Should be HW_DIMM_H.

> +
> +#include "qemu-common.h"
> +#include "memory.h"
> +#include "sysbus.h"
> +#include "qapi-types.h"
> +#include "qemu-queue.h"
> +#include "cpus.h"
> +#define MAX_DIMMS 255
> +#define DIMM_BITMAP_BYTES ((MAX_DIMMS + 7) / 8)
> +#define DEFAULT_DIMMSIZE (1024 * 1024 * 1024)
> +
> +typedef enum {
> +    DIMM_REMOVE_SUCCESS = 0,
> +    DIMM_REMOVE_FAIL = 1,
> +    DIMM_ADD_SUCCESS = 2,
> +    DIMM_ADD_FAIL = 3
> +} dimm_hp_result_code;
> +
> +#define TYPE_DIMM "dimm"
> +#define DIMM(obj) \
> +    OBJECT_CHECK(DimmState, (obj), TYPE_DIMM)
> +#define DIMM_CLASS(klass) \
> +    OBJECT_CLASS_CHECK(DimmClass, (klass), TYPE_DIMM)
> +#define DIMM_GET_CLASS(obj) \
> +    OBJECT_GET_CLASS(DimmClass, (obj), TYPE_DIMM)
> +
> +typedef struct DimmState {
> +    SysBusDevice busdev;
> +    uint32_t idx; /* index in memory hotplug register/bitmap */
> +    ram_addr_t start; /* starting physical address */
> +    ram_addr_t size;
> +    uint32_t node; /* numa node proximity */
> +    MemoryRegion *mr; /* MemoryRegion for this slot. !NULL only if populated */
> +    bool populated; /* 1 means device has been hotplugged. Default is 0. */
> +    QTAILQ_ENTRY (DimmState) nextdimm;
> +} DimmState;
> +
> +typedef int (*dimm_hotplug_fn)(DeviceState *qdev, SysBusDevice *dev, int add);
> +typedef target_phys_addr_t (*dimm_calcoffset_fn)(uint64_t size);
> +
> +DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
> +        dimm_idx, bool populated);
> +void dimm_populate(DimmState *s);
> +void dimm_depopulate(DimmState *s);
> +int dimm_do(Monitor *mon, const QDict *qdict, bool add);
> +DimmState *dimm_find_from_idx(uint32_t idx);
> +DimmState *dimm_find_from_name(char *id);
> +void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev);
> +void dimm_calc_offsets(dimm_calcoffset_fn calcfn);
> +void dimm_activate(DimmState *slot);
> +void dimm_deactivate(DimmState *slot);
> +void dimm_scan_populated(void);
> +void dimm_notify(uint32_t idx, uint32_t event);
> +
> +
> +#endif
> --
> 1.7.9
>
>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 00/21] ACPI memory hotplug
  2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-12 20:04   ` Blue Swirl
  -1 siblings, 0 replies; 86+ messages in thread
From: Blue Swirl @ 2012-07-12 20:04 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: qemu-devel, kvm, seabios, gleb, kevin, avi, anthony, imammedo

On Wed, Jul 11, 2012 at 10:31 AM, Vasilis Liaskovitis
<vasilis.liaskovitis@profitbricks.com> wrote:
> This is v2 of the ACPI memory hotplug prototype for x86_64 target.

I think the concept of DIMMs (what about SIMMs? SODIMMs? I liked
memslot) would be useful for most targets, but hotplugging may be
limited to x86 only. It would be nice to keep these two separate or as
loosely coupled as possible.

>
> Changes v1->v2
>
> - memory map is automatically calculated for hotplug dimms. Dimms are added from
> top-of-memory skipping the pci hole at [PCI_HOLE_START, 4G).
> - Renamed from "-memslot" to "-dimm". Commands changed to "dimm_add", "dimm_del".
> - Seabios ejection array reduced to a byte. Use extraction macros for dimm ssdt.
> - additional SRAT paravirt info does not break previous SRAT fw_cfg layout.
> - Documentation of new acpi_piix4 registers and paravirt data.
> - add ACPI _OST support for _OST enabled guests. This allows qemu to receive
> notification for success / failure of memory hot-add and hot-remove operations.
> Guest needs to support _OST (https://lkml.org/lkml/2012/6/25/321)
> - add monitor info command to report total guest memory (initial + hot-added)
> - add command line options and monitor commands for batch dimm creation/population
>
> Overview:
>
> Dimm devices are modeled with a new qemu command line
>
> "-dimm id=name,size=sz,node=pxm,populated=on|off"
>
> As already mentioned, the starting physical address for all dimms is calculated
> automatically from top of memory, skipping the pci hole at [PCI_HOLE_START, 4G).
> The node option sets the NUMA proximity domain for this dimm. When not given,
> it defaults to zero.
> "-dimm id=dimm0,size=512M,node=0,populated=off"
> will define a 512M memory slot belonging to numa node 0.
>
> Dimms are added or removed with a new hmp command "dimm_add/dimm_del":
> Hot-add syntax: "dimm_add id"
> Hot-remove syntax: "dimm_del id"
>
> Issues:
>
> - Live migration works as long as the populated field is changed to "on" for
> hotplugged dimms at the destination qemu command line (patch 12/21 lifts
> this requirement). The DimmState structure does not yet define a
> VMStateDescription, but I assume this is the preferred way to pass state
> for migration.
>
> - Dimms are abstracted as qdevices attached to the main system bus. However,
> memory hotplugging has its own side channel ignoring main_system_bus's hotplug
> incapability. A cleaner integration is still needed, probably attaching memory
> devices as children-links of an acpi-capable device (in the pc case acpi_piix4)
> instead of the system bus (TBD). Then device_add/device_del instead of new
> commands can hopefully be used.
>
> Comments/review welcome.
>
> The series is based on uq/master for qemu-kvm, and master for seabios. It can
> also be found at:
> http://github.com/vliaskov/qemu-kvm/commits/memhp-v2
> http://github.com/vliaskov/seabios/commits/memhp-v2
>
> Vasilis Liaskovitis (14):
>   dimm: Implement memory device abstraction
>   acpi_piix4: Implement memory device hotplug registers
>   pc: calculate dimm physical addresses and adjust memory map
>   pc: Add dimm paravirt SRAT info
>   Implement "-dimm" command line option
>   Implement dimm_add and dimm_del commands for hmp and qmp
>   fix live-migration when "populated=on" is missing
>   Implement memory hotplug notification lists
>   acpi_piix4: _OST dimm support
>   acpi_piix4: Update dimm state on VM reboot
>   acpi_piix4: Update dimm bitmap state on hot-remove fail
>   Implement "info memtotal" and "query-memtotal"
>   Implement -dimms, -dimmspop command line options
>   Implement mem_increase, mem_decrease hmp/qmp commands
>
>  arch_init.c                 |   23 ++-
>  docs/specs/acpi_hotplug.txt |   46 +++++
>  docs/specs/fwcfg.txt        |   28 +++
>  hmp-commands.hx             |   67 +++++++
>  hmp.c                       |   24 +++
>  hmp.h                       |    2 +
>  hw/Makefile.objs            |    2 +-
>  hw/acpi_piix4.c             |  131 ++++++++++++-
>  hw/dimm.c                   |  449 +++++++++++++++++++++++++++++++++++++++++++
>  hw/dimm.h                   |   72 +++++++
>  hw/pc.c                     |   94 +++++++++-
>  hw/pc.h                     |    6 +
>  hw/pc_piix.c                |   18 ++-
>  monitor.c                   |   35 ++++
>  monitor.h                   |    5 +
>  qapi-schema.json            |   38 ++++
>  qemu-config.c               |   70 +++++++
>  qemu-options.hx             |   15 ++
>  qmp-commands.hx             |  137 +++++++++++++
>  sysemu.h                    |    1 +
>  vl.c                        |  122 ++++++++++++-
>  21 files changed, 1368 insertions(+), 17 deletions(-)
>  create mode 100644 docs/specs/acpi_hotplug.txt
>  create mode 100644 docs/specs/fwcfg.txt
>  create mode 100644 hw/dimm.c
>  create mode 100644 hw/dimm.h
>
> Vasilis Liaskovitis (7):
>   Add ACPI_EXTRACT_DEVICE* macros
>   Add SSDT memory device support
>   acpi-dsdt: Implement functions for memory hotplug.
>   acpi: generate hotplug memory devices.
>   pciinit: Fix pcimem_start value
>   acpi_dsdt: Support _OST dimm method
>   acpi_dsdt: Revert internal dimm state on _OST failure
>
>  Makefile              |    2 +-
>  src/acpi-dsdt.dsl     |  120 ++++++++++++++++++++++++++++++++++++-
>  src/acpi.c            |  158 +++++++++++++++++++++++++++++++++++++++++++++++--
>  src/pciinit.c         |    2 +-
>  src/ssdt-mem.dsl      |   69 +++++++++++++++++++++
>  tools/acpi_extract.py |   28 +++++++++
>  6 files changed, 369 insertions(+), 10 deletions(-)
>  create mode 100644 src/ssdt-mem.dsl
>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH v2 06/21] dimm: Implement memory device abstraction
  2012-07-12 19:55     ` [Qemu-devel] " Blue Swirl
@ 2012-07-13 17:39       ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-13 17:39 UTC (permalink / raw)
  To: Blue Swirl; +Cc: kvm, gleb, seabios, qemu-devel, kevin, avi, anthony, imammedo

Hi,

On Thu, Jul 12, 2012 at 07:55:42PM +0000, Blue Swirl wrote:
> On Wed, Jul 11, 2012 at 10:31 AM, Vasilis Liaskovitis
> <vasilis.liaskovitis@profitbricks.com> wrote:
> > Each hotplug-able memory slot is a SysBusDevice. A hot-add operation for a
> > particular dimm creates a new MemoryRegion of the given physical address
> > offset, size and node proximity, and attaches it to main system memory as a
> > sub_region. A hot-remove operation detaches and frees the MemoryRegion from
> > system memory.
> >
> > This prototype still lacks proper qdev integration: a separate
> > hotplug side-channel is used and main system bus hotplug capability is
> > ignored.
> >
> > Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> > ---
> >  hw/Makefile.objs |    2 +-
> >  hw/dimm.c        |  234 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  hw/dimm.h        |   58 +++++++++++++
> >  3 files changed, 293 insertions(+), 1 deletions(-)
> >  create mode 100644 hw/dimm.c
> >  create mode 100644 hw/dimm.h
> >
> > diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> > index 3d77259..e2184bf 100644
> > --- a/hw/Makefile.objs
> > +++ b/hw/Makefile.objs
> > @@ -26,7 +26,7 @@ hw-obj-$(CONFIG_I8254) += i8254_common.o i8254.o
> >  hw-obj-$(CONFIG_PCSPK) += pcspk.o
> >  hw-obj-$(CONFIG_PCKBD) += pckbd.o
> >  hw-obj-$(CONFIG_FDC) += fdc.o
> > -hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o
> > +hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o dimm.o
> >  hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
> >  hw-obj-$(CONFIG_DMA) += dma.o
> >  hw-obj-$(CONFIG_I82374) += i82374.o
> > diff --git a/hw/dimm.c b/hw/dimm.c
> > new file mode 100644
> > index 0000000..00c4623
> > --- /dev/null
> > +++ b/hw/dimm.c
> > @@ -0,0 +1,234 @@
> > +/*
> > + * Dimm device for Memory Hotplug
> > + *
> > + * Copyright ProfitBricks GmbH 2012
> > + * This library is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2 of the License, or (at your option) any later version.
> > + *
> > + * This library is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with this library; if not, see <http://www.gnu.org/licenses/>
> > + */
> > +
> > +#include "trace.h"
> > +#include "qdev.h"
> > +#include "dimm.h"
> > +#include <time.h>
> > +#include "../exec-memory.h"
> > +#include "qmp-commands.h"
> > +
> > +static DeviceState *dimm_hotplug_qdev;
> > +static dimm_hotplug_fn dimm_hotplug;
> > +static QTAILQ_HEAD(Dimmlist, DimmState)  dimmlist;
> 
> Using global state does not look right. It should always be possible
> to pass around structures to avoid it.

OK, I'll try to remove the global state.
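
One possible shape for that change, as a standalone sketch rather than code from the series: the list head moves out of file scope into a container that callers pass in. DimmBus, dimm_bus_init() and dimm_bus_add() are hypothetical names, and the BSD <sys/queue.h> TAILQ macros stand in for QEMU's QTAILQ so that the example builds on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Reduced stand-in for DimmState: only the fields the lookup needs. */
typedef struct Dimm {
    uint32_t idx;
    TAILQ_ENTRY(Dimm) next;
} Dimm;

/* Hypothetical container that owns what used to be file-scope state. */
typedef struct DimmBus {
    TAILQ_HEAD(, Dimm) dimms;
} DimmBus;

static void dimm_bus_init(DimmBus *bus)
{
    TAILQ_INIT(&bus->dimms);
}

static void dimm_bus_add(DimmBus *bus, Dimm *d)
{
    assert(bus != NULL && d != NULL);
    TAILQ_INSERT_TAIL(&bus->dimms, d, next);
}

/* The lookup takes the container explicitly instead of reading a global. */
static Dimm *dimm_find_from_idx(DimmBus *bus, uint32_t idx)
{
    Dimm *d;

    TAILQ_FOREACH(d, &bus->dimms, next) {
        if (d->idx == idx) {
            return d;
        }
    }
    return NULL;
}
```

Whoever owns the DimmBus (for example the device that implements the hotplug registers) would hand it to these functions, so no state needs to live at file scope.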

> 
> > +
> > +static Property dimm_properties[] = {
> > +    DEFINE_PROP_END_OF_LIST()
> > +};
> > +
> > +void dimm_populate(DimmState *s)
> 
> All functions are global and exported, but there do not seem to be any
> users. Please make static everything you can.

will do

> 
> > +{
> > +    DeviceState *dev= (DeviceState*)s;
> > +    MemoryRegion *new = NULL;
> > +
> > +    new = g_malloc(sizeof(MemoryRegion));
> > +    memory_region_init_ram(new, dev->id, s->size);
> > +    vmstate_register_ram_global(new);
> > +    memory_region_add_subregion(get_system_memory(), s->start, new);
> > +    s->mr = new;
> > +    s->populated = true;
> > +}
> > +
> > +
> > +void dimm_depopulate(DimmState *s)
> > +{
> > +    assert(s);
> > +    if (s->populated) {
> > +        vmstate_unregister_ram(s->mr, NULL);
> > +        memory_region_del_subregion(get_system_memory(), s->mr);
> > +        memory_region_destroy(s->mr);
> > +        s->populated = false;
> > +        s->mr = NULL;
> > +    }
> > +}
> > +
> > +DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
> > +        dimm_idx, bool populated)
> > +{
> > +    DeviceState *dev;
> > +    DimmState *mdev;
> > +
> > +    dev = sysbus_create_simple("dimm", -1, NULL);
> > +    dev->id = id;
> > +
> > +    mdev = DIMM(dev);
> > +    mdev->idx = dimm_idx;
> > +    mdev->start = 0;
> > +    mdev->size = size;
> > +    mdev->node = node;
> > +    mdev->populated = populated;
> > +    QTAILQ_INSERT_TAIL(&dimmlist, mdev, nextdimm);
> > +    return mdev;
> > +}
> > +
> > +void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev)
> > +{
> > +    dimm_hotplug_qdev = qdev;
> > +    dimm_hotplug = hotplug;
> > +    dimm_scan_populated();
> > +}
> > +
> > +void dimm_activate(DimmState *slot)
> > +{
> > +    dimm_populate(slot);
> > +    if (dimm_hotplug)
> > +        dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 1);
> 
> Why the cast?

dimm_hotplug takes a SysBusDevice *, not a DimmState *, though that can be changed.
> 
> Also braces, please check your patches with checkpatch.pl.
>

OK, I'll run the patches through checkpatch.pl.

> > +}
> > +
> > +void dimm_deactivate(DimmState *slot)
> > +{
> > +    if (dimm_hotplug)
> > +        dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 0);
> > +}
> > +
> > +DimmState *dimm_find_from_name(char *id)
> 
> const char *id?
ok
> 
> > +{
> > +    Error *err = NULL;
> > +    DeviceState *qdev;
> > +    const char *type;
> > +    qdev = qdev_find_recursive(sysbus_get_default(), id);
> > +    if (qdev) {
> > +        type = object_property_get_str(OBJECT(qdev), "type", &err);
> > +        if (!type) {
> > +            return NULL;
> > +        }
> > +        if (!strcmp(type, "dimm")) {
> > +            return DIMM(qdev);
> > +        }
> > +    }
> > +    return NULL;
> > +}
> > +
> > +int dimm_do(Monitor *mon, const QDict *qdict, bool add)
> > +{
> > +    DimmState *slot = NULL;
> > +
> > +    char *id = (char*) qdict_get_try_str(qdict, "id");
> 
> Why this cast?

It is unneeded; id should be declared as const char *. Will fix.
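
To illustrate why the cast goes away, here is a small standalone sketch. qdict_get_try_str() returns a const char *, so a const-qualified local and a const-qualified parameter in the lookup function accept its result as-is. get_try_str() and id_matches() below are hypothetical stand-ins, not QEMU functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a getter that returns a borrowed, read-only
 * string. 'args' is a flat key/value array of n entries. */
static const char *get_try_str(const char *const *args, size_t n,
                               const char *key)
{
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        if (strcmp(args[i], key) == 0) {
            return args[i + 1];
        }
    }
    return NULL;
}

/* Taking const char * lets callers pass the getter's result directly. */
static int id_matches(const char *id, const char *wanted)
{
    return id != NULL && strcmp(id, wanted) == 0;
}
```

With both sides const-qualified, a call such as id_matches(get_try_str(args, n, "id"), "dimm0") compiles without any cast, and the compiler will flag accidental writes through the pointer.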

> 
> > +    if (!id) {
> > +        fprintf(stderr, "ERROR %s invalid id\n",__FUNCTION__);
> > +        return 1;
> > +    }
> > +
> > +    slot = dimm_find_from_name(id);
> > +
> > +    if (!slot) {
> > +        fprintf(stderr, "%s no slot %s found\n", __FUNCTION__, id);
> > +        return 1;
> > +    }
> > +
> > +    if (add) {
> > +        if (slot->populated) {
> > +            fprintf(stderr, "ERROR %s slot %s already populated\n",
> > +                    __FUNCTION__, id);
> > +            return 1;
> > +        }
> > +        dimm_activate(slot);
> > +    }
> > +    else {
> > +        if (!slot->populated) {
> > +            fprintf(stderr, "ERROR %s slot %s is not populated\n",
> > +                    __FUNCTION__, id);
> > +            return 1;
> > +        }
> > +        dimm_deactivate(slot);
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +DimmState *dimm_find_from_idx(uint32_t idx)
> > +{
> > +    DimmState *slot;
> > +
> > +    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
> > +        if (slot->idx == idx) {
> > +            return slot;
> > +        }
> > +    }
> > +    return NULL;
> > +}
> > +
> > +/* used to calculate physical address offsets for all dimms */
> > +void dimm_calc_offsets(dimm_calcoffset_fn calcfn)
> > +{
> > +    DimmState *slot;
> > +    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
> > +        if (!slot->start)
> > +            slot->start = calcfn(slot->size);
> > +    }
> > +}
> > +
> > +/* used to populate and activate dimms at boot time */
> > +void dimm_scan_populated(void)
> > +{
> > +    DimmState *slot;
> > +    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
> > +        if (slot->populated && !slot->mr) {
> > +            dimm_activate(slot);
> > +        }
> > +    }
> > +}
> > +
> > +void dimm_notify(uint32_t idx, uint32_t event)
> > +{
> > +    DimmState *s;
> > +    s = dimm_find_from_idx(idx);
> > +    assert(s != NULL);
> > +
> > +    switch(event) {
> > +        case DIMM_REMOVE_SUCCESS:
> > +            dimm_depopulate(s);
> > +            break;
> > +        default:
> > +            break;
> > +    }
> > +}
> > +
> > +static int dimm_init(SysBusDevice *s)
> > +{
> > +    DimmState *slot;
> > +    slot = DIMM(s);
> > +    slot->mr = NULL;
> > +    slot->populated = false;
> > +    return 0;
> > +}
> > +
> > +static void dimm_class_init(ObjectClass *klass, void *data)
> > +{
> > +    SysBusDeviceClass *sc = SYS_BUS_DEVICE_CLASS(klass);
> > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > +
> > +    dc->props = dimm_properties;
> > +    sc->init = dimm_init;
> > +    dimm_hotplug = NULL;
> > +    QTAILQ_INIT(&dimmlist);
> > +}
> > +
> > +static TypeInfo dimm_info = {
> > +    .name          = "dimm",
> > +    .parent        = TYPE_SYS_BUS_DEVICE,
> > +    .instance_size = sizeof(DimmState),
> > +    .class_init    = dimm_class_init,
> > +};
> > +
> > +static void dimm_register_types(void)
> > +{
> > +    type_register_static(&dimm_info);
> > +}
> > +
> > +type_init(dimm_register_types)
> > diff --git a/hw/dimm.h b/hw/dimm.h
> > new file mode 100644
> > index 0000000..643f319
> > --- /dev/null
> > +++ b/hw/dimm.h
> > @@ -0,0 +1,58 @@
> > +#ifndef QEMU_DIMM_H
> > +#define QEMU_DIMM_H
> 
> Should be HW_DIMM_H.
ok.

> 
> > +
> > +#include "qemu-common.h"
> > +#include "memory.h"
> > +#include "sysbus.h"
> > +#include "qapi-types.h"
> > +#include "qemu-queue.h"
> > +#include "cpus.h"
> > +#define MAX_DIMMS 255
> > +#define DIMM_BITMAP_BYTES ((MAX_DIMMS + 7) / 8)
> > +#define DEFAULT_DIMMSIZE (1024 * 1024 * 1024)
> > +
> > +typedef enum {
> > +    DIMM_REMOVE_SUCCESS = 0,
> > +    DIMM_REMOVE_FAIL = 1,
> > +    DIMM_ADD_SUCCESS = 2,
> > +    DIMM_ADD_FAIL = 3
> > +} dimm_hp_result_code;
> > +
> > +#define TYPE_DIMM "dimm"
> > +#define DIMM(obj) \
> > +    OBJECT_CHECK(DimmState, (obj), TYPE_DIMM)
> > +#define DIMM_CLASS(klass) \
> > +    OBJECT_CLASS_CHECK(DimmClass, (klass), TYPE_DIMM)
> > +#define DIMM_GET_CLASS(obj) \
> > +    OBJECT_GET_CLASS(DimmClass, (obj), TYPE_DIMM)
> > +
> > +typedef struct DimmState {
> > +    SysBusDevice busdev;
> > +    uint32_t idx; /* index in memory hotplug register/bitmap */
> > +    ram_addr_t start; /* starting physical address */
> > +    ram_addr_t size;
> > +    uint32_t node; /* numa node proximity */
> > +    MemoryRegion *mr; /* MemoryRegion for this slot. !NULL only if populated */
> > +    bool populated; /* 1 means device has been hotplugged. Default is 0. */
> > +    QTAILQ_ENTRY (DimmState) nextdimm;
> > +} DimmState;
> > +
> > +typedef int (*dimm_hotplug_fn)(DeviceState *qdev, SysBusDevice *dev, int add);
> > +typedef target_phys_addr_t (*dimm_calcoffset_fn)(uint64_t size);
> > +
> > +DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
> > +        dimm_idx, bool populated);
> > +void dimm_populate(DimmState *s);
> > +void dimm_depopulate(DimmState *s);
> > +int dimm_do(Monitor *mon, const QDict *qdict, bool add);
> > +DimmState *dimm_find_from_idx(uint32_t idx);
> > +DimmState *dimm_find_from_name(char *id);
> > +void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev);
> > +void dimm_calc_offsets(dimm_calcoffset_fn calcfn);
> > +void dimm_activate(DimmState *slot);
> > +void dimm_deactivate(DimmState *slot);
> > +void dimm_scan_populated(void);
> > +void dimm_notify(uint32_t idx, uint32_t event);
> > +
> > +
> > +#endif
> > --
> > 1.7.9
> >
> >

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 06/21] dimm: Implement memory device abstraction
@ 2012-07-13 17:39       ` Vasilis Liaskovitis
  0 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-13 17:39 UTC (permalink / raw)
  To: Blue Swirl; +Cc: kvm, gleb, seabios, qemu-devel, kevin, avi, anthony, imammedo

Hi,

On Thu, Jul 12, 2012 at 07:55:42PM +0000, Blue Swirl wrote:
> On Wed, Jul 11, 2012 at 10:31 AM, Vasilis Liaskovitis
> <vasilis.liaskovitis@profitbricks.com> wrote:
> > Each hotplug-able memory slot is a SysBusDevice. A hot-add operation for a
> > particular dimm creates a new MemoryRegion of the given physical address
> > offset, size and node proximity, and attaches it to main system memory as a
> > sub_region. A hot-remove operation detaches and frees the MemoryRegion from
> > system memory.
> >
> > This prototype still lacks proper qdev integration: a separate
> > hotplug side-channel is used and main system bus hotplug capability is
> > ignored.
> >
> > Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> > ---
> >  hw/Makefile.objs |    2 +-
> >  hw/dimm.c        |  234 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  hw/dimm.h        |   58 +++++++++++++
> >  3 files changed, 293 insertions(+), 1 deletions(-)
> >  create mode 100644 hw/dimm.c
> >  create mode 100644 hw/dimm.h
> >
> > diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> > index 3d77259..e2184bf 100644
> > --- a/hw/Makefile.objs
> > +++ b/hw/Makefile.objs
> > @@ -26,7 +26,7 @@ hw-obj-$(CONFIG_I8254) += i8254_common.o i8254.o
> >  hw-obj-$(CONFIG_PCSPK) += pcspk.o
> >  hw-obj-$(CONFIG_PCKBD) += pckbd.o
> >  hw-obj-$(CONFIG_FDC) += fdc.o
> > -hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o
> > +hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o dimm.o
> >  hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
> >  hw-obj-$(CONFIG_DMA) += dma.o
> >  hw-obj-$(CONFIG_I82374) += i82374.o
> > diff --git a/hw/dimm.c b/hw/dimm.c
> > new file mode 100644
> > index 0000000..00c4623
> > --- /dev/null
> > +++ b/hw/dimm.c
> > @@ -0,0 +1,234 @@
> > +/*
> > + * Dimm device for Memory Hotplug
> > + *
> > + * Copyright ProfitBricks GmbH 2012
> > + * This library is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2 of the License, or (at your option) any later version.
> > + *
> > + * This library is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with this library; if not, see <http://www.gnu.org/licenses/>
> > + */
> > +
> > +#include "trace.h"
> > +#include "qdev.h"
> > +#include "dimm.h"
> > +#include <time.h>
> > +#include "../exec-memory.h"
> > +#include "qmp-commands.h"
> > +
> > +static DeviceState *dimm_hotplug_qdev;
> > +static dimm_hotplug_fn dimm_hotplug;
> > +static QTAILQ_HEAD(Dimmlist, DimmState)  dimmlist;
> 
> Using global state does not look right. It should always be possible
> to pass around structures to avoid it.

OK, I'll try to remove the global state.

> 
> > +
> > +static Property dimm_properties[] = {
> > +    DEFINE_PROP_END_OF_LIST()
> > +};
> > +
> > +void dimm_populate(DimmState *s)
> 
> All functions are global and exported, but there do not seem to be any
> users. Please make static everything you can.

will do

> 
> > +{
> > +    DeviceState *dev= (DeviceState*)s;
> > +    MemoryRegion *new = NULL;
> > +
> > +    new = g_malloc(sizeof(MemoryRegion));
> > +    memory_region_init_ram(new, dev->id, s->size);
> > +    vmstate_register_ram_global(new);
> > +    memory_region_add_subregion(get_system_memory(), s->start, new);
> > +    s->mr = new;
> > +    s->populated = true;
> > +}
> > +
> > +
> > +void dimm_depopulate(DimmState *s)
> > +{
> > +    assert(s);
> > +    if (s->populated) {
> > +        vmstate_unregister_ram(s->mr, NULL);
> > +        memory_region_del_subregion(get_system_memory(), s->mr);
> > +        memory_region_destroy(s->mr);
> > +        s->populated = false;
> > +        s->mr = NULL;
> > +    }
> > +}
> > +
> > +DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
> > +        dimm_idx, bool populated)
> > +{
> > +    DeviceState *dev;
> > +    DimmState *mdev;
> > +
> > +    dev = sysbus_create_simple("dimm", -1, NULL);
> > +    dev->id = id;
> > +
> > +    mdev = DIMM(dev);
> > +    mdev->idx = dimm_idx;
> > +    mdev->start = 0;
> > +    mdev->size = size;
> > +    mdev->node = node;
> > +    mdev->populated = populated;
> > +    QTAILQ_INSERT_TAIL(&dimmlist, mdev, nextdimm);
> > +    return mdev;
> > +}
> > +
> > +void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev)
> > +{
> > +    dimm_hotplug_qdev = qdev;
> > +    dimm_hotplug = hotplug;
> > +    dimm_scan_populated();
> > +}
> > +
> > +void dimm_activate(DimmState *slot)
> > +{
> > +    dimm_populate(slot);
> > +    if (dimm_hotplug)
> > +        dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 1);
> 
> Why the cast?

dimm_hotplug accepts SysBusDevice, not DimmState, though that can be changed.
> 
> Also braces, please check your patches with checkpatch.pl.
>

ok, I'll run the patches through checkpatch.pl.
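
For illustration only, a sketch of how the call could look with braces and
without the cast, assuming the callback keeps taking a SysBusDevice pointer:
since busdev is a member of DimmState, its address can be passed directly.
The stub types, the void *opaque argument (the patch uses DeviceState *qdev),
dimm_request_hotplug and record_hotplug are assumptions made so that the
fragment is self-contained; they are not the patch's code:

```c
#include <assert.h>
#include <stddef.h>

/* Stub types standing in for the QEMU definitions. */
typedef struct SysBusDevice {
    int unused;
} SysBusDevice;

typedef struct DimmState {
    SysBusDevice busdev;   /* embedded parent, as in hw/dimm.h */
    int populated;
} DimmState;

typedef int (*dimm_hotplug_fn)(void *opaque, SysBusDevice *dev, int add);

/* Braced conditional, and &slot->busdev instead of (SysBusDevice*)slot. */
static int dimm_request_hotplug(dimm_hotplug_fn hotplug, void *opaque,
                                DimmState *slot, int add)
{
    assert(slot != NULL);
    if (hotplug) {
        return hotplug(opaque, &slot->busdev, add);
    }
    return 0;
}

/* Hypothetical backend callback that only records what it was given. */
typedef struct HotplugRecord {
    SysBusDevice *dev;
    int add;
} HotplugRecord;

static int record_hotplug(void *opaque, SysBusDevice *dev, int add)
{
    HotplugRecord *rec = opaque;

    rec->dev = dev;
    rec->add = add;
    return 0;
}
```

Taking the member's address keeps the code correct even if busdev stops being
the first field of DimmState, which a plain pointer cast would not.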

> > +}
> > +
> > +void dimm_deactivate(DimmState *slot)
> > +{
> > +    if (dimm_hotplug)
> > +        dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 0);
> > +}
> > +
> > +DimmState *dimm_find_from_name(char *id)
> 
> const char *id?
ok
> 
> > +{
> > +    Error *err = NULL;
> > +    DeviceState *qdev;
> > +    const char *type;
> > +    qdev = qdev_find_recursive(sysbus_get_default(), id);
> > +    if (qdev) {
> > +        type = object_property_get_str(OBJECT(qdev), "type", &err);
> > +        if (!type) {
> > +            return NULL;
> > +        }
> > +        if (!strcmp(type, "dimm")) {
> > +            return DIMM(qdev);
> > +        }
> > +    }
> > +    return NULL;
> > +}
> > +
> > +int dimm_do(Monitor *mon, const QDict *qdict, bool add)
> > +{
> > +    DimmState *slot = NULL;
> > +
> > +    char *id = (char*) qdict_get_try_str(qdict, "id");
> 
> Why this cast?

unneeded, because id should be declared as const char*. will fix.

> 
> > +    if (!id) {
> > +        fprintf(stderr, "ERROR %s invalid id\n",__FUNCTION__);
> > +        return 1;
> > +    }
> > +
> > +    slot = dimm_find_from_name(id);
> > +
> > +    if (!slot) {
> > +        fprintf(stderr, "%s no slot %s found\n", __FUNCTION__, id);
> > +        return 1;
> > +    }
> > +
> > +    if (add) {
> > +        if (slot->populated) {
> > +            fprintf(stderr, "ERROR %s slot %s already populated\n",
> > +                    __FUNCTION__, id);
> > +            return 1;
> > +        }
> > +        dimm_activate(slot);
> > +    }
> > +    else {
> > +        if (!slot->populated) {
> > +            fprintf(stderr, "ERROR %s slot %s is not populated\n",
> > +                    __FUNCTION__, id);
> > +            return 1;
> > +        }
> > +        dimm_deactivate(slot);
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +DimmState *dimm_find_from_idx(uint32_t idx)
> > +{
> > +    DimmState *slot;
> > +
> > +    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
> > +        if (slot->idx == idx) {
> > +            return slot;
> > +        }
> > +    }
> > +    return NULL;
> > +}
> > +
> > +/* used to calculate physical address offsets for all dimms */
> > +void dimm_calc_offsets(dimm_calcoffset_fn calcfn)
> > +{
> > +    DimmState *slot;
> > +    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
> > +        if (!slot->start)
> > +            slot->start = calcfn(slot->size);
> > +    }
> > +}
> > +
> > +/* used to populate and activate dimms at boot time */
> > +void dimm_scan_populated(void)
> > +{
> > +    DimmState *slot;
> > +    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
> > +        if (slot->populated && !slot->mr) {
> > +            dimm_activate(slot);
> > +        }
> > +    }
> > +}
> > +
> > +void dimm_notify(uint32_t idx, uint32_t event)
> > +{
> > +    DimmState *s;
> > +    s = dimm_find_from_idx(idx);
> > +    assert(s != NULL);
> > +
> > +    switch(event) {
> > +        case DIMM_REMOVE_SUCCESS:
> > +            dimm_depopulate(s);
> > +            break;
> > +        default:
> > +            break;
> > +    }
> > +}
> > +
> > +static int dimm_init(SysBusDevice *s)
> > +{
> > +    DimmState *slot;
> > +    slot = DIMM(s);
> > +    slot->mr = NULL;
> > +    slot->populated = false;
> > +    return 0;
> > +}
> > +
> > +static void dimm_class_init(ObjectClass *klass, void *data)
> > +{
> > +    SysBusDeviceClass *sc = SYS_BUS_DEVICE_CLASS(klass);
> > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > +
> > +    dc->props = dimm_properties;
> > +    sc->init = dimm_init;
> > +    dimm_hotplug = NULL;
> > +    QTAILQ_INIT(&dimmlist);
> > +}
> > +
> > +static TypeInfo dimm_info = {
> > +    .name          = "dimm",
> > +    .parent        = TYPE_SYS_BUS_DEVICE,
> > +    .instance_size = sizeof(DimmState),
> > +    .class_init    = dimm_class_init,
> > +};
> > +
> > +static void dimm_register_types(void)
> > +{
> > +    type_register_static(&dimm_info);
> > +}
> > +
> > +type_init(dimm_register_types)
> > diff --git a/hw/dimm.h b/hw/dimm.h
> > new file mode 100644
> > index 0000000..643f319
> > --- /dev/null
> > +++ b/hw/dimm.h
> > @@ -0,0 +1,58 @@
> > +#ifndef QEMU_DIMM_H
> > +#define QEMU_DIMM_H
> 
> Should be HW_DIMM_H.
ok.

> 
> > +
> > +#include "qemu-common.h"
> > +#include "memory.h"
> > +#include "sysbus.h"
> > +#include "qapi-types.h"
> > +#include "qemu-queue.h"
> > +#include "cpus.h"
> > +#define MAX_DIMMS 255
> > +#define DIMM_BITMAP_BYTES (MAX_DIMMS + 7) / 8
> > +#define DEFAULT_DIMMSIZE 1024*1024*1024
> > +
> > +typedef enum {
> > +    DIMM_REMOVE_SUCCESS = 0,
> > +    DIMM_REMOVE_FAIL = 1,
> > +    DIMM_ADD_SUCCESS = 2,
> > +    DIMM_ADD_FAIL = 3
> > +} dimm_hp_result_code;
> > +
> > +#define TYPE_DIMM "dimm"
> > +#define DIMM(obj) \
> > +    OBJECT_CHECK(DimmState, (obj), TYPE_DIMM)
> > +#define DIMM_CLASS(klass) \
> > +    OBJECT_CLASS_CHECK(DimmClass, (obj), TYPE_DIMM)
> > +#define DIMM_GET_CLASS(obj) \
> > +    OBJECT_GET_CLASS(DimmClass, (obj), TYPE_DIMM)
> > +
> > +typedef struct DimmState {
> > +    SysBusDevice busdev;
> > +    uint32_t idx; /* index in memory hotplug register/bitmap */
> > +    ram_addr_t start; /* starting physical address */
> > +    ram_addr_t size;
> > +    uint32_t node; /* numa node proximity */
> > +    MemoryRegion *mr; /* MemoryRegion for this slot. !NULL only if populated */
> > +    bool populated; /* 1 means device has been hotplugged. Default is 0. */
> > +    QTAILQ_ENTRY (DimmState) nextdimm;
> > +} DimmState;
> > +
> > +typedef int (*dimm_hotplug_fn)(DeviceState *qdev, SysBusDevice *dev, int add);
> > +typedef target_phys_addr_t (*dimm_calcoffset_fn)(uint64_t size);
> > +
> > +DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
> > +        dimm_idx, bool populated);
> > +void dimm_populate(DimmState *s);
> > +void dimm_depopulate(DimmState *s);
> > +int dimm_do(Monitor *mon, const QDict *qdict, bool add);
> > +DimmState *dimm_find_from_idx(uint32_t idx);
> > +DimmState *dimm_find_from_name(char *id);
> > +void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev);
> > +void dimm_calc_offsets(dimm_calcoffset_fn calcfn);
> > +void dimm_activate(DimmState *slot);
> > +void dimm_deactivate(DimmState *slot);
> > +void dimm_scan_populated(void);
> > +void dimm_notify(uint32_t idx, uint32_t event);
> > +
> > +
> > +#endif
> > --
> > 1.7.9
> >
> >

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 09/21] pc: Add dimm paravirt SRAT info
  2012-07-12 19:48     ` [Qemu-devel] " Blue Swirl
@ 2012-07-13 17:40       ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-13 17:40 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel, kvm, seabios, gleb, kevin, avi, anthony, imammedo

On Thu, Jul 12, 2012 at 07:48:04PM +0000, Blue Swirl wrote:
> On Wed, Jul 11, 2012 at 10:31 AM, Vasilis Liaskovitis
> <vasilis.liaskovitis@profitbricks.com> wrote:
> > The numa_fw_cfg paravirt interface is extended to include SRAT information for
> > all hotplug-able dimms. There are 3 words for each hotplug-able memory slot,
> > denoting start address, size and node proximity. The new info is appended after
> > existing numa info, so that the fw_cfg layout does not break.  This information
> > is used by Seabios to build hotplug memory device objects at runtime.
> > nb_numa_nodes is set to 1 by default (not 0), so that we always pass srat info
> > to SeaBIOS.
> >
> > v1->v2:
> > Dimm SRAT info (#dimms) is appended at end of existing numa fw_cfg in order not
> > to break existing layout
> > Documentation of the new fwcfg layout is included in docs/specs/fwcfg.txt
> >
> > Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> > ---
> >  docs/specs/fwcfg.txt |   28 ++++++++++++++++++++++++++
> >  hw/pc.c              |   53 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >  vl.c                 |    2 +-
> >  3 files changed, 80 insertions(+), 3 deletions(-)
> >  create mode 100644 docs/specs/fwcfg.txt
> >
> > diff --git a/docs/specs/fwcfg.txt b/docs/specs/fwcfg.txt
> > new file mode 100644
> > index 0000000..e6fcd8f
> > --- /dev/null
> > +++ b/docs/specs/fwcfg.txt
> > @@ -0,0 +1,28 @@
> > +QEMU<->BIOS Paravirt Documentation
> > +--------------------------------------
> > +
> > +This document describes paravirt data structures passed from QEMU to BIOS.
> > +
> > +fw_cfg SRAT paravirt info
> > +--------------------
> > +The SRAT info passed from QEMU to BIOS has the following layout:
> > +
> > +-----------------------------------------------------------------------------------------------
> > +#nodes | cpu0_pxm | cpu1_pxm | ... | cpulast_pxm | node0_mem | node1_mem | ... | nodelast_mem
> > +
> > +-----------------------------------------------------------------------------------------------
> > +#dimms | dimm0_start | dimm0_sz | dimm0_pxm | ... | dimmlast_start | dimmlast_sz | dimmlast_pxm
> > +
> > +Entry 0 contains the number of numa nodes (nb_numa_nodes).
> > +
> > +Entries 1..max_cpus: The next max_cpus entries describe node proximity for each
> > +one of the vCPUs in the system.
> > +
> > +Entries max_cpus+1..max_cpus+nb_numa_nodes+1:  The next nb_numa_nodes entries
> > +describe the memory size for each one of the NUMA nodes in the system.
> > +
> > +Entry max_cpus+nb_numa_nodes+1 contains the number of memory dimms (nb_hp_dimms)
> > +
> > +The last 3 * nb_hp_dimms entries are organized in triplets: Each triplet contains
> > +the physical address offset, size (in bytes), and node proximity for the
> > +respective dimm.
> 
> The size and endianness are not specified, you are using LE 64 bit
> values for each item.

thanks, I'll update.
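
To make the layout concrete, here is a small sketch of the entry arithmetic,
assuming every entry is a 64-bit little-endian value as noted above. The index
formulas mirror the ones used in the hw/pc.c hunk of this patch; the helper
names (srat_dimm_count_index, srat_dimm_triplet_index, srat_total_bytes,
store_le64) are made up for the example and do not exist in QEMU:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Entry 0 is #nodes, then max_cpus entries, then nb_numa_nodes entries;
 * the dimm count (nb_hp_dimms) follows immediately after those. */
static size_t srat_dimm_count_index(size_t max_cpus, size_t nb_numa_nodes)
{
    return 1 + max_cpus + nb_numa_nodes;
}

/* Index of the start-address entry of the triplet for dimm_idx; the size
 * and node proximity entries sit at +1 and +2. */
static size_t srat_dimm_triplet_index(size_t max_cpus, size_t nb_numa_nodes,
                                      size_t dimm_idx)
{
    return 2 + max_cpus + nb_numa_nodes + 3 * dimm_idx;
}

/* Total size of the blob in bytes, at 8 bytes per entry. */
static size_t srat_total_bytes(size_t max_cpus, size_t nb_numa_nodes,
                               size_t nb_hp_dimms)
{
    return (2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8;
}

/* Store value as 8 little-endian bytes regardless of host byte order. */
static void store_le64(uint8_t *dst, uint64_t value)
{
    int i;

    assert(dst != NULL);
    for (i = 0; i < 8; i++) {
        dst[i] = (uint8_t)(value >> (8 * i));
    }
}
```

For example, with max_cpus = 4 and nb_numa_nodes = 2, the dimm count lands in
entry 7 and the triplet for dimm 1 starts at entry 11.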

> 
> > diff --git a/hw/pc.c b/hw/pc.c
> > index ef9901a..cf651d0 100644
> > --- a/hw/pc.c
> > +++ b/hw/pc.c
> > @@ -598,12 +598,15 @@ int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
> >      return index;
> >  }
> >
> > +static void setup_hp_dimms(uint64_t *fw_cfg_slots);
> > +
> >  static void *bochs_bios_init(void)
> >  {
> >      void *fw_cfg;
> >      uint8_t *smbios_table;
> >      size_t smbios_len;
> >      uint64_t *numa_fw_cfg;
> > +    uint64_t *hp_dimms_fw_cfg;
> >      int i, j;
> >
> >      register_ioport_write(0x400, 1, 2, bochs_bios_write, NULL);
> > @@ -638,8 +641,10 @@ static void *bochs_bios_init(void)
> >      /* allocate memory for the NUMA channel: one (64bit) word for the number
> >       * of nodes, one word for each VCPU->node and one word for each node to
> >       * hold the amount of memory.
> > +     * Finally one word for the number of hotplug memory slots and three words
> > +     * for each hotplug memory slot (start address, size and node proximity).
> >       */
> > -    numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8);
> > +    numa_fw_cfg = g_malloc0((2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8);
> >      numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
> >      for (i = 0; i < max_cpus; i++) {
> >          for (j = 0; j < nb_numa_nodes; j++) {
> > @@ -652,8 +657,15 @@ static void *bochs_bios_init(void)
> >      for (i = 0; i < nb_numa_nodes; i++) {
> >          numa_fw_cfg[max_cpus + 1 + i] = cpu_to_le64(node_mem[i]);
> >      }
> > +
> > +    numa_fw_cfg[1 + max_cpus + nb_numa_nodes] = cpu_to_le64(nb_hp_dimms);
> > +
> > +    hp_dimms_fw_cfg = numa_fw_cfg + 2 + max_cpus + nb_numa_nodes;
> > +    if (nb_hp_dimms)
> > +        setup_hp_dimms(hp_dimms_fw_cfg);
> 
> Braces.
> 
> > +
> >      fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, (uint8_t *)numa_fw_cfg,
> > -                     (1 + max_cpus + nb_numa_nodes) * 8);
> > +                     (2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8);
> >
> >      return fw_cfg;
> >  }
> > @@ -1223,3 +1235,40 @@ target_phys_addr_t pc_set_hp_memory_offset(uint64_t size)
> >
> >      return ret;
> >  }
> > +
> > +static void setup_hp_dimms(uint64_t *fw_cfg_slots)
> > +{
> > +    int i = 0;
> > +    Error *err = NULL;
> > +    DeviceState *dev;
> > +    DimmState *slot;
> > +    const char *type;
> > +    BusChild *kid;
> > +    BusState *bus = sysbus_get_default();
> > +
> > +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
> > +        dev = kid->child;
> > +        type = object_property_get_str(OBJECT(dev), "type", &err);
> > +        if (err) {
> > +            error_free(err);
> > +            fprintf(stderr, "error getting device type\n");
> > +            exit(1);
> > +        }
> > +
> > +        if (!strcmp(type, "dimm")) {
> > +            if (!dev->id) {
> > +                fprintf(stderr, "error getting dimm device id\n");
> > +                exit(1);
> > +            }
> > +            slot = DIMM(dev);
> > +            /* determine starting physical address for this memory slot */
> > +            assert(slot->start);
> > +            fw_cfg_slots[3 * slot->idx] = cpu_to_le64(slot->start);
> > +            fw_cfg_slots[3 * slot->idx + 1] = cpu_to_le64(slot->size);
> > +            fw_cfg_slots[3 * slot->idx + 2] = cpu_to_le64(slot->node);
> > +            i++;
> > +        }
> > +    }
> > +    assert(i == nb_hp_dimms);
> > +}
> > +
> > diff --git a/vl.c b/vl.c
> > index 0ff8818..37c9798 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -2335,7 +2335,7 @@ int main(int argc, char **argv, char **envp)
> >          node_cpumask[i] = 0;
> >      }
> >
> > -    nb_numa_nodes = 0;
> > +    nb_numa_nodes = 1;
> >      nb_nics = 0;
> >
> >      autostart= 1;
> > --
> > 1.7.9
> >
> >

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 00/21] ACPI memory hotplug
  2012-07-12 20:04   ` Blue Swirl
@ 2012-07-13 17:49     ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-13 17:49 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel, kvm, seabios, gleb, kevin, avi, anthony, imammedo

On Thu, Jul 12, 2012 at 08:04:56PM +0000, Blue Swirl wrote:
> On Wed, Jul 11, 2012 at 10:31 AM, Vasilis Liaskovitis
> <vasilis.liaskovitis@profitbricks.com> wrote:
> > This is v2 of the ACPI memory hotplug prototype for x86_64 target.
> 
> I think the concept of DIMMs (what about SIMMs? SODIMMs? I liked
> memslot) would be useful for most targets, but hotplugging may be
> limited to x86 only. It would be nice to keep these two separate or as
> loosely coupled as possible.

Agreed.
What specific use cases besides hotplugging are you thinking about?
Also, are there non-ACPI hotplug platforms?

I am trying to keep generic dimm manipulation functions (e.g. population /
depopulation and searching) in hw/dimm[.ch]. Currently the x86-acpi_piix4 "backend"
registers a callback for hot-add / hot-remove. In theory other hotplug backends
can hook in. 

btw I don't mind using "-memslot" (I think someone during v1 mentioned -dimm); we just
need some consensus on the naming.

> 
> >
> > Changes v1->v2
> >
> > - memory map is automatically calculated for hotplug dimms. Dimms are added from
> > top-of-memory skipping the pci hole at [PCI_HOLE_START, 4G).
> > - Renamed from "-memslot" to "-dimm". Commands changed to "dimm_add", "dimm_del".
> > - Seabios ejection array reduced to a byte. Use extraction macros for dimm ssdt.
> > - additional SRAT paravirt info does not break previous SRAT fw_cfg layout.
> > - Documentation of new acpi_piix4 registers and paravirt data.
> > - add ACPI _OST support for _OST enabled guests. This allows qemu to receive
> > notification for success / failure of memory hot-add and hot-remove operations.
> > Guest needs to support _OST (https://lkml.org/lkml/2012/6/25/321)
> > - add monitor info command to report total guest memory (initial + hot-added)
> > - add command line options and monitor commands for batch dimm creation/population
> >
> > Overview:
> >
> > Dimm devices are modeled with a new qemu command line
> >
> > "-dimm id=name,size=sz,node=pxm,populated=on|off"
> >
> > As already mentioned, the starting physical address for all dimms is calculated
> > automatically from top of memory, skipping the pci hole at [PCI_HOLE_START, 4G).
> > Node is defining numa proximity for this dimm. When not defined it defaults
> > to zero.
> > "-dimm id=dimm0,size=512M,node=0,populated=off"
> > will define a 512M memory slot belonging to numa node 0.
> >
> > Dimms are added or removed with a new hmp command "dimm_add/dimm_del":
> > Hot-add syntax: "dimm_add id"
> > Hot-remove syntax: "dimm_del id"
> >
> > Issues:
> >
> > - Live migration works as long as populated field is changed to "on" for
> > hotplugged dimms at the destination qemu command line (patch 12/21 lifts
> > this requirement). The DimmState structure does not yet define a
> > VMStateDescription, but i assume this is the preferred way to pass state
> > for migration.
> >
> > - Dimms are abstracted as qdevices attached to the main system bus. However,
> > memory hotplugging has its own side channel ignoring main_system_bus's hotplug
> > incapability. A cleaner integration is still needed, probably attaching memory
> > devices as children-links of an acpi-capable device (in the pc case acpi_piix4)
> > instead of the system bus (TBD). Then device_add/device_del instead of new
> > commands can hopefully be used.
> >
> > Comments/review welcome.
> >
> > series is based on uq/master for qemu-kvm, and master for seabios. Can be found
> > also at:
> > http://github.com/vliaskov/qemu-kvm/commits/memhp-v2
> > http://github.com/vliaskov/seabios/commits/memhp-v2
> >
> > Vasilis Liaskovitis (14):
> >   dimm: Implement memory device abstraction
> >   acpi_piix4: Implement memory device hotplug registers
> >   pc: calculate dimm physical addresses and adjust memory map
> >   pc: Add dimm paravirt SRAT info
> >   Implement "-dimm" command line option
> >   Implement dimm_add and dimm_del commands for hmp and qmp
> >   fix live-migration when "populated=on" is missing
> >   Implement memory hotplug notification lists
> >   acpi_piix4: _OST dimm support
> >   acpi_piix4: Update dimm state on VM reboot
> >   acpi_piix4: Update dimm bitmap state on hot-remove fail
> >   Implement "info memtotal" and "query-memtotal"
> >   Implement -dimms, -dimmspop command line options
> >   Implement mem_increase, mem_decrease hmp/qmp commands
> >
> >  arch_init.c                 |   23 ++-
> >  docs/specs/acpi_hotplug.txt |   46 +++++
> >  docs/specs/fwcfg.txt        |   28 +++
> >  hmp-commands.hx             |   67 +++++++
> >  hmp.c                       |   24 +++
> >  hmp.h                       |    2 +
> >  hw/Makefile.objs            |    2 +-
> >  hw/acpi_piix4.c             |  131 ++++++++++++-
> >  hw/dimm.c                   |  449 +++++++++++++++++++++++++++++++++++++++++++
> >  hw/dimm.h                   |   72 +++++++
> >  hw/pc.c                     |   94 +++++++++-
> >  hw/pc.h                     |    6 +
> >  hw/pc_piix.c                |   18 ++-
> >  monitor.c                   |   35 ++++
> >  monitor.h                   |    5 +
> >  qapi-schema.json            |   38 ++++
> >  qemu-config.c               |   70 +++++++
> >  qemu-options.hx             |   15 ++
> >  qmp-commands.hx             |  137 +++++++++++++
> >  sysemu.h                    |    1 +
> >  vl.c                        |  122 ++++++++++++-
> >  21 files changed, 1368 insertions(+), 17 deletions(-)
> >  create mode 100644 docs/specs/acpi_hotplug.txt
> >  create mode 100644 docs/specs/fwcfg.txt
> >  create mode 100644 hw/dimm.c
> >  create mode 100644 hw/dimm.h
> >
> > Vasilis Liaskovitis (7):
> >   Add ACPI_EXTRACT_DEVICE* macros
> >   Add SSDT memory device support
> >   acpi-dsdt: Implement functions for memory hotplug.
> >   acpi: generate hotplug memory devices.
> >   pciinit: Fix pcimem_start value
> >   acpi_dsdt: Support _OST dimm method
> >   acpi_dsdt: Revert internal dimm state on _OST failure
> >
> >  Makefile              |    2 +-
> >  src/acpi-dsdt.dsl     |  120 ++++++++++++++++++++++++++++++++++++-
> >  src/acpi.c            |  158 +++++++++++++++++++++++++++++++++++++++++++++++--
> >  src/pciinit.c         |    2 +-
> >  src/ssdt-mem.dsl      |   69 +++++++++++++++++++++
> >  tools/acpi_extract.py |   28 +++++++++
> >  6 files changed, 369 insertions(+), 10 deletions(-)
> >  create mode 100644 src/ssdt-mem.dsl
> >

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 00/21] ACPI memory hotplug
@ 2012-07-13 17:49     ` Vasilis Liaskovitis
  0 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-13 17:49 UTC (permalink / raw)
  To: Blue Swirl; +Cc: kvm, gleb, seabios, qemu-devel, kevin, avi, anthony, imammedo

On Thu, Jul 12, 2012 at 08:04:56PM +0000, Blue Swirl wrote:
> On Wed, Jul 11, 2012 at 10:31 AM, Vasilis Liaskovitis
> <vasilis.liaskovitis@profitbricks.com> wrote:
> > This is v2 of the ACPI memory hotplug prototype for x86_64 target.
> 
> I think the concept of DIMMs (what about SIMMs? SODIMMs? I liked
> memslot) would be useful for most targets, but hotplugging may be
> limited to x86 only. It would be nice to keep these two separate or as
> loosely coupled as possible.

Agreed.
What specific use cases besides hotplugging are you thinking about?
Also, are there non-ACPI hotplug platforms?

I am trying to keep generic dimm manipulation functions (e.g. population /
depopulation and searching) in hw/dimm[.ch]. Currently the x86-acpi_piix4 "backend"
registers a callback for hot-add / hot-remove. In theory other hotplug backends
can hook in. 

btw I don't mind using "-memslot" (I think someone during v1 mentioned -dimm); we just
need some consensus on the naming.

> 
> >
> > Changes v1->v2
> >
> > - memory map is automatically calculated for hotplug dimms. Dimms are added from
> > top-of-memory skipping the pci hole at [PCI_HOLE_START, 4G).
> > - Renamed from "-memslot" to "-dimm". Commands changed to "dimm_add", "dimm_del".
> > - Seabios ejection array reduced to a byte. Use extraction macros for dimm ssdt.
> > - additional SRAT paravirt info does not break previous SRAT fw_cfg layout.
> > - Documentation of new acpi_piix4 registers and paravirt data.
> > - add ACPI _OST support for _OST enabled guests. This allows qemu to receive
> > notification for success / failure of memory hot-add and hot-remove operations.
> > Guest needs to support _OST (https://lkml.org/lkml/2012/6/25/321)
> > - add monitor info command to report total guest memory (initial + hot-added)
> > - add command line options and monitor commands for batch dimm creation/population
> >
> > Overview:
> >
> > Dimm devices are modeled with a new qemu command line
> >
> > "-dimm id=name,size=sz,node=pxm,populated=on|off"
> >
> > As already mentioned, the starting physical address for all dimms is calculated
> > automatically from top of memory, skipping the pci hole at [PCI_HOLE_START, 4G).
> > Node is defining numa proximity for this dimm. When not defined it defaults
> > to zero.
> > "-dimm id=dimm0,size=512M,node=0,populated=off"
> > will define a 512M memory slot belonging to numa node 0.
> >
> > Dimms are added or removed with a new hmp command "dimm_add/dimm_del":
> > Hot-add syntax: "dimm_add id"
> > Hot-remove syntax: "dimm_del id"
> >
> > Issues:
> >
> > - Live migration works as long as populated field is changed to "on" for
> > hotplugged dimms at the destination qemu command line (patch 12/21 lifts
> > this requirement). The DimmState structure does not yet define a
> > VMStateDescription, but i assume this is the preferred way to pass state
> > for migration.
> >
> > - Dimms are abstracted as qdevices attached to the main system bus. However,
> > memory hotplugging has its own side channel ignoring main_system_bus's hotplug
> > incapability. A cleaner integration is still needed, probably attaching memory
> > devices as children-links of an acpi-capable device (in the pc case acpi_piix4)
> > instead of the system bus (TBD). Then device_add/device_del instead of new
> > commands can hopefully be used.
> >
> > Comments/review welcome.
> >
> > series is based on uq/master for qemu-kvm, and master for seabios. Can be found
> > also at:
> > http://github.com/vliaskov/qemu-kvm/commits/memhp-v2
> > http://github.com/vliaskov/seabios/commits/memhp-v2
> >
> > Vasilis Liaskovitis (14):
> >   dimm: Implement memory device abstraction
> >   acpi_piix4: Implement memory device hotplug registers
> >   pc: calculate dimm physical addresses and adjust memory map
> >   pc: Add dimm paravirt SRAT info
> >   Implement "-dimm" command line option
> >   Implement dimm_add and dimm_del commands for hmp and qmp
> >   fix live-migration when "populated=on" is missing
> >   Implement memory hotplug notification lists
> >   acpi_piix4: _OST dimm support
> >   acpi_piix4: Update dimm state on VM reboot
> >   acpi_piix4: Update dimm bitmap state on hot-remove fail
> >   Implement "info memtotal" and "query-memtotal"
> >   Implement -dimms, -dimmspop command line options
> >   Implement mem_increase, mem_decrease hmp/qmp commands
> >
> >  arch_init.c                 |   23 ++-
> >  docs/specs/acpi_hotplug.txt |   46 +++++
> >  docs/specs/fwcfg.txt        |   28 +++
> >  hmp-commands.hx             |   67 +++++++
> >  hmp.c                       |   24 +++
> >  hmp.h                       |    2 +
> >  hw/Makefile.objs            |    2 +-
> >  hw/acpi_piix4.c             |  131 ++++++++++++-
> >  hw/dimm.c                   |  449 +++++++++++++++++++++++++++++++++++++++++++
> >  hw/dimm.h                   |   72 +++++++
> >  hw/pc.c                     |   94 +++++++++-
> >  hw/pc.h                     |    6 +
> >  hw/pc_piix.c                |   18 ++-
> >  monitor.c                   |   35 ++++
> >  monitor.h                   |    5 +
> >  qapi-schema.json            |   38 ++++
> >  qemu-config.c               |   70 +++++++
> >  qemu-options.hx             |   15 ++
> >  qmp-commands.hx             |  137 +++++++++++++
> >  sysemu.h                    |    1 +
> >  vl.c                        |  122 ++++++++++++-
> >  21 files changed, 1368 insertions(+), 17 deletions(-)
> >  create mode 100644 docs/specs/acpi_hotplug.txt
> >  create mode 100644 docs/specs/fwcfg.txt
> >  create mode 100644 hw/dimm.c
> >  create mode 100644 hw/dimm.h
> >
> > Vasilis Liaskovitis (7):
> >   Add ACPI_EXTRACT_DEVICE* macros
> >   Add SSDT memory device support
> >   acpi-dsdt: Implement functions for memory hotplug.
> >   acpi: generate hotplug memory devices.
> >   pciinit: Fix pcimem_start value
> >   acpi_dsdt: Support _OST dimm method
> >   acpi_dsdt: Revert internal dimm state on _OST failure
> >
> >  Makefile              |    2 +-
> >  src/acpi-dsdt.dsl     |  120 ++++++++++++++++++++++++++++++++++++-
> >  src/acpi.c            |  158 +++++++++++++++++++++++++++++++++++++++++++++++--
> >  src/pciinit.c         |    2 +-
> >  src/ssdt-mem.dsl      |   69 +++++++++++++++++++++
> >  tools/acpi_extract.py |   28 +++++++++
> >  6 files changed, 369 insertions(+), 10 deletions(-)
> >  create mode 100644 src/ssdt-mem.dsl
> >

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH v2 00/21] ACPI memory hotplug
  2012-07-13 17:49     ` Vasilis Liaskovitis
@ 2012-07-14  9:08       ` Blue Swirl
  -1 siblings, 0 replies; 86+ messages in thread
From: Blue Swirl @ 2012-07-14  9:08 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: kvm, gleb, seabios, qemu-devel, kevin, avi, anthony, imammedo

On Fri, Jul 13, 2012 at 5:49 PM, Vasilis Liaskovitis
<vasilis.liaskovitis@profitbricks.com> wrote:
> On Thu, Jul 12, 2012 at 08:04:56PM +0000, Blue Swirl wrote:
>> On Wed, Jul 11, 2012 at 10:31 AM, Vasilis Liaskovitis
>> <vasilis.liaskovitis@profitbricks.com> wrote:
>> > This is v2 of the ACPI memory hotplug prototype for x86_64 target.
>>
>> I think the concept of DIMMs (what about SIMMs? SODIMMs? I liked
>> memslot) would be useful for most targets, but hotplugging may be
>> limited to x86 only. It would be nice to keep these two separate or as
>> loosely coupled as possible.
>
> agreed.
> what specific usecases besides hotplugging are you thinking about?

Most real boards have some kind of RAM module slots. Currently this is
implemented with the -m option, but a generic memory slot model would be
more accurate. Also, the memory layout needs to be communicated to the BIOS
somehow, unless we want to spend cycles on BIOS memory probes. The
NUMA fw_cfg memory description should be usable for most cases, even
for embedded UP machines.

> Also are there non-acpi hotplug platforms?

Some enterprise-class SPARC and PPC machines support memory hotplug.

>
> I am trying to keep generic dimm manipulation functions (e.g. population /
> depopulation and searching) in hw/dimm[.ch]. Currently the x86-acpi_piix4 "backend"
> registers a callback for hot-add / hot-remove. In theory other hotplug backends
> can hook in.
>
> btw I don't mind using "-memslot" (I think someone during v1 mentioned -dimm), we just
> need some consensus on the naming.
>
>>
>> >
>> > Changes v1->v2
>> >
>> > - memory map is automatically calculated for hotplug dimms. Dimms are added from
>> > top-of-memory skipping the pci hole at [PCI_HOLE_START, 4G).
>> > - Renamed from "-memslot" to "-dimm". Commands changed to "dimm_add", "dimm_del".
>> > - Seabios ejection array reduced to a byte. Use extraction macros for dimm ssdt.
>> > - additional SRAT paravirt info does not break previous SRAT fw_cfg layout.
>> > - Documentation of new acpi_piix4 registers and paravirt data.
>> > - add ACPI _OST support for _OST enabled guests. This allows qemu to receive
>> > notification for success / failure of memory hot-add and hot-remove operations.
>> > Guest needs to support _OST (https://lkml.org/lkml/2012/6/25/321)
>> > - add monitor info command to report total guest memory (initial + hot-added)
>> > - add command line options and monitor commands for batch dimm creation/population
>> >
>> > Overview:
>> >
>> > Dimm devices are modeled with a new qemu command line
>> >
>> > "-dimm id=name,size=sz,node=pxm,populated=on|off"
>> >
>> > As already mentioned, the starting physical address for all dimms is calculated
>> > automatically from top of memory, skipping the pci hole at [PCI_HOLE_START, 4G).
>> > Node is defining numa proximity for this dimm. When not defined it defaults
>> > to zero.
>> > "-dimm id=dimm0,size=512M,node=0,populated=off"
>> > will define a 512M memory slot belonging to numa node 0.
>> >
>> > Dimms are added or removed with a new hmp command "dimm_add/dimm_del":
>> > Hot-add syntax: "dimm_add id"
>> > Hot-remove syntax: "dimm_del id"
>> >
>> > Issues:
>> >
>> > - Live migration works as long as populated field is changed to "on" for
>> > hotplugged dimms at the destination qemu command line (patch 12/21 lifts
>> > this requirement). The DimmState structure does not yet define a
>> > VMStateDescription, but i assume this is the preferred way to pass state
>> > for migration.
>> >
>> > - Dimms are abstracted as qdevices attached to the main system bus. However,
>> > memory hotplugging has its own side channel ignoring main_system_bus's hotplug
>> > incapability. A cleaner integration is still needed, probably attaching memory
>> > devices as children-links of an acpi-capable device (in the pc case acpi_piix4)
>> > instead of the system bus (TBD). Then device_add/device_del instead of new
>> > commands can hopefully be used.
>> >
>> > Comments/review welcome.
>> >
>> > series is based on uq/master for qemu-kvm, and master for seabios. Can be found
>> > also at:
>> > http://github.com/vliaskov/qemu-kvm/commits/memhp-v2
>> > http://github.com/vliaskov/seabios/commits/memhp-v2
>> >
>> > Vasilis Liaskovitis (14):
>> >   dimm: Implement memory device abstraction
>> >   acpi_piix4: Implement memory device hotplug registers
>> >   pc: calculate dimm physical addresses and adjust memory map
>> >   pc: Add dimm paravirt SRAT info
>> >   Implement "-dimm" command line option
>> >   Implement dimm_add and dimm_del commands for hmp and qmp
>> >   fix live-migration when "populated=on" is missing
>> >   Implement memory hotplug notification lists
>> >   acpi_piix4: _OST dimm support
>> >   acpi_piix4: Update dimm state on VM reboot
>> >   acpi_piix4: Update dimm bitmap state on hot-remove fail
>> >   Implement "info memtotal" and "query-memtotal"
>> >   Implement -dimms, -dimmspop command line options
>> >   Implement mem_increase, mem_decrease hmp/qmp commands
>> >
>> >  arch_init.c                 |   23 ++-
>> >  docs/specs/acpi_hotplug.txt |   46 +++++
>> >  docs/specs/fwcfg.txt        |   28 +++
>> >  hmp-commands.hx             |   67 +++++++
>> >  hmp.c                       |   24 +++
>> >  hmp.h                       |    2 +
>> >  hw/Makefile.objs            |    2 +-
>> >  hw/acpi_piix4.c             |  131 ++++++++++++-
>> >  hw/dimm.c                   |  449 +++++++++++++++++++++++++++++++++++++++++++
>> >  hw/dimm.h                   |   72 +++++++
>> >  hw/pc.c                     |   94 +++++++++-
>> >  hw/pc.h                     |    6 +
>> >  hw/pc_piix.c                |   18 ++-
>> >  monitor.c                   |   35 ++++
>> >  monitor.h                   |    5 +
>> >  qapi-schema.json            |   38 ++++
>> >  qemu-config.c               |   70 +++++++
>> >  qemu-options.hx             |   15 ++
>> >  qmp-commands.hx             |  137 +++++++++++++
>> >  sysemu.h                    |    1 +
>> >  vl.c                        |  122 ++++++++++++-
>> >  21 files changed, 1368 insertions(+), 17 deletions(-)
>> >  create mode 100644 docs/specs/acpi_hotplug.txt
>> >  create mode 100644 docs/specs/fwcfg.txt
>> >  create mode 100644 hw/dimm.c
>> >  create mode 100644 hw/dimm.h
>> >
>> > Vasilis Liaskovitis (7):
>> >   Add ACPI_EXTRACT_DEVICE* macros
>> >   Add SSDT memory device support
>> >   acpi-dsdt: Implement functions for memory hotplug.
>> >   acpi: generate hotplug memory devices.
>> >   pciinit: Fix pcimem_start value
>> >   acpi_dsdt: Support _OST dimm method
>> >   acpi_dsdt: Revert internal dimm state on _OST failure
>> >
>> >  Makefile              |    2 +-
>> >  src/acpi-dsdt.dsl     |  120 ++++++++++++++++++++++++++++++++++++-
>> >  src/acpi.c            |  158 +++++++++++++++++++++++++++++++++++++++++++++++--
>> >  src/pciinit.c         |    2 +-
>> >  src/ssdt-mem.dsl      |   69 +++++++++++++++++++++
>> >  tools/acpi_extract.py |   28 +++++++++
>> >  6 files changed, 369 insertions(+), 10 deletions(-)
>> >  create mode 100644 src/ssdt-mem.dsl
>> >

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH v2 03/21][SeaBIOS] acpi-dsdt: Implement functions for memory hotplug
  2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
@ 2012-07-17  7:23     ` Wen Congyang
  -1 siblings, 0 replies; 86+ messages in thread
From: Wen Congyang @ 2012-07-17  7:23 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: qemu-devel, kvm, seabios, avi, anthony, gleb, imammedo, kevin

At 07/11/2012 06:31 PM, Vasilis Liaskovitis Wrote:
> Extend the DSDT to include methods for handling memory hot-add and hot-remove
> notifications and memory device status requests. These functions are called
> from the memory device SSDT methods.
> 
> Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> ---
>  src/acpi-dsdt.dsl |   70 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 68 insertions(+), 2 deletions(-)
> 
> diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
> index 2060686..5d3e92b 100644
> --- a/src/acpi-dsdt.dsl
> +++ b/src/acpi-dsdt.dsl
> @@ -737,6 +737,71 @@ DefinitionBlock (
>              }
>              Return(One)
>          }
> +        /* Objects filled in by run-time generated SSDT */
> +        External(MTFY, MethodObj)
> +        External(MEON, PkgObj)
> +
> +        Method (CMST, 1, NotSerialized) {
> +            // _STA method - return ON status of memdevice
> +            // Local0 = MEON flag for this memory device
> +            Store(DerefOf(Index(MEON, Arg0)), Local0)
> +            If (Local0) { Return(0xF) } Else { Return(0x0) }
> +        }
> +
> +        /* Memory hotplug notify array */
> +        OperationRegion(MEST, SystemIO, 0xaf80, 32)
> +        Field (MEST, ByteAcc, NoLock, Preserve)
> +        {
> +            MES, 256
> +        }
> + 
> +        /* Memory eject byte */
> +        OperationRegion(MEMJ, SystemIO, 0xafa0, 1)
> +        Field (MEMJ, ByteAcc, NoLock, Preserve)
> +        {
> +            MPE, 8
> +        }
> +        
> +        Method(MESC, 0) {
> +            // Local5 = active memdevice bitmap
> +            Store (MES, Local5)
> +            // Local2 = last read byte from bitmap
> +            Store (Zero, Local2)
> +            // Local0 = memory device iterator
> +            Store (Zero, Local0)
> +            While (LLess(Local0, SizeOf(MEON))) {
> +                // Local1 = MEON flag for this memory device
> +                Store(DerefOf(Index(MEON, Local0)), Local1)
> +                If (And(Local0, 0x07)) {
> +                    // Shift down previously read bitmap byte
> +                    ShiftRight(Local2, 1, Local2)
> +                } Else {
> +                    // Read next byte from memdevice bitmap
> +                    Store(DerefOf(Index(Local5, ShiftRight(Local0, 3))), Local2)
> +                }
> +                // Local3 = active state for this memory device
> +                Store(And(Local2, 1), Local3)
> +
> +                If (LNotEqual(Local1, Local3)) {

There are two ways to hot remove a memory device:
1. dimm_del
2. echo 1 >/sys/bus/acpi/devices/PNP0C80:XX/eject

In the 2nd case, we cannot hotplug this memory device again: MEON is never
cleared on eject, so on the next dimm_add both Local1 and Local3 are 1 and
MESC sees no state change.

So, I think the MEON flag for this memory device should be set to 0 in method
_EJ0, or a _PS3 method should be implemented for the memory device.
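
A rough way to see the problem is to model the MESC scan outside of AML. The
Python sketch below is only an illustration, not the actual firmware code: the
function name mesc, the list standing in for the MEON package, and the
bytearray standing in for the MES field are all assumptions made for the
example.

```python
# Simplified model of the MESC scan from the patch; this is not the real AML.

def mesc(meon, bitmap):
    """Compare each MEON flag with its bit in the status bitmap.

    meon   -- list of 0/1 flags, one per memory device (stands in for MEON)
    bitmap -- bytes of the status bitmap, one bit per device (stands in for MES)
    Returns the (device index, notify code) pairs that MTFY would receive:
    1 = device check (hot-add), 3 = eject request (hot-remove).
    """
    notifications = []
    for dev in range(len(meon)):
        active = (bitmap[dev >> 3] >> (dev & 7)) & 1
        if meon[dev] != active:
            # State change: update the flag and notify the guest.
            meon[dev] = active
            notifications.append((dev, 1 if active else 3))
    return notifications


meon = [0]
bitmap = bytearray(32)  # 256 bits, same size as the MES field

# dimm_add: the bit gets set and the scan notifies the guest.
bitmap[0] |= 1
first_add = mesc(meon, bitmap)

# Guest-initiated eject through sysfs: _EJ0 runs, but nothing in the posted
# patch clears MEON. Whether the bit is cleared on eject does not matter for
# this model, because the next dimm_add sets it again.

# dimm_add again: MEON is still 1, so the scan sees no change.
bitmap[0] |= 1
second_add = mesc(meon, bitmap)
```

Here first_add carries a device-check notification for device 0, while
second_add is empty, which is the lost hot-add described above.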

Thanks
Wen Congyang

> +                    // State change - update MEON with new state
> +                    Store(Local3, Index(MEON, Local0))
> +                    // Do MEM notify
> +                    If (LEqual(Local3, 1)) {
> +                        MTFY(Local0, 1)
> +                    } Else {
> +                        MTFY(Local0, 3)
> +                    }
> +                }
> +                Increment(Local0)
> +            }
> +            Return(One)
> +        }
> +
> +        Method (MPEJ, 2, NotSerialized) {
> +            // _EJ0 method - eject callback
> +            Store(Arg0, MPE)
> +            Sleep(200)
> +        }
>      }
>  
>  
> @@ -759,8 +824,9 @@ DefinitionBlock (
>              // CPU hotplug event
>              Return(\_SB.PRSC())
>          }
> -        Method(_L03) {
> -            Return(0x01)
> +        Method(_E03) {
> +            // Memory hotplug event
> +            Return(\_SB.MESC())
>          }
>          Method(_L04) {
>              Return(0x01)


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH v2 03/21][SeaBIOS] acpi-dsdt: Implement functions for memory hotplug
  2012-07-17  7:23     ` [Qemu-devel] " Wen Congyang
@ 2012-07-20  8:48       ` Vasilis Liaskovitis
  -1 siblings, 0 replies; 86+ messages in thread
From: Vasilis Liaskovitis @ 2012-07-20  8:48 UTC (permalink / raw)
  To: Wen Congyang
  Cc: qemu-devel, kvm, seabios, avi, anthony, gleb, imammedo, kevin

On Tue, Jul 17, 2012 at 03:23:00PM +0800, Wen Congyang wrote:
> > +        Method(MESC, 0) {
> > +            // Local5 = active memdevice bitmap
> > +            Store (MES, Local5)
> > +            // Local2 = last read byte from bitmap
> > +            Store (Zero, Local2)
> > +            // Local0 = memory device iterator
> > +            Store (Zero, Local0)
> > +            While (LLess(Local0, SizeOf(MEON))) {
> > +                // Local1 = MEON flag for this memory device
> > +                Store(DerefOf(Index(MEON, Local0)), Local1)
> > +                If (And(Local0, 0x07)) {
> > +                    // Shift down previously read bitmap byte
> > +                    ShiftRight(Local2, 1, Local2)
> > +                } Else {
> > +                    // Read next byte from memdevice bitmap
> > +                    Store(DerefOf(Index(Local5, ShiftRight(Local0, 3))), Local2)
> > +                }
> > +                // Local3 = active state for this memory device
> > +                Store(And(Local2, 1), Local3)
> > +
> > +                If (LNotEqual(Local1, Local3)) {
> 
> There are two ways to hot remove a memory device:
> 1. dimm_del
> 2. echo 1 >/sys/bus/acpi/devices/PNP0C80:XX/eject
> 
> In the 2nd case, we cannot hotplug this memory device again,
> because both Local1 and Local3 are 1.
> 
> So, I think MEON flag for this memory device should be set to 0 in method _EJ0
> or implement method _PS3 for memory device.

Good catch. Both the internal SeaBIOS state (MEON) and the qemu machine bitmap
(mems_sts in hw/acpi_piix4.c) have to be updated when the ejection comes from an
OSPM action. I will implement a _PS3 method that updates the MEON flag and also
signals qemu to change the mems_sts bitmap.
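
As a sanity check of the intended behaviour, the two state updates can be
modeled in a few lines of Python. This is only a sketch under assumptions: the
real change would be AML (_PS3) plus code in hw/acpi_piix4.c, and the helper
names scan and ospm_eject are hypothetical, not taken from the series.

```python
# Model only: the real fix would live in AML (_PS3) and hw/acpi_piix4.c.

def scan(meon, bitmap):
    """MESC-style scan: return (device, notify code) for each state change."""
    changes = []
    for dev in range(len(meon)):
        active = (bitmap[dev >> 3] >> (dev & 7)) & 1
        if meon[dev] != active:
            meon[dev] = active
            changes.append((dev, 1 if active else 3))  # 1 = add, 3 = eject
    return changes


def ospm_eject(meon, bitmap, dev):
    """Hypothetical fix: an OSPM-initiated eject clears both copies of the state."""
    meon[dev] = 0                                   # firmware-side flag (MEON)
    bitmap[dev >> 3] &= ~(1 << (dev & 7)) & 0xFF    # qemu-side bit (mems_sts)


meon = [0, 0]
mems_sts = bytearray(32)

mems_sts[0] |= 1 << 1            # dimm_add for device 1
added = scan(meon, mems_sts)

ospm_eject(meon, mems_sts, 1)    # echo 1 > .../eject inside the guest

mems_sts[0] |= 1 << 1            # dimm_add for device 1 again
readded = scan(meon, mems_sts)
```

With both copies cleared on eject, the second dimm_add is seen as a state
change again and the guest gets its device-check notification.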

thanks,
- Vasilis


^ permalink raw reply	[flat|nested] 86+ messages in thread

end of thread, other threads:[~2012-07-20  8:48 UTC | newest]

Thread overview: 86+ messages
2012-07-11 10:31 [RFC PATCH v2 00/21] ACPI memory hotplug Vasilis Liaskovitis
2012-07-11 10:31 ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:31 ` [RFC PATCH v2 01/21][SeaBIOS] Add ACPI_EXTRACT_DEVICE* macros Vasilis Liaskovitis
2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:31 ` [RFC PATCH v2 02/21][SeaBIOS] Add SSDT memory device support Vasilis Liaskovitis
2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:31 ` [RFC PATCH v2 03/21][SeaBIOS] acpi-dsdt: Implement functions for memory hotplug Vasilis Liaskovitis
2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-17  7:23   ` Wen Congyang
2012-07-17  7:23     ` [Qemu-devel] " Wen Congyang
2012-07-20  8:48     ` Vasilis Liaskovitis
2012-07-20  8:48       ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:31 ` [RFC PATCH v2 04/21][SeaBIOS] acpi: generate hotplug memory devices Vasilis Liaskovitis
2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:48   ` Wen Congyang
2012-07-11 10:48     ` [Qemu-devel] " Wen Congyang
2012-07-11 16:39     ` Vasilis Liaskovitis
2012-07-11 16:39       ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:31 ` [RFC PATCH v2 05/21][SeaBIOS] pciinit: Fix pcimem_start value Vasilis Liaskovitis
2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 11:56   ` Gerd Hoffmann
2012-07-11 11:56     ` [Qemu-devel] " Gerd Hoffmann
2012-07-11 16:45     ` Vasilis Liaskovitis
2012-07-11 16:45       ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-12  7:22       ` Gerd Hoffmann
2012-07-12  7:22         ` [Qemu-devel] " Gerd Hoffmann
2012-07-12  9:09         ` Vasilis Liaskovitis
2012-07-12  9:09           ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:31 ` [RFC PATCH v2 06/21] dimm: Implement memory device abstraction Vasilis Liaskovitis
2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-12 19:55   ` Blue Swirl
2012-07-12 19:55     ` [Qemu-devel] " Blue Swirl
2012-07-13 17:39     ` Vasilis Liaskovitis
2012-07-13 17:39       ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:31 ` [RFC PATCH v2 07/21] acpi_piix4: Implement memory device hotplug registers Vasilis Liaskovitis
2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:31 ` [RFC PATCH v2 08/21] pc: calculate dimm physical addresses and adjust memory map Vasilis Liaskovitis
2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:31 ` [RFC PATCH v2 09/21] pc: Add dimm paravirt SRAT info Vasilis Liaskovitis
2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-12 19:48   ` Blue Swirl
2012-07-12 19:48     ` [Qemu-devel] " Blue Swirl
2012-07-13 17:40     ` Vasilis Liaskovitis
2012-07-13 17:40       ` Vasilis Liaskovitis
2012-07-11 10:31 ` [RFC PATCH v2 10/21] Implement "-dimm" command line option Vasilis Liaskovitis
2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:31 ` [RFC PATCH v2 11/21] Implement dimm_add and dimm_del hmp/qmp commands Vasilis Liaskovitis
2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:31 ` [RFC PATCH v2 12/21] fix live-migration when "populated=on" is missing Vasilis Liaskovitis
2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:31 ` [RFC PATCH v2 13/21] Implement memory hotplug notification lists Vasilis Liaskovitis
2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 14:59   ` Eric Blake
2012-07-11 14:59     ` Eric Blake
2012-07-11 16:47     ` Vasilis Liaskovitis
2012-07-11 16:47       ` Vasilis Liaskovitis
2012-07-11 10:31 ` [RFC PATCH v2 14/21][SeaBIOS] acpi_dsdt: Support _OST dimm method Vasilis Liaskovitis
2012-07-11 10:31   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:32 ` [RFC PATCH v2 15/21] acpi_piix4: _OST dimm support Vasilis Liaskovitis
2012-07-11 10:32   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:32 ` [RFC PATCH v2 16/21] acpi_piix4: Update dimm state on VM reboot Vasilis Liaskovitis
2012-07-11 10:32   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:32 ` [RFC PATCH v2 17/21][SeaBIOS] acpi_dsdt: Revert internal dimm state on _OST failure Vasilis Liaskovitis
2012-07-11 10:32   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:32 ` [RFC PATCH v2 18/21] acpi_piix4: Update dimm bitmap state on hot-remove fail Vasilis Liaskovitis
2012-07-11 10:32   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:32 ` [RFC PATCH v2 19/21] Implement "info memtotal" and "query-memtotal" Vasilis Liaskovitis
2012-07-11 10:32   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 15:14   ` Eric Blake
2012-07-11 15:14     ` [Qemu-devel] " Eric Blake
2012-07-11 16:55     ` Vasilis Liaskovitis
2012-07-11 16:55       ` Vasilis Liaskovitis
2012-07-11 10:32 ` [RFC PATCH v2 20/21] Implement -dimms, -dimmspop command line options Vasilis Liaskovitis
2012-07-11 10:32   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 14:55   ` Avi Kivity
2012-07-11 14:55     ` [Qemu-devel] " Avi Kivity
2012-07-11 16:57     ` Vasilis Liaskovitis
2012-07-11 16:57       ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-11 10:32 ` [RFC PATCH v2 21/21] Implement mem_increase, mem_decrease hmp/qmp commands Vasilis Liaskovitis
2012-07-11 10:32   ` [Qemu-devel] " Vasilis Liaskovitis
2012-07-12 20:04 ` [Qemu-devel] [RFC PATCH v2 00/21] ACPI memory hotplug Blue Swirl
2012-07-12 20:04   ` Blue Swirl
2012-07-13 17:49   ` Vasilis Liaskovitis
2012-07-13 17:49     ` Vasilis Liaskovitis
2012-07-14  9:08     ` Blue Swirl
2012-07-14  9:08       ` [Qemu-devel] " Blue Swirl