All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/9] ACPI memory hotplug
@ 2012-04-19 14:08 Vasilis Liaskovitis
  2012-04-19 14:08 ` [RFC PATCH 1/9][SeaBIOS] Add SSDT memory device support Vasilis Liaskovitis
                   ` (11 more replies)
  0 siblings, 12 replies; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-19 14:08 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios; +Cc: gleb, avi, Vasilis Liaskovitis

This is a prototype for ACPI memory hotplug on x86_64 target. Based on some
earlier work and comments from Gleb.

Memslot devices are modeled with a new qemu command line 

"-memslot id=name,start=start_addr,size=sz,node=pxm"

user is responsible for defining memslots with meaningful start/size values,
e.g. not defining a memory slot over a PCI-hole. Alternatively, the start size
could also be handled/assigned automatically from the specific emulated hardware
(for hw/pc.c PCI hole is currently [below_4g_mem_size, 4G), so hotplugged memory
should start from max(4G, above_4g_mem_size).

Node is defining numa proximity for this memslot. When not defined it defaults
to zero.

e.g. "-memslot id=hot1,start=4294967296,size=536870912,node=0"
will define a 512M memory slot starting at physical address 4G, belonging to numa node 0.

Memory slots are added or removed with a new hmp command "memslot":
Hot-add syntax: "memslot id add"
Hot-remove syntax: "memslot id delete"

- All memslots are initially unpopulated. Memslots are currently modeling only
hotplug-able memory slots i.e. initial system memory is not modeled with
memslots. The concept could be generalized to include all memory though, or it
could more closely follow kvm-memory slots.

- Memslots are abstracted as qdevices attached to the main system bus. However,
memory hotplugging has its own side channel ignoring main_system_bus's hotplug
incapability. A cleaner integration would be needed. What's  the preferred
way of modeling memory devices in qom? Would it be better to attach memory
devices as children-links of an acpi-capable device (in the pc case acpi_piix4)
instead of the system bus?

- Refcounting memory slots has been discussed (but is not included in this 
series yet). Depopulating a memory region happens on a guestOS _EJ callback,
which means the guestOS will not be using the region anymore. However, guest
addresses from the depopulated region need to also be unmapped from the qemu
address space using cpu_physical_memory_unmap(). Does memory_region_del_subregion()
or some other memory API call guarantee that a memoryregion has been unmapped
from qemu's address space?

- What is the expected behaviour of hotplugged memory after a reboot? Is it
supposed to be persistent after reboots? The last 2 patches in the series try to
make hotplugged memslots persistent after reboot by creating and consulting e820
map entries.  A better solution is needed for hot-remove after a reboot, because
e820 entries can be merged.

series is based on uq/master for qemu-kvm, and master for seabios. Can be found
also at:


Vasilis Liaskovitis (9):
  Seabios: Add SSDT memory device support
  Seabios, acpi: Implement acpi-dsdt functions for memory hotplug.
  Seabios, acpi: generate hotplug memory devices.
  Implement memslot device abstraction
  acpi_piix4: Implement memory device hotplug registers and handlers. 
  pc: pass paravirt info for hotplug memory slots to BIOS
  Implement memslot command-line option and memslot hmp monitor command
  pc: adjust e820 map on hot-add and hot-remove
  Seabios, acpi: enable memory devices if e820 entry is present

 Makefile.objs   |    2 +-
 hmp-commands.hx |   15 ++++
 hw/acpi_piix4.c |  103 +++++++++++++++++++++++++++-
 hw/memslot.c    |  201 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/memslot.h    |   44 ++++++++++++
 hw/pc.c         |   87 ++++++++++++++++++++++--
 hw/pc.h         |    1 +
 monitor.c       |    8 ++
 monitor.h       |    1 +
 qemu-config.c   |   25 +++++++
 qemu-options.hx |    8 ++
 sysemu.h        |    1 +
 vl.c            |   44 ++++++++++++-
 13 files changed, 528 insertions(+), 12 deletions(-)
 create mode 100644 hw/memslot.c
 create mode 100644 hw/memslot.h

 create mode 100644 src/ssdt-mem.dsl
 src/acpi-dsdt.dsl |   68 ++++++++++++++++++++++-
 src/acpi.c        |  155 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 src/memmap.c      |   15 +++++
 src/ssdt-mem.dsl  |   66 ++++++++++++++++++++++
 4 files changed, 298 insertions(+), 6 deletions(-)
 create mode 100644 src/ssdt-mem.dsl

-- 
1.7.9


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC PATCH 1/9][SeaBIOS] Add SSDT memory device support
  2012-04-19 14:08 [RFC PATCH 0/9] ACPI memory hotplug Vasilis Liaskovitis
@ 2012-04-19 14:08 ` Vasilis Liaskovitis
  2012-04-19 14:08 ` [RFC PATCH 2/9][SeaBIOS] Implement acpi-dsdt functions for memory hotplug Vasilis Liaskovitis
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-19 14:08 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios; +Cc: gleb, avi, Vasilis Liaskovitis

Define SSDT hotplug-able memory devices in _SB namespace. The dynamically
generated SSDT includes per memory device hotplug methods. These methods
just call methods defined in the DSDT. Also dynamically generate a MTFY
method and a MEON array of the online/available memory devices.  Add file
src/ssdt-mem.dsl with directions for generating the per-memory device
processor object AML code.
The design is taken from SSDT cpu generation.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 src/ssdt-mem.dsl |   66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 66 insertions(+), 0 deletions(-)
 create mode 100644 src/ssdt-mem.dsl

diff --git a/src/ssdt-mem.dsl b/src/ssdt-mem.dsl
new file mode 100644
index 0000000..9586643
--- /dev/null
+++ b/src/ssdt-mem.dsl
@@ -0,0 +1,66 @@
+/* This file is the basis for the ssdt_mem[] variable in src/acpi.c.
+ * It is similar in design to the ssdt_proc variable.  
+ * It defines the contents of the per-cpu Processor() object.  At
+ * runtime, a dynamically generated SSDT will contain one copy of this
+ * AML snippet for every possible memory device in the system.  The 
+ * objects will * be placed in the \_SB_ namespace.
+ *
+ * To generate a new ssdt_memc[], run the commands:
+ *   cpp -P src/ssdt-mem.dsl > out/ssdt-mem.dsl.i
+ *   iasl -ta -p out/ssdt-mem out/ssdt-mem.dsl.i
+ *   tail -c +37 < out/ssdt-mem.aml | hexdump -e '"    " 8/1 "0x%02x," "\n"'
+ * and then cut-and-paste the output into the src/acpi.c ssdt_mem[]
+ * array.
+ *
+ * In addition to the aml code generated from this file, the
+ * src/acpi.c file creates a MEMNTFY method with an entry for each memdevice:
+ *     Method(MTFY, 2) {
+ *         If (LEqual(Arg0, 0x00)) { Notify(MP00, Arg1) }
+ *         If (LEqual(Arg0, 0x01)) { Notify(MP01, Arg1) }
+ *         ...
+ *     }
+ * and a MEON array with the list of active and inactive memory devices:
+ *     Name(MEON, Package() { One, One, ..., Zero, Zero, ... })
+ */
+DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1)
+/*  v------------------ DO NOT EDIT ------------------v */
+{
+    Device(MPAA) {
+        Name(ID, 0xAA)
+/*  ^------------------ DO NOT EDIT ------------------^
+ *
+ * The src/acpi.c code requires the above layout so that it can update
+ * MPAA and 0xAA with the appropriate MEMDEVICE id (see
+ * SD_OFFSET_MEMHEX/MEMID1/MEMID2).  Don't change the above without
+ * also updating the C code.
+ */
+        Name(_HID, EISAID("PNP0C80"))
+        Name(_PXM, 0xAA)
+
+        External(CMST, MethodObj)
+        External(MPEJ, MethodObj)
+
+        Name(_CRS, ResourceTemplate() {
+            QwordMemory(
+               ResourceConsumer,
+               ,
+               MinFixed,
+               MaxFixed,
+               Cacheable,
+               ReadWrite, 
+               0x0, 
+               0xDEADBEEF, 
+               0xE6ADBEEE, 
+               0x00000000,
+               0x08000000, 
+               )
+        })
+        Method (_STA, 0) {
+            Return(CMST(ID))        
+        }    
+        Method (_EJ0, 1, NotSerialized) {
+            MPEJ(ID, Arg0)
+        }
+    }
+}    
+
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC PATCH 2/9][SeaBIOS] Implement acpi-dsdt functions for memory hotplug.
  2012-04-19 14:08 [RFC PATCH 0/9] ACPI memory hotplug Vasilis Liaskovitis
  2012-04-19 14:08 ` [RFC PATCH 1/9][SeaBIOS] Add SSDT memory device support Vasilis Liaskovitis
@ 2012-04-19 14:08 ` Vasilis Liaskovitis
  2012-04-20 10:55   ` Igor Mammedov
  2012-04-19 14:08 ` [RFC PATCH 3/9][SeaBIOS] acpi: generate hotplug memory devices Vasilis Liaskovitis
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-19 14:08 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios; +Cc: Vasilis Liaskovitis, avi, gleb

Extend the DSDT to include methods for handling memory hot-add and hot-remove
notifications and memory device status requests. These functions are called
from the memory device SSDT methods.

Eject has only been tested with level gpe event, but will be changed to edge gpe
event soon, according to recent master patch for other ACPI hotplug events.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 src/acpi-dsdt.dsl |   68 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 4bdc268..184daf0 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -709,9 +709,72 @@ DefinitionBlock (
             }
             Return(One)
         }
-    }
 
+        /* Objects filled in by run-time generated SSDT */
+        External(MTFY, MethodObj)
+        External(MEON, PkgObj)
+
+        Method (CMST, 1, NotSerialized) {
+            // _STA method - return ON status of memdevice
+            // Local0 = MEON flag for this cpu
+            Store(DerefOf(Index(MEON, Arg0)), Local0)
+            If (Local0) { Return(0xF) } Else { Return(0x0) }
+        }
+        /* Memory eject notify method */
+        OperationRegion(MEMJ, SystemIO, 0xaf40, 32)
+        Field (MEMJ, ByteAcc, NoLock, Preserve)
+        {
+            MPE, 256
+        }
+
+        Method (MPEJ, 2, NotSerialized) {
+            // _EJ0 method - eject callback
+            Store(ShiftLeft(1,Arg0), MPE)
+            Sleep(200)
+        }
+
+        /* Memory hotplug notify method */
+        OperationRegion(MEST, SystemIO, 0xaf20, 32)
+        Field (MEST, ByteAcc, NoLock, Preserve)
+        {
+            MES, 256
+        }
+
+        Method(MESC, 0) {
+            // Local5 = active memdevice bitmap
+            Store (MES, Local5)
+            // Local2 = last read byte from bitmap
+            Store (Zero, Local2)
+            // Local0 = memory device iterator
+            Store (Zero, Local0)
+            While (LLess(Local0, SizeOf(MEON))) {
+                // Local1 = MEON flag for this memory device
+                Store(DerefOf(Index(MEON, Local0)), Local1)
+                If (And(Local0, 0x07)) {
+                    // Shift down previously read bitmap byte
+                    ShiftRight(Local2, 1, Local2)
+                } Else {
+                    // Read next byte from memdevice bitmap
+                    Store(DerefOf(Index(Local5, ShiftRight(Local0, 3))), Local2)
+                }
+                // Local3 = active state for this memory device
+                Store(And(Local2, 1), Local3)
 
+                If (LNotEqual(Local1, Local3)) {
+                    // State change - update MEON with new state
+                    Store(Local3, Index(MEON, Local0))
+                    // Do MEM notify
+                    If (LEqual(Local3, 1)) {
+                        MTFY(Local0, 1)
+                    } Else {
+                        MTFY(Local0, 3)
+                    }
+                }
+                Increment(Local0)
+            }
+            Return(One)
+        }
+    }
 /****************************************************************
  * General purpose events
  ****************************************************************/
@@ -732,7 +795,8 @@ DefinitionBlock (
             Return(\_SB.PRSC())
         }
         Method(_L03) {
-            Return(0x01)
+            // Memory hotplug event
+            Return(\_SB.MESC())
         }
         Method(_L04) {
             Return(0x01)
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC PATCH 3/9][SeaBIOS] acpi: generate hotplug memory devices.
  2012-04-19 14:08 [RFC PATCH 0/9] ACPI memory hotplug Vasilis Liaskovitis
  2012-04-19 14:08 ` [RFC PATCH 1/9][SeaBIOS] Add SSDT memory device support Vasilis Liaskovitis
  2012-04-19 14:08 ` [RFC PATCH 2/9][SeaBIOS] Implement acpi-dsdt functions for memory hotplug Vasilis Liaskovitis
@ 2012-04-19 14:08 ` Vasilis Liaskovitis
  2012-04-23 23:37   ` Kevin O'Connor
  2012-04-19 14:08 ` [RFC PATCH 4/9] Implement memslot device abstraction Vasilis Liaskovitis
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-19 14:08 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios; +Cc: gleb, avi, Vasilis Liaskovitis

 The memory device generation is guided by qemu paravirt info. Seabios
 first uses the info to setup SRAT entries for the hotplug-able memory slots.
 Afterwards, build_memssdt uses the created SRAT entries to generate
 appropriate memory device objects. One memory device (and corresponding SRAT
 entry) is generated for each hotplug-able qemu memslot. Currently no SSDT
 memory device is created for initial system memory (the method can be
 generalized to all memory though).

 Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 src/acpi.c |  151 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 147 insertions(+), 4 deletions(-)

diff --git a/src/acpi.c b/src/acpi.c
index 30888b9..5580099 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -484,6 +484,131 @@ build_ssdt(void)
     return ssdt;
 }
 
+static unsigned char ssdt_mem[] = {
+    0x5b,0x82,0x47,0x07,0x4d,0x50,0x41,0x41,
+    0x08,0x49,0x44,0x5f,0x5f,0x0a,0xaa,0x08,
+    0x5f,0x48,0x49,0x44,0x0c,0x41,0xd0,0x0c,
+    0x80,0x08,0x5f,0x50,0x58,0x4d,0x0a,0xaa,
+    0x08,0x5f,0x43,0x52,0x53,0x11,0x33,0x0a,
+    0x30,0x8a,0x2b,0x00,0x00,0x0d,0x03,0x00,
+    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xef,
+    0xbe,0xad,0xde,0x00,0x00,0x00,0x00,0xee,
+    0xbe,0xad,0xe6,0x00,0x00,0x00,0x00,0x00,
+    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+    0x00,0x00,0x08,0x00,0x00,0x00,0x00,0x79,
+    0x00,0x14,0x0f,0x5f,0x53,0x54,0x41,0x00,
+    0xa4,0x43,0x4d,0x53,0x54,0x49,0x44,0x5f,
+    0x5f,0x14,0x0f,0x5f,0x45,0x4a,0x30,0x01,
+    0x4d,0x50,0x45,0x4a,0x49,0x44,0x5f,0x5f,
+    0x68
+};
+
+#define SD_OFFSET_MEMHEX 6
+#define SD_OFFSET_MEMID 14
+#define SD_OFFSET_PXMID 31
+#define SD_OFFSET_MEMSTART 55
+#define SD_OFFSET_MEMEND   63
+#define SD_OFFSET_MEMSIZE  79
+
+u64 nb_hp_memslots = 0;
+struct srat_memory_affinity *mem;
+
+static void build_memdev(u8 *ssdt_ptr, int i, u64 mem_base, u64 mem_len, u8 node)
+{
+    memcpy(ssdt_ptr, ssdt_mem, sizeof(ssdt_mem));
+    ssdt_ptr[SD_OFFSET_MEMHEX] = getHex(i >> 4);
+    ssdt_ptr[SD_OFFSET_MEMHEX+1] = getHex(i);
+    ssdt_ptr[SD_OFFSET_MEMID] = i;
+    ssdt_ptr[SD_OFFSET_PXMID] = node;
+    *(u64*)(ssdt_ptr + SD_OFFSET_MEMSTART) = mem_base;
+    *(u64*)(ssdt_ptr + SD_OFFSET_MEMEND) = mem_base + mem_len;
+    *(u64*)(ssdt_ptr + SD_OFFSET_MEMSIZE) = mem_len;
+}
+
+static void*
+build_memssdt(void)
+{
+    u64 mem_base;
+    u64 mem_len;
+    u8  node;
+    int i;
+    struct srat_memory_affinity *entry = mem;
+    u64 nb_memdevs = nb_hp_memslots;
+
+    int length = ((1+3+4)
+                  + (nb_memdevs * sizeof(ssdt_mem))
+                  + (1+2+5+(12*nb_memdevs))
+                  + (6+2+1+(1*nb_memdevs)));
+    u8 *ssdt = malloc_high(sizeof(struct acpi_table_header) + length);
+    if (! ssdt) {
+        warn_noalloc();
+        return NULL;
+    }
+    u8 *ssdt_ptr = ssdt + sizeof(struct acpi_table_header);
+
+    // build Scope(_SB_) header
+    *(ssdt_ptr++) = 0x10; // ScopeOp
+    ssdt_ptr = encodeLen(ssdt_ptr, length-1, 3);
+    *(ssdt_ptr++) = '_';
+    *(ssdt_ptr++) = 'S';
+    *(ssdt_ptr++) = 'B';
+    *(ssdt_ptr++) = '_';
+
+    for (i = 0; i < nb_memdevs; i++) {
+        mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low);
+        mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
+        node = entry->proximity[0];
+        build_memdev(ssdt_ptr, i, mem_base, mem_len, node);
+        ssdt_ptr += sizeof(ssdt_mem);
+        entry++;
+    }
+
+    // build "Method(MTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CM00, Arg1)} ...}"
+    *(ssdt_ptr++) = 0x14; // MethodOp
+    ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*nb_memdevs), 2);
+    *(ssdt_ptr++) = 'M';
+    *(ssdt_ptr++) = 'T';
+    *(ssdt_ptr++) = 'F';
+    *(ssdt_ptr++) = 'Y';
+    *(ssdt_ptr++) = 0x02;
+    for (i=0; i<nb_memdevs; i++) {
+        *(ssdt_ptr++) = 0xA0; // IfOp
+        ssdt_ptr = encodeLen(ssdt_ptr, 11, 1);
+        *(ssdt_ptr++) = 0x93; // LEqualOp
+        *(ssdt_ptr++) = 0x68; // Arg0Op
+        *(ssdt_ptr++) = 0x0A; // BytePrefix
+        *(ssdt_ptr++) = i;
+        *(ssdt_ptr++) = 0x86; // NotifyOp
+        *(ssdt_ptr++) = 'M';
+        *(ssdt_ptr++) = 'P';
+        *(ssdt_ptr++) = getHex(i >> 4);
+        *(ssdt_ptr++) = getHex(i);
+        *(ssdt_ptr++) = 0x69; // Arg1Op
+    }
+
+    // build "Name(MEON, Package() { One, One, ..., Zero, Zero, ... })"
+    *(ssdt_ptr++) = 0x08; // NameOp
+    *(ssdt_ptr++) = 'M';
+    *(ssdt_ptr++) = 'E';
+    *(ssdt_ptr++) = 'O';
+    *(ssdt_ptr++) = 'N';
+    *(ssdt_ptr++) = 0x12; // PackageOp
+    ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*nb_memdevs), 2);
+    *(ssdt_ptr++) = nb_memdevs;
+
+    entry = mem;
+
+    for (i = 0; i < nb_memdevs; i++) {
+        mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low);
+        mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
+        *(ssdt_ptr++) = 0x00;
+        entry++;
+    }
+    build_header((void*)ssdt, SSDT_SIGNATURE, ssdt_ptr - ssdt, 1);
+
+    return ssdt;
+}
+
 #include "ssdt-pcihp.hex"
 
 #define PCI_RMV_BASE 0xae0c
@@ -580,18 +705,21 @@ build_srat(void)
     if (nb_numa_nodes == 0)
         return NULL;
 
-    u64 *numadata = malloc_tmphigh(sizeof(u64) * (MaxCountCPUs + nb_numa_nodes));
+    qemu_cfg_get_numa_data(&nb_hp_memslots, 1);
+
+    u64 *numadata = malloc_tmphigh(sizeof(u64) * (MaxCountCPUs + nb_numa_nodes +
+        3 * nb_hp_memslots));
     if (!numadata) {
         warn_noalloc();
         return NULL;
     }
 
-    qemu_cfg_get_numa_data(numadata, MaxCountCPUs + nb_numa_nodes);
+    qemu_cfg_get_numa_data(numadata, MaxCountCPUs + nb_numa_nodes + 3 * nb_hp_memslots);
 
     struct system_resource_affinity_table *srat;
     int srat_size = sizeof(*srat) +
         sizeof(struct srat_processor_affinity) * MaxCountCPUs +
-        sizeof(struct srat_memory_affinity) * (nb_numa_nodes + 2);
+        sizeof(struct srat_memory_affinity) * (nb_numa_nodes + nb_hp_memslots + 2);
 
     srat = malloc_high(srat_size);
     if (!srat) {
@@ -627,6 +755,7 @@ build_srat(void)
      */
     struct srat_memory_affinity *numamem = (void*)core;
     int slots = 0;
+    int node;
     u64 mem_len, mem_base, next_base = 0;
 
     acpi_build_srat_memory(numamem, 0, 640*1024, 0, 1);
@@ -653,10 +782,23 @@ build_srat(void)
             next_base += (1ULL << 32) - RamSize;
         }
         acpi_build_srat_memory(numamem, mem_base, mem_len, i-1, 1);
+
+        numamem++;
+        slots++;
+
+    }
+    mem = (void*)numamem;
+
+    for (i = 1; i < nb_hp_memslots + 1; ++i) {
+        mem_base = *numadata++;
+        mem_len = *numadata++;
+        node = *numadata++;
+        acpi_build_srat_memory(numamem, mem_base, mem_len, node, 1);
+        dprintf(1, "hotplug memory slot %d on node %d\n", i, node);
         numamem++;
         slots++;
     }
-    for (; slots < nb_numa_nodes + 2; slots++) {
+    for (; slots < nb_numa_nodes + nb_hp_memslots + 2; slots++) {
         acpi_build_srat_memory(numamem, 0, 0, 0, 0);
         numamem++;
     }
@@ -707,6 +849,7 @@ acpi_bios_init(void)
     ACPI_INIT_TABLE(build_madt());
     ACPI_INIT_TABLE(build_hpet());
     ACPI_INIT_TABLE(build_srat());
+    ACPI_INIT_TABLE(build_memssdt());
     ACPI_INIT_TABLE(build_pcihp());
 
     u16 i, external_tables = qemu_cfg_acpi_additional_tables();
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC PATCH 4/9] Implement memslot device abstraction
  2012-04-19 14:08 [RFC PATCH 0/9] ACPI memory hotplug Vasilis Liaskovitis
                   ` (2 preceding siblings ...)
  2012-04-19 14:08 ` [RFC PATCH 3/9][SeaBIOS] acpi: generate hotplug memory devices Vasilis Liaskovitis
@ 2012-04-19 14:08 ` Vasilis Liaskovitis
  2012-04-19 14:08 ` [RFC PATCH 5/9] acpi_piix4: Implement memory device hotplug registers Vasilis Liaskovitis
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-19 14:08 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios; +Cc: gleb, avi, Vasilis Liaskovitis

 Each hotplug-able memory slot is a SysBusDevice. All memslots are initially
 unpopulated. A hot-add operation for a particular memory slot creates a new
 MemoryRegion of the given physical address offset, size and node proximity,
 and attaches it to main system memory as a sub_region. A hot-remove operation
 detaches and frees the MemoryRegion from system memory.

 This is an early prototype and lacks proper qdev integration: a separate
 hotplug mechanism/side-channel is used and main system bus hotplug
 capability is ignored.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hw/memslot.c |  195 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/memslot.h |   44 +++++++++++++
 2 files changed, 239 insertions(+), 0 deletions(-)
 create mode 100644 hw/memslot.c
 create mode 100644 hw/memslot.h

diff --git a/hw/memslot.c b/hw/memslot.c
new file mode 100644
index 0000000..b100824
--- /dev/null
+++ b/hw/memslot.c
@@ -0,0 +1,195 @@
+/*
+ * MemorySlot device for Memory Hotplug
+ *
+ * Copyright ProfitBricks GmbH 2012
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "trace.h"
+#include "qdev.h"
+#include "memslot.h"
+#include "../exec-memory.h"
+
+static DeviceState *memslot_hotplug_qdev;
+static memslot_hotplug_fn memslot_hotplug;
+
+static Property memslot_properties[] = {
+    DEFINE_PROP_END_OF_LIST()
+};
+
+void memslot_populate(MemSlotState *s)
+{
+    char buf[32];
+    MemoryRegion *new = NULL;
+
+    sprintf(buf, "memslot%u", s->idx);
+    new = g_malloc(sizeof(MemoryRegion));
+    memory_region_init_ram(new, buf, s->size);
+    vmstate_register_ram_global(new);
+    memory_region_add_subregion(get_system_memory(), s->start, new);
+    s->mr = new;
+    s->populated = 1;
+}
+
+void memslot_depopulate(MemSlotState *s)
+{
+    assert(s);
+    if (s->populated) {
+        vmstate_unregister_ram(s->mr, NULL);
+        memory_region_del_subregion(get_system_memory(), s->mr);
+        memory_region_destroy(s->mr);
+        s->populated = 0;
+        s->mr = NULL;
+    }
+}
+
+MemSlotState *memslot_create(char *id, target_phys_addr_t start, uint64_t size,
+        uint64_t node, uint32_t memslot_idx)
+{
+    DeviceState *dev;
+    MemSlotState *mdev;
+
+    dev = sysbus_create_simple("memslot", -1, NULL);
+    dev->id = id;
+
+    mdev = MEMSLOT(dev);
+    mdev->idx = memslot_idx;
+    mdev->start = start;
+    mdev->size = size;
+    mdev->node = node;
+
+    return mdev;
+}
+
+void memslot_register_hotplug(memslot_hotplug_fn hotplug, DeviceState *qdev)
+{
+    memslot_hotplug_qdev = qdev;
+    memslot_hotplug = hotplug;
+}
+
+static MemSlotState *memslot_find(char *id)
+{
+    DeviceState *qdev;
+    qdev = qdev_find_recursive(sysbus_get_default(), id);
+    if (qdev)
+        return MEMSLOT(qdev);
+    return NULL;
+}
+
+int memslot_do(Monitor *mon, const QDict *qdict)
+{
+    MemSlotState *slot = NULL;
+
+    char *id = (char*) qdict_get_try_str(qdict, "id");
+    if (!id) {
+        fprintf(stderr, "ERROR %s invalid id\n",__FUNCTION__);
+        return 1;
+    }
+
+    slot = memslot_find(id);
+
+    if (!slot) {
+        fprintf(stderr, "%s no slot %s found\n", __FUNCTION__, id);
+        return 1;
+    }
+
+    char *action = (char*) qdict_get_try_str(qdict, "action");
+    if (!action || (strcmp(action, "add") && strcmp(action, "delete"))) {
+        fprintf(stderr, "ERROR %s invalid action\n", __FUNCTION__);
+        return 1;
+    }
+
+    if (!strcmp(action, "add")) {
+        if (slot->populated) {
+            fprintf(stderr, "ERROR %s slot %s already populated\n",
+                    __FUNCTION__, id);
+            return 1;
+        }
+        memslot_populate(slot);
+        if (memslot_hotplug)
+            memslot_hotplug(memslot_hotplug_qdev, (SysBusDevice*)slot, 1);
+    }
+    else {
+        if (!slot->populated) {
+            fprintf(stderr, "ERROR %s slot %s is not populated\n",
+                    __FUNCTION__, id);
+            return 1;
+        }
+        if (memslot_hotplug)
+            memslot_hotplug(memslot_hotplug_qdev, (SysBusDevice*)slot, 0);
+    }
+    return 0;
+}
+
+MemSlotState *memslot_find_from_idx(uint32_t idx)
+{
+    Error *err = NULL;
+    DeviceState *dev;
+    MemSlotState *slot;
+    char *type;
+    BusState *bus = sysbus_get_default();
+    QTAILQ_FOREACH(dev, &bus->children, sibling) {
+        type = object_property_get_str(OBJECT(dev), "type", &err);
+        if (err) {
+            error_free(err);
+            fprintf(stderr, "error getting device type\n");
+            return NULL;
+        }
+        if (!strcmp(type, "memslot")) {
+            slot = MEMSLOT(dev);
+            if (slot->idx == idx) {
+                fprintf(stderr, "%s found slot with idx %u : %p\n",
+                        __FUNCTION__, idx, slot);
+                return slot;
+            }
+            else
+                fprintf(stderr, "%s slot with idx %u !=  %u\n", __FUNCTION__,
+                        slot->idx, idx);
+        }
+    }
+    return NULL;
+}
+
+static int memslot_init(SysBusDevice *s)
+{
+    MemSlotState *slot;
+    slot = MEMSLOT(s);
+    slot->mr = NULL;
+    slot->populated = 0;
+    return 0;
+}
+
+static void memslot_class_init(ObjectClass *klass, void *data)
+{
+    SysBusDeviceClass *sc = SYS_BUS_DEVICE_CLASS(klass);
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->props = memslot_properties;
+    sc->init = memslot_init;
+    memslot_hotplug = NULL;
+}
+
+static TypeInfo memslot_info = {
+    .name          = "memslot",
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(MemSlotState),
+    .class_init    = memslot_class_init,
+};
+
+static void memslot_register_types(void)
+{
+    type_register_static(&memslot_info);
+}
+
+type_init(memslot_register_types)
diff --git a/hw/memslot.h b/hw/memslot.h
new file mode 100644
index 0000000..9412667
--- /dev/null
+++ b/hw/memslot.h
@@ -0,0 +1,44 @@
+#ifndef QEMU_MEM_H
+#define QEMU_MEM_H
+
+#include "qemu-common.h"
+#include "memory.h"
+#include "sysbus.h"
+
+#define TYPE_MEMSLOT "memslot"
+#define MEMSLOT(obj) \
+    OBJECT_CHECK(MemSlotState, (obj), TYPE_MEMSLOT)
+#define MEMSLOT_CLASS(klass) \
+    OBJECT_CLASS_CHECK(MemSlotClass, (obj), TYPE_MEMSLOT)
+#define MEMSLOT_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(MemSlotClass, (obj), TYPE_MEMSLOT)
+
+typedef struct MemSlotState {
+    SysBusDevice busdev;
+    uint32_t populated; /* 1 means device has been hotplugged. Default is 0. */
+    uint32_t idx; /* index in memory hotplug register/bitmap */
+    uint64_t start; /* starting physical address */
+    uint64_t size;
+    uint32_t node; /* numa node proximity */
+    MemoryRegion *mr; /* MemoryRegion for this slot. !NULL only if populated */
+} MemSlotState;
+
+typedef struct MemSlotClass
+{
+    SysBusDeviceClass parent_class;
+    void (*set)(MemSlotState *s, MemoryRegion *mem);
+} MemSlotClass;
+
+/* mem.c */
+
+typedef int (*memslot_hotplug_fn)(DeviceState *qdev, SysBusDevice *dev, int add);
+
+MemSlotState *memslot_create(char *id, target_phys_addr_t start, uint64_t size,
+        uint64_t node, uint32_t memslot_idx);
+void memslot_populate(MemSlotState *s);
+void memslot_depopulate(MemSlotState *s);
+int memslot_do(Monitor *mon, const QDict *qdict);
+MemSlotState *memslot_find_from_idx(uint32_t idx);
+void memslot_register_hotplug(memslot_hotplug_fn hotplug, DeviceState *qdev);
+
+#endif
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC PATCH 5/9] acpi_piix4: Implement memory device hotplug registers
  2012-04-19 14:08 [RFC PATCH 0/9] ACPI memory hotplug Vasilis Liaskovitis
                   ` (3 preceding siblings ...)
  2012-04-19 14:08 ` [RFC PATCH 4/9] Implement memslot device abstraction Vasilis Liaskovitis
@ 2012-04-19 14:08 ` Vasilis Liaskovitis
  2012-04-19 14:08 ` [RFC PATCH 6/9] pc: pass paravirt info for hotplug memory slots to BIOS Vasilis Liaskovitis
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-19 14:08 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios; +Cc: gleb, avi, Vasilis Liaskovitis

 A 32-byte register is used to present up to 256 hotplug-able memory devices
 to BIOS and OSPM. Hot-add and hot-remove functions trigger an ACPI hotplug
 event through these. Only reads are allowed from these registers (from
 BIOS/OSPM perspective). "memslot id add" will immediately populate the new
 memslot (a new MemoryRegion is created and attached to system memory), and
 then trigger the ACPI hot-add event. "memslot id delete" triggers the ACPI
 hot-remove event but needs to wait for OSPM to eject the device.  We use a
 second set of eject registers to know when OSPM has called the _EJ function
 for a particular memslot. A write to these will depopulate the corresponding
 memslot i.e. detach and free the MemoryRegion. Only writes to the eject
 registers are allowed.

 A new property mem_acpi_hotplug should enable these memory hotplug registers
 for future machine types (not yet implemented in this version).

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hw/acpi_piix4.c |   93 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 797ed24..a14dd3c 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -27,6 +27,8 @@
 #include "sysemu.h"
 #include "range.h"
 #include "ioport.h"
+#include "sysbus.h"
+#include "memslot.h"
 
 //#define DEBUG
 
@@ -43,9 +45,16 @@
 #define PCI_BASE 0xae00
 #define PCI_EJ_BASE 0xae08
 #define PCI_RMV_BASE 0xae0c
+#define MEM_BASE 0xaf20
+#define MEM_EJ_BASE 0xaf40
 
+#define PIIX4_MEM_HOTPLUG_STATUS 8
 #define PIIX4_PCI_HOTPLUG_STATUS 2
 
+struct gpe_regs {
+    uint8_t mems_sts[32];
+};
+
 struct pci_status {
     uint32_t up;
     uint32_t down;
@@ -66,6 +75,7 @@ typedef struct PIIX4PMState {
     int kvm_enabled;
     Notifier machine_ready;
 
+    struct gpe_regs    gpe;
     /* for pci hotplug */
     struct pci_status pci0_status;
     uint32_t pci0_hotplug_enable;
@@ -86,8 +96,8 @@ static void pm_update_sci(PIIX4PMState *s)
                    ACPI_BITMASK_POWER_BUTTON_ENABLE |
                    ACPI_BITMASK_GLOBAL_LOCK_ENABLE |
                    ACPI_BITMASK_TIMER_ENABLE)) != 0) ||
-        (((s->ar.gpe.sts[0] & s->ar.gpe.en[0])
-          & PIIX4_PCI_HOTPLUG_STATUS) != 0);
+        (((s->ar.gpe.sts[0] & s->ar.gpe.en[0]) &
+          (PIIX4_PCI_HOTPLUG_STATUS | PIIX4_MEM_HOTPLUG_STATUS)) != 0);
 
     qemu_set_irq(s->irq, sci_level);
     /* schedule a timer interruption if needed */
@@ -432,17 +442,34 @@ type_init(piix4_pm_register_types)
 static uint32_t gpe_readb(void *opaque, uint32_t addr)
 {
     PIIX4PMState *s = opaque;
-    uint32_t val = acpi_gpe_ioport_readb(&s->ar, addr);
+    uint32_t val = 0;
+    struct gpe_regs *g = &s->gpe;
+
+    switch (addr) {
+        case MEM_BASE ... MEM_BASE+31:
+            val = g->mems_sts[addr - MEM_BASE];
+            break;
+        default:
+            val = acpi_gpe_ioport_readb(&s->ar, addr);
+    }
 
     PIIX4_DPRINTF("gpe read %x == %x\n", addr, val);
     return val;
 }
 
+static void piix4_memslot_eject(uint32_t addr, uint32_t val);
+
 static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
 {
     PIIX4PMState *s = opaque;
 
-    acpi_gpe_ioport_writeb(&s->ar, addr, val);
+    switch (addr) {
+        case MEM_EJ_BASE ... MEM_EJ_BASE+31:
+            piix4_memslot_eject(addr, val);
+            break;
+        default:
+            acpi_gpe_ioport_writeb(&s->ar, addr, val);
+    }
     pm_update_sci(s);
 
     PIIX4_DPRINTF("gpe write %x <== %d\n", addr, val);
@@ -521,9 +548,12 @@ static void pcirmv_write(void *opaque, uint32_t addr, uint32_t val)
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
                                 PCIHotplugState state);
 
+static int piix4_memslot_hotplug(DeviceState *qdev, SysBusDevice *dev, int add);
+
 static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
 {
     struct pci_status *pci0_status = &s->pci0_status;
+    int i = 0;
 
     register_ioport_write(GPE_BASE, GPE_LEN, 1, gpe_writeb, s);
     register_ioport_read(GPE_BASE, GPE_LEN, 1,  gpe_readb, s);
@@ -538,6 +568,13 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
     register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
     register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
 
+    register_ioport_read(MEM_BASE, 32, 1,  gpe_readb, s);
+    register_ioport_write(MEM_EJ_BASE, 32, 1,  gpe_writeb, s);
+    for(i = 0; i < 32; i++) {
+        s->gpe.mems_sts[i] = 0;
+    }
+    memslot_register_hotplug(piix4_memslot_hotplug, &s->dev.qdev);
+
     pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
 }
 
@@ -553,6 +590,54 @@ static void disable_device(PIIX4PMState *s, int slot)
     s->pci0_status.down |= (1 << slot);
 }
 
+static void enable_mem_device(PIIX4PMState *s, int memdevice)
+{
+    struct gpe_regs *g = &s->gpe;
+    s->ar.gpe.sts[0] |= PIIX4_MEM_HOTPLUG_STATUS;
+    g->mems_sts[memdevice/8] |= (1 << (memdevice%8));
+}
+
+static void disable_mem_device(PIIX4PMState *s, int memdevice)
+{
+    struct gpe_regs *g = &s->gpe;
+    s->ar.gpe.sts[0] |= PIIX4_MEM_HOTPLUG_STATUS;
+    g->mems_sts[memdevice/8] &= ~(1 << (memdevice%8));
+}
+
+static void piix4_memslot_eject(uint32_t addr, uint32_t val)
+{
+    uint32_t start = 8 * (addr - MEM_EJ_BASE);
+    uint32_t idx = 0;
+    MemSlotState *s;
+    PIIX4_DPRINTF("memej write %x <= %d\n", addr, val);
+    while (val) {
+        if (val & 1) {
+            s = memslot_find_from_idx(start + idx);
+            assert(s != NULL);
+            memslot_depopulate(s);
+        }
+        val = val >> 1;
+        idx++;
+    }
+}
+
+static int piix4_memslot_hotplug(DeviceState *qdev, SysBusDevice *dev, int
+        add)
+{
+    PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev,
+                                PCI_DEVICE(qdev));
+    MemSlotState *slot = MEMSLOT(dev);
+
+    if (add) {
+        enable_mem_device(s, slot->idx);
+    }
+    else {
+        disable_mem_device(s, slot->idx);
+    }
+    pm_update_sci(s);
+    return 0;
+}
+
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
 				PCIHotplugState state)
 {
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC PATCH 6/9] pc: pass paravirt info for hotplug memory slots to BIOS
  2012-04-19 14:08 [RFC PATCH 0/9] ACPI memory hotplug Vasilis Liaskovitis
                   ` (4 preceding siblings ...)
  2012-04-19 14:08 ` [RFC PATCH 5/9] acpi_piix4: Implement memory device hotplug registers Vasilis Liaskovitis
@ 2012-04-19 14:08 ` Vasilis Liaskovitis
  2012-04-19 14:21   ` Avi Kivity
  2012-04-20 10:33   ` Igor Mammedov
  2012-04-19 14:08 ` [RFC PATCH 7/9] Implement memslot command-line option and memslot hmp command Vasilis Liaskovitis
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-19 14:08 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios; +Cc: gleb, avi, Vasilis Liaskovitis

 The numa_fw_cfg paravirt interface is extended to include SRAT information for
 all hotplug-able memslots. There are 3 words for each hotplug-able memory slot,
 denoting start address, size and node proximity. nb_numa_nodes is set to 1 by
 default (not 0), so that we always pass srat info to SeaBIOS.

 This information is used by Seabios to build hotplug memory device objects at runtime.

 Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hw/pc.c |   59 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
 vl.c    |    4 +++-
 2 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 67f0479..f1f550a 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -46,6 +46,7 @@
 #include "ui/qemu-spice.h"
 #include "memory.h"
 #include "exec-memory.h"
+#include "memslot.h"
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
@@ -592,12 +593,15 @@ int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
     return index;
 }
 
+static void bochs_bios_setup_hp_memslots(uint64_t *fw_cfg_slots);
+
 static void *bochs_bios_init(void)
 {
     void *fw_cfg;
     uint8_t *smbios_table;
     size_t smbios_len;
     uint64_t *numa_fw_cfg;
+    uint64_t *hp_memslots_fw_cfg;
     int i, j;
 
     register_ioport_write(0x400, 1, 2, bochs_bios_write, NULL);
@@ -630,28 +634,71 @@ static void *bochs_bios_init(void)
     fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, (uint8_t *)&hpet_cfg,
                      sizeof(struct hpet_fw_config));
     /* allocate memory for the NUMA channel: one (64bit) word for the number
-     * of nodes, one word for each VCPU->node and one word for each node to
-     * hold the amount of memory.
+     * of nodes, one word for the number of hotplug memory slots, one word
+     * for each VCPU->node, one word for each node to hold the amount of memory.
+     * Finally three words for each hotplug memory slot, denoting start address,
+     * size and node proximity.
      */
-    numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8);
+    numa_fw_cfg = g_malloc0((2 + max_cpus + nb_numa_nodes + 3 * nb_hp_memslots) * 8);
     numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
+    numa_fw_cfg[1] = cpu_to_le64(nb_hp_memslots);
+
     for (i = 0; i < max_cpus; i++) {
         for (j = 0; j < nb_numa_nodes; j++) {
             if (node_cpumask[j] & (1 << i)) {
-                numa_fw_cfg[i + 1] = cpu_to_le64(j);
+                numa_fw_cfg[i + 2] = cpu_to_le64(j);
                 break;
             }
         }
     }
     for (i = 0; i < nb_numa_nodes; i++) {
-        numa_fw_cfg[max_cpus + 1 + i] = cpu_to_le64(node_mem[i]);
+        numa_fw_cfg[max_cpus + 2 + i] = cpu_to_le64(node_mem[i]);
     }
+
+    hp_memslots_fw_cfg = numa_fw_cfg + 2 + max_cpus + nb_numa_nodes;
+    if (nb_hp_memslots)
+        bochs_bios_setup_hp_memslots(hp_memslots_fw_cfg);
+
     fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, (uint8_t *)numa_fw_cfg,
-                     (1 + max_cpus + nb_numa_nodes) * 8);
+                     (2 + max_cpus + nb_numa_nodes + 3 * nb_hp_memslots) * 8);
 
     return fw_cfg;
 }
 
+static void bochs_bios_setup_hp_memslots(uint64_t *fw_cfg_slots)
+{
+    int i = 0;
+    Error *err = NULL;
+    DeviceState *dev;
+    MemSlotState *slot;
+    char *type;
+    BusState *bus = sysbus_get_default();
+
+    QTAILQ_FOREACH(dev, &bus->children, sibling) {
+        type = object_property_get_str(OBJECT(dev), "type", &err);
+        if (err) {
+            error_free(err);
+            fprintf(stderr, "error getting device type\n");
+            exit(1);
+        }
+
+        if (!strcmp(type, "memslot")) {
+            if (!dev->id) {
+                error_free(err);
+                fprintf(stderr, "error getting memslot device id\n");
+                exit(1);
+            }
+            if (!strcmp(dev->id, "initialslot")) continue;
+            slot = MEMSLOT(dev);
+            fw_cfg_slots[3 * slot->idx] = cpu_to_le64(slot->start);
+            fw_cfg_slots[3 * slot->idx + 1] = cpu_to_le64(slot->size);
+            fw_cfg_slots[3 * slot->idx + 2] = cpu_to_le64(slot->node);
+            i++;
+        }
+    }
+    assert(i == nb_hp_memslots);
+}
+
 static long get_file_size(FILE *f)
 {
     long where, size;
diff --git a/vl.c b/vl.c
index ae91a8a..50df453 100644
--- a/vl.c
+++ b/vl.c
@@ -3428,8 +3428,10 @@ int main(int argc, char **argv, char **envp)
 
     register_savevm_live(NULL, "ram", 0, 4, NULL, ram_save_live, NULL,
                          ram_load, NULL);
+    if (!nb_numa_nodes)
+        nb_numa_nodes = 1;
 
-    if (nb_numa_nodes > 0) {
+    {
         int i;
 
         if (nb_numa_nodes > MAX_NODES) {
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC PATCH 7/9] Implement memslot command-line option and memslot hmp command
  2012-04-19 14:08 [RFC PATCH 0/9] ACPI memory hotplug Vasilis Liaskovitis
                   ` (5 preceding siblings ...)
  2012-04-19 14:08 ` [RFC PATCH 6/9] pc: pass paravirt info for hotplug memory slots to BIOS Vasilis Liaskovitis
@ 2012-04-19 14:08 ` Vasilis Liaskovitis
  2012-04-19 14:22   ` Avi Kivity
  2012-04-19 14:08 ` [RFC PATCH 8/9] pc: adjust e820 map on hot-add and hot-remove Vasilis Liaskovitis
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-19 14:08 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios; +Cc: gleb, avi, Vasilis Liaskovitis

 Implement -memslot qemu-kvm command line option to define hotplug-able memory
 slots.
 Syntax: "-memslot id=name,start=addr,size=sz,node=nodeid"

 e.g. "-memslot id=hot1,start=4294967296,size=1073741824,node=0"
 will define a 1G memory slot starting at physical address 4G, belonging to numa
 node 0. Defining no node will automatically add a memslot to node 0.

 Also implement a new hmp monitor command for hot-add and hot-remove of memory slots
 Syntax: "memslot slotname action"
 where action is add/delete and slotname is the qdev-id of the memory slot.

 Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 Makefile.objs   |    2 +-
 hmp-commands.hx |   15 +++++++++++++++
 monitor.c       |    8 ++++++++
 monitor.h       |    1 +
 qemu-config.c   |   25 +++++++++++++++++++++++++
 qemu-options.hx |    8 ++++++++
 sysemu.h        |    1 +
 vl.c            |   40 ++++++++++++++++++++++++++++++++++++++++
 8 files changed, 99 insertions(+), 1 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 5c3bcda..98ce865 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -240,7 +240,7 @@ hw-obj-$(CONFIG_USB_OHCI) += usb/hcd-ohci.o
 hw-obj-$(CONFIG_USB_EHCI) += usb/hcd-ehci.o
 hw-obj-$(CONFIG_USB_XHCI) += usb/hcd-xhci.o
 hw-obj-$(CONFIG_FDC) += fdc.o
-hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o
+hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o memslot.o
 hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
 hw-obj-$(CONFIG_DMA) += dma.o
 hw-obj-$(CONFIG_I82374) += i82374.o
diff --git a/hmp-commands.hx b/hmp-commands.hx
index a6f5a84..cadf4ca 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -618,6 +618,21 @@ Add device.
 ETEXI
 
     {
+        .name       = "memslot",
+        .args_type  = "id:s,action:s",
+        .params     = "id,action",
+        .help       = "add memslot device",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_memslot_add,
+    },
+
+STEXI
+@item memslot_add @var{config}
+@findex memslot_add
+
+Add memslot.
+ETEXI
+    {
         .name       = "device_del",
         .args_type  = "id:s",
         .params     = "device",
diff --git a/monitor.c b/monitor.c
index 8946a10..f672186 100644
--- a/monitor.c
+++ b/monitor.c
@@ -30,6 +30,7 @@
 #include "hw/pci.h"
 #include "hw/watchdog.h"
 #include "hw/loader.h"
+#include "hw/memslot.h"
 #include "gdbstub.h"
 #include "net.h"
 #include "net/slirp.h"
@@ -4675,3 +4676,10 @@ int monitor_read_block_device_key(Monitor *mon, const char *device,
 
     return monitor_read_bdrv_key_start(mon, bs, completion_cb, opaque);
 }
+
+int do_memslot_add(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+#if defined(TARGET_I386) || defined(TARGET_X86_64)
+    return memslot_do(mon, qdict);
+#endif
+}
diff --git a/monitor.h b/monitor.h
index 0d49800..1e14a63 100644
--- a/monitor.h
+++ b/monitor.h
@@ -80,5 +80,6 @@ int monitor_read_password(Monitor *mon, ReadLineFunc *readline_func,
 int qmp_qom_set(Monitor *mon, const QDict *qdict, QObject **ret);
 
 int qmp_qom_get(Monitor *mon, const QDict *qdict, QObject **ret);
+int do_memslot_add(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif /* !MONITOR_H */
diff --git a/qemu-config.c b/qemu-config.c
index be84a03..1f26187 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -613,6 +613,30 @@ QemuOptsList qemu_boot_opts = {
     },
 };
 
+static QemuOptsList qemu_memslot_opts = {
+    .name = "memslot",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_memslot_opts.head),
+    .desc = {
+        {
+            .name = "id",
+            .type = QEMU_OPT_STRING,
+        },{
+            .name = "start",
+            .type = QEMU_OPT_SIZE,
+            .help = "physical address start for this memslot",
+        },{
+            .name = "size",
+            .type = QEMU_OPT_SIZE,
+            .help = "memory size for this memslot",
+        },{
+            .name = "node",
+            .type = QEMU_OPT_NUMBER,
+            .help = "NUMA node number (i.e. proximity) for this memslot",
+        },
+        { /* end of list */ }
+    },
+};
+
 static QemuOptsList *vm_config_groups[32] = {
     &qemu_drive_opts,
     &qemu_chardev_opts,
@@ -628,6 +652,7 @@ static QemuOptsList *vm_config_groups[32] = {
     &qemu_machine_opts,
     &qemu_boot_opts,
     &qemu_iscsi_opts,
+    &qemu_memslot_opts,
     NULL,
 };
 
diff --git a/qemu-options.hx b/qemu-options.hx
index a169792..aff0546 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2728,3 +2728,11 @@ HXCOMM This is the last statement. Insert new options before this line!
 STEXI
 @end table
 ETEXI
+
+DEF("memslot", HAS_ARG, QEMU_OPTION_memslot,
+        "-memslot start=num,size=num,id=name\n"
+        "specify unpopulated memory slot",
+        QEMU_ARCH_ALL)
+
+
+
diff --git a/sysemu.h b/sysemu.h
index bc2c788..7247099 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -136,6 +136,7 @@ extern QEMUClock *rtc_clock;
 extern int nb_numa_nodes;
 extern uint64_t node_mem[MAX_NODES];
 extern uint64_t node_cpumask[MAX_NODES];
+extern int nb_hp_memslots;
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/vl.c b/vl.c
index 50df453..85ac0f9 100644
--- a/vl.c
+++ b/vl.c
@@ -126,6 +126,7 @@ int main(int argc, char **argv)
 #include "hw/xen.h"
 #include "hw/qdev.h"
 #include "hw/loader.h"
+#include "hw/memslot.h"
 #include "bt-host.h"
 #include "net.h"
 #include "net/slirp.h"
@@ -247,6 +248,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order = QTAILQ_HEAD_INITIALIZER(fw_boot_order
 int nb_numa_nodes;
 uint64_t node_mem[MAX_NODES];
 uint64_t node_cpumask[MAX_NODES];
+int nb_hp_memslots;
 
 uint8_t qemu_uuid[16];
 
@@ -512,6 +514,36 @@ static void configure_rtc_date_offset(const char *startdate, int legacy)
     }
 }
 
+static void configure_memslot(QemuOpts *opts)
+{
+    const char *value, *id;
+    uint64_t start, size, node;
+
+    id = qemu_opts_id(opts);
+    value = qemu_opt_get(opts, "start");
+    if (!value) {
+        fprintf(stderr, "qemu: invalid start address for memslot '%s'\n", id);
+        exit(1);
+    }
+    start = atoll(value);
+    value = qemu_opt_get(opts, "size");
+    if (!value) {
+        fprintf(stderr, "qemu: invalid size for memslot '%s'\n", id);
+        exit(1);
+    }
+    size = atoi(value);
+    value = qemu_opt_get(opts, "node");
+    if (!value) {
+        fprintf(stderr, "qemu: no node proximity defined for memslot '%s'\n", id);
+        node = 0;
+    }
+    else node = atoi(value);
+    fprintf(stderr, "qemu: memslot %s start %lu size %lu node %lu \n", id,
+            start, size, node);
+    memslot_create((char*)id, start, size, node, nb_hp_memslots);
+    nb_hp_memslots++;
+}
+
 static void configure_rtc(QemuOpts *opts)
 {
     const char *value;
@@ -2330,6 +2362,7 @@ int main(int argc, char **argv, char **envp)
 
     nb_numa_nodes = 0;
     nb_nics = 0;
+    nb_hp_memslots = 0;
 
     autostart= 1;
 
@@ -2521,6 +2554,13 @@ int main(int argc, char **argv, char **envp)
             case QEMU_OPTION_kernel:
                 qemu_opts_set(qemu_find_opts("machine"), 0, "kernel", optarg);
                 break;
+            case QEMU_OPTION_memslot:
+                opts = qemu_opts_parse(qemu_find_opts("memslot"), optarg, 0);
+                if (!opts) {
+                    exit(1);
+                }
+                configure_memslot(opts);
+                break;
             case QEMU_OPTION_initrd:
                 qemu_opts_set(qemu_find_opts("machine"), 0, "initrd", optarg);
                 break;
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC PATCH 8/9] pc: adjust e820 map on hot-add and hot-remove
  2012-04-19 14:08 [RFC PATCH 0/9] ACPI memory hotplug Vasilis Liaskovitis
                   ` (6 preceding siblings ...)
  2012-04-19 14:08 ` [RFC PATCH 7/9] Implement memslot command-line option and memslot hmp command Vasilis Liaskovitis
@ 2012-04-19 14:08 ` Vasilis Liaskovitis
  2012-04-22 13:58   ` Gleb Natapov
  2012-04-19 14:08 ` [RFC PATCH 9/9][SeaBIOS] enable memory devices if e820 entry is present Vasilis Liaskovitis
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-19 14:08 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios; +Cc: gleb, avi, Vasilis Liaskovitis

 Hotplugged memory is not persistent in the e820 memory maps. After hotplugging
 a memslot and rebooting the VM, the hotplugged device is not present.

 A possible solution is to add an e820 for the new memslot in the acpi_piix4
 hot-add handler. On a reset, Seabios (see next patch in series) will enable all
 memory devices for which it finds an e820 entry that covers the devices's address
 range.

 On hot-remove, the acpi_piix4 handler will try to remove the e820 entry
 corresponding to the device. This will work when no VM reboots happen
 between hot-add and hot-remove, but it is not a sufficient solution in
 general: Seabios and GuestOS merge adjacent e820 entries on machine reboot,
 so the sequence hot-add/ rebootVM / hot-remove will fail to remove a
 corresponding e820 entry at the hot-remove phase.

 Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 hw/acpi_piix4.c |    6 ++++++
 hw/pc.c         |   28 ++++++++++++++++++++++++++++
 hw/pc.h         |    1 +
 3 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 2921d18..2b5fd04 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -619,6 +619,9 @@ static void piix4_memslot_eject(uint32_t addr, uint32_t val)
             s = memslot_find_from_idx(start + idx);
             assert(s != NULL);
             memslot_depopulate(s);
+            if (e820_del_entry(s->start, s->size, E820_RAM) == -EBUSY)
+                PIIX4_DPRINTF("failed to remove e820 entry for memslot %u\n",
+                       s->idx);
         }
         val = val >> 1;
         idx++;
@@ -634,6 +637,9 @@ static int piix4_memslot_hotplug(DeviceState *qdev, SysBusDevice *dev, int
 
     if (add) {
         enable_mem_device(s, slot->idx);
+        if (e820_add_entry(slot->start, slot->size, E820_RAM) == -EBUSY)
+            PIIX4_DPRINTF("failed to add e820 entry for memslot %u\n",
+                    slot->idx);
     }
     else {
         disable_mem_device(s, slot->idx);
diff --git a/hw/pc.c b/hw/pc.c
index f1f550a..04d243f 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -593,6 +593,34 @@ int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
     return index;
 }
 
+int e820_del_entry(uint64_t address, uint64_t length, uint32_t type)
+{
+    int index = le32_to_cpu(e820_table.count);
+    int search;
+    struct e820_entry *entry;
+
+    if (index == 0)
+        return -EBUSY;
+    search = index - 1;
+    entry = &e820_table.entry[search];
+    while (search >= 0) {
+        if ((entry->address == cpu_to_le64(address)) &&
+                (entry->length == cpu_to_le64(length)) &&
+                (entry->type == cpu_to_le32(type))){
+            if (search != index - 1) {
+                memcpy(&e820_table.entry[search], &e820_table.entry[search + 1],
+                        sizeof(struct e820_entry) * (index - search));
+            }
+            index--;
+            e820_table.count = cpu_to_le32(index);
+            return 1;
+        }
+        search--;
+        entry = &e820_table.entry[search];
+    }
+    return -EBUSY;
+}
+
 static void bochs_bios_setup_hp_memslots(uint64_t *fw_cfg_slots);
 
 static void *bochs_bios_init(void)
diff --git a/hw/pc.h b/hw/pc.h
index 74d3369..4925e8c 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -226,5 +226,6 @@ void pc_system_firmware_init(MemoryRegion *rom_memory);
 #define E820_UNUSABLE   5
 
 int e820_add_entry(uint64_t, uint64_t, uint32_t);
+int e820_del_entry(uint64_t, uint64_t, uint32_t);
 
 #endif
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC PATCH 9/9][SeaBIOS] enable memory devices if e820 entry is present
  2012-04-19 14:08 [RFC PATCH 0/9] ACPI memory hotplug Vasilis Liaskovitis
                   ` (7 preceding siblings ...)
  2012-04-19 14:08 ` [RFC PATCH 8/9] pc: adjust e820 map on hot-add and hot-remove Vasilis Liaskovitis
@ 2012-04-19 14:08 ` Vasilis Liaskovitis
  2012-04-26  0:58   ` [SeaBIOS] [RFC PATCH 9/9] " Wen Congyang
  2012-04-19 14:49 ` [Qemu-devel] [RFC PATCH 0/9] ACPI memory hotplug Anthony Liguori
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-19 14:08 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios; +Cc: gleb, avi, Vasilis Liaskovitis

 On a reboot, seabios regenerates srat/ssdt objects. If a valid e820 entry is
 found spanning the whole address range of a hotplug memory device, the  device
 will be enabled. This ensures persistency of hotplugged memory slots across VM
 reboots.

 Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 src/acpi.c   |    6 +++++-
 src/memmap.c |   15 +++++++++++++++
 2 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/src/acpi.c b/src/acpi.c
index 5580099..2ebed2e 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -601,7 +601,11 @@ build_memssdt(void)
     for (i = 0; i < nb_memdevs; i++) {
         mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low);
         mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
-        *(ssdt_ptr++) = 0x00;
+        if (find_e820(mem_base, mem_len, E820_RAM)) {
+            *(ssdt_ptr++) = 0x01;
+        }
+        else
+            *(ssdt_ptr++) = 0x00;
         entry++;
     }
     build_header((void*)ssdt, SSDT_SIGNATURE, ssdt_ptr - ssdt, 1);
diff --git a/src/memmap.c b/src/memmap.c
index 56865b4..9790da1 100644
--- a/src/memmap.c
+++ b/src/memmap.c
@@ -131,6 +131,21 @@ add_e820(u64 start, u64 size, u32 type)
     //dump_map();
 }
 
+// Check if an e820 entry exists that covers the memory range
+// [start, start+size) with same type as type.
+int
+find_e820(u64 start, u64 size, u32 type)
+{
+    int i;
+    for (i=0; i<e820_count; i++) {
+        struct e820entry *e = &e820_list[i];
+        if ((e->start <= start) && (e->size >= (size + start - e->start)) &&
+            (e->type == type))
+            return 1;
+    }
+    return 0;
+}
+
 // Report on final memory locations.
 void
 memmap_finalize(void)
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 6/9] pc: pass paravirt info for hotplug memory slots to BIOS
  2012-04-19 14:08 ` [RFC PATCH 6/9] pc: pass paravirt info for hotplug memory slots to BIOS Vasilis Liaskovitis
@ 2012-04-19 14:21   ` Avi Kivity
  2012-04-20 10:33   ` Igor Mammedov
  1 sibling, 0 replies; 38+ messages in thread
From: Avi Kivity @ 2012-04-19 14:21 UTC (permalink / raw)
  To: Vasilis Liaskovitis; +Cc: gleb, seabios, qemu-devel, kvm

On 04/19/2012 05:08 PM, Vasilis Liaskovitis wrote:
>  The numa_fw_cfg paravirt interface is extended to include SRAT information for
>  all hotplug-able memslots. There are 3 words for each hotplug-able memory slot,
>  denoting start address, size and node proximity. nb_numa_nodes is set to 1 by
>  default (not 0), so that we always pass srat info to SeaBIOS.
>
>  This information is used by Seabios to build hotplug memory device objects at runtime.
>

Please document this ABI.  I don't see an existing place, suggest
docs/specs/fwcfg.txt (only your additions need to be documented).

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 7/9] Implement memslot command-line option and memslot hmp command
  2012-04-19 14:08 ` [RFC PATCH 7/9] Implement memslot command-line option and memslot hmp command Vasilis Liaskovitis
@ 2012-04-19 14:22   ` Avi Kivity
  2012-04-19 18:10     ` Vasilis Liaskovitis
  0 siblings, 1 reply; 38+ messages in thread
From: Avi Kivity @ 2012-04-19 14:22 UTC (permalink / raw)
  To: Vasilis Liaskovitis; +Cc: qemu-devel, kvm, seabios, gleb

On 04/19/2012 05:08 PM, Vasilis Liaskovitis wrote:
>  Implement -memslot qemu-kvm command line option to define hotplug-able memory
>  slots.
>  Syntax: "-memslot id=name,start=addr,size=sz,node=nodeid"
>
>  e.g. "-memslot id=hot1,start=4294967296,size=1073741824,node=0"
>  will define a 1G memory slot starting at physical address 4G, belonging to numa
>  node 0. Defining no node will automatically add a memslot to node 0.

start=4G,size=1G ought to work too, no?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/9] ACPI memory hotplug
  2012-04-19 14:08 [RFC PATCH 0/9] ACPI memory hotplug Vasilis Liaskovitis
                   ` (8 preceding siblings ...)
  2012-04-19 14:08 ` [RFC PATCH 9/9][SeaBIOS] enable memory devices if e820 entry is present Vasilis Liaskovitis
@ 2012-04-19 14:49 ` Anthony Liguori
  2012-04-19 18:09   ` Vasilis Liaskovitis
  2012-04-20 14:20 ` Vasilis Liaskovitis
  2012-04-22 13:56 ` Gleb Natapov
  11 siblings, 1 reply; 38+ messages in thread
From: Anthony Liguori @ 2012-04-19 14:49 UTC (permalink / raw)
  To: Vasilis Liaskovitis; +Cc: qemu-devel, kvm, seabios, avi, gleb, Wanpeng Li

On 04/19/2012 09:08 AM, Vasilis Liaskovitis wrote:
> This is a prototype for ACPI memory hotplug on x86_64 target. Based on some
> earlier work and comments from Gleb.
>
> Memslot devices are modeled with a new qemu command line
>
> "-memslot id=name,start=start_addr,size=sz,node=pxm"

Hi,

For 1.2, I'd really like to focus on refactoring the PC machine as described in 
this series:

https://github.com/aliguori/qemu/commits/qom-rebase.12

I'd like to represent the guest memory as a "DIMM" device.

In terms of this proposal, I would then expect that the i440fx device would have 
a num_dimms property that controlled how many link<DIMM>'s it had.  Hotplug 
would consist of creating a DIMM at run time and connecting it to the 
appropriate link.

One thing that's not clear to me is how the start/size fits in.  On bare metal, 
is this something that's calculated by the firmware during start up and then 
populated in ACPI?   Does it do something like take the largest possible DIMM 
size that it supports and fill out the table?

At any rate, I think we should focus on modeling this in QOM verses adding a new 
option and hacking at the existing memory init code.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/9] ACPI memory hotplug
  2012-04-19 14:49 ` [Qemu-devel] [RFC PATCH 0/9] ACPI memory hotplug Anthony Liguori
@ 2012-04-19 18:09   ` Vasilis Liaskovitis
  0 siblings, 0 replies; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-19 18:09 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, seabios, avi, gleb, Wanpeng Li, kevin

Hi,

On Thu, Apr 19, 2012 at 09:49:31AM -0500, Anthony Liguori wrote:
> On 04/19/2012 09:08 AM, Vasilis Liaskovitis wrote:
> >This is a prototype for ACPI memory hotplug on x86_64 target. Based on some
> >earlier work and comments from Gleb.
> >
> >Memslot devices are modeled with a new qemu command line
> >
> >"-memslot id=name,start=start_addr,size=sz,node=pxm"
> 
> Hi,
> 
> For 1.2, I'd really like to focus on refactoring the PC machine as
> described in this series:
> 
> https://github.com/aliguori/qemu/commits/qom-rebase.12
> 
> I'd like to represent the guest memory as a "DIMM" device.
> 
> In terms of this proposal, I would then expect that the i440fx
> device would have a num_dimms property that controlled how many
> link<DIMM>'s it had.  Hotplug would consist of creating a DIMM at
> run time and connecting it to the appropriate link.
>
ok, makes sense.

> One thing that's not clear to me is how the start/size fits in.  On
> bare metal, is this something that's calculated by the firmware
> during start up and then populated in ACPI?   Does it do something
> like take the largest possible DIMM size that it supports and fill
> out the table?

The current series works as follows:
For each DIMM/memslot option, firmware constructs a QWordMemory ACPI object
(see ACPI spec, ASL 18.5.95). This object has AddressMinimum, AddressMaximum,
RangeLength fields. The first of these corresponds directly to the start
attribute, the third corresponds to size, and the second is derived from both.

On bare metal, I believe the firmware detects the actual DIMM devices and their
size and calculates the physical offset (AddressMinimum) for each, taking into
account possible PCI hole. I doubt the largest possible DIMM size is used, since
a hotplug entity/event should correspond to a physical device. (Kevin or Gleb may
have a better idea of what real metal firmware usually does).

Perhaps you are suggesting having a predefined number of equally sized DIMMs as
being hotplug-able? This way no memslot/DIMM config would have to be passed by
the user at the command line for each DIMM.

In this series, the user-defined memslot options pass the desired DIMM
descriptions to SeaBIOS, which then builds the aforementioned objects.(I assume
it would be possible to pass this info with normal "-device" commands, after
proper qom-ification)

> 
> At any rate, I think we should focus on modeling this in QOM verses
> adding a new option and hacking at the existing memory init code.

agreed. I will take a look into qom-rebase.

thanks,

- Vasilis

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 7/9] Implement memslot command-line option and memslot hmp command
  2012-04-19 14:22   ` Avi Kivity
@ 2012-04-19 18:10     ` Vasilis Liaskovitis
  0 siblings, 0 replies; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-19 18:10 UTC (permalink / raw)
  To: Avi Kivity; +Cc: qemu-devel, kvm, seabios, gleb

Hi,

On Thu, Apr 19, 2012 at 05:22:52PM +0300, Avi Kivity wrote:
> On 04/19/2012 05:08 PM, Vasilis Liaskovitis wrote:
> >  Implement -memslot qemu-kvm command line option to define hotplug-able memory
> >  slots.
> >  Syntax: "-memslot id=name,start=addr,size=sz,node=nodeid"
> >
> >  e.g. "-memslot id=hot1,start=4294967296,size=1073741824,node=0"
> >  will define a 1G memory slot starting at physical address 4G, belonging to numa
> >  node 0. Defining no node will automatically add a memslot to node 0.
> 
> start=4G,size=1G ought to work too, no?

it should, but it didn't when I tried. Probably some silliness on my part, I
will retry.

thanks,

- Vasilis

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 6/9] pc: pass paravirt info for hotplug memory slots to BIOS
  2012-04-19 14:08 ` [RFC PATCH 6/9] pc: pass paravirt info for hotplug memory slots to BIOS Vasilis Liaskovitis
  2012-04-19 14:21   ` Avi Kivity
@ 2012-04-20 10:33   ` Igor Mammedov
  2012-04-20 16:35     ` [Qemu-devel] " Vasilis Liaskovitis
  1 sibling, 1 reply; 38+ messages in thread
From: Igor Mammedov @ 2012-04-20 10:33 UTC (permalink / raw)
  To: Vasilis Liaskovitis; +Cc: gleb, seabios, qemu-devel, kvm, avi

On 04/19/2012 04:08 PM, Vasilis Liaskovitis wrote:
>   The numa_fw_cfg paravirt interface is extended to include SRAT information for
>   all hotplug-able memslots. There are 3 words for each hotplug-able memory slot,
>   denoting start address, size and node proximity. nb_numa_nodes is set to 1 by
>   default (not 0), so that we always pass srat info to SeaBIOS.
>
>   This information is used by Seabios to build hotplug memory device objects at runtime.
>
>   Signed-off-by: Vasilis Liaskovitis<vasilis.liaskovitis@profitbricks.com>
> ---
>   hw/pc.c |   59 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
>   vl.c    |    4 +++-
>   2 files changed, 56 insertions(+), 7 deletions(-)
>
> diff --git a/hw/pc.c b/hw/pc.c
> index 67f0479..f1f550a 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -46,6 +46,7 @@
>   #include "ui/qemu-spice.h"
>   #include "memory.h"
>   #include "exec-memory.h"
> +#include "memslot.h"
>
>   /* output Bochs bios info messages */
>   //#define DEBUG_BIOS
> @@ -592,12 +593,15 @@ int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
>       return index;
>   }
>
> +static void bochs_bios_setup_hp_memslots(uint64_t *fw_cfg_slots);
> +
>   static void *bochs_bios_init(void)
>   {
>       void *fw_cfg;
>       uint8_t *smbios_table;
>       size_t smbios_len;
>       uint64_t *numa_fw_cfg;
> +    uint64_t *hp_memslots_fw_cfg;
>       int i, j;
>
>       register_ioport_write(0x400, 1, 2, bochs_bios_write, NULL);
> @@ -630,28 +634,71 @@ static void *bochs_bios_init(void)
>       fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, (uint8_t *)&hpet_cfg,
>                        sizeof(struct hpet_fw_config));
>       /* allocate memory for the NUMA channel: one (64bit) word for the number
> -     * of nodes, one word for each VCPU->node and one word for each node to
> -     * hold the amount of memory.
> +     * of nodes, one word for the number of hotplug memory slots, one word
> +     * for each VCPU->node, one word for each node to hold the amount of memory.
> +     * Finally three words for each hotplug memory slot, denoting start address,
> +     * size and node proximity.
>        */
> -    numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8);
> +    numa_fw_cfg = g_malloc0((2 + max_cpus + nb_numa_nodes + 3 * nb_hp_memslots) * 8);
>       numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
> +    numa_fw_cfg[1] = cpu_to_le64(nb_hp_memslots);
this will brake compatibility if guest was migrated from old->new qemu
than on reboot it will use old bios that expects numa_fw_cfg[1] to be something else.
Could memslots info be moved to the end of an existing interface?

> +
>       for (i = 0; i<  max_cpus; i++) {
>           for (j = 0; j<  nb_numa_nodes; j++) {
>               if (node_cpumask[j]&  (1<<  i)) {
> -                numa_fw_cfg[i + 1] = cpu_to_le64(j);
> +                numa_fw_cfg[i + 2] = cpu_to_le64(j);
>                   break;
>               }
>           }
>       }
>       for (i = 0; i<  nb_numa_nodes; i++) {
> -        numa_fw_cfg[max_cpus + 1 + i] = cpu_to_le64(node_mem[i]);
> +        numa_fw_cfg[max_cpus + 2 + i] = cpu_to_le64(node_mem[i]);
>       }
> +
> +    hp_memslots_fw_cfg = numa_fw_cfg + 2 + max_cpus + nb_numa_nodes;
> +    if (nb_hp_memslots)
> +        bochs_bios_setup_hp_memslots(hp_memslots_fw_cfg);
> +
>       fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, (uint8_t *)numa_fw_cfg,
> -                     (1 + max_cpus + nb_numa_nodes) * 8);
> +                     (2 + max_cpus + nb_numa_nodes + 3 * nb_hp_memslots) * 8);
>
>       return fw_cfg;
>   }
>
> +static void bochs_bios_setup_hp_memslots(uint64_t *fw_cfg_slots)
> +{
> +    int i = 0;
> +    Error *err = NULL;
> +    DeviceState *dev;
> +    MemSlotState *slot;
> +    char *type;
> +    BusState *bus = sysbus_get_default();
> +
> +    QTAILQ_FOREACH(dev,&bus->children, sibling) {
> +        type = object_property_get_str(OBJECT(dev), "type",&err);
> +        if (err) {
> +            error_free(err);
> +            fprintf(stderr, "error getting device type\n");
> +            exit(1);
> +        }
> +
> +        if (!strcmp(type, "memslot")) {
> +            if (!dev->id) {
> +                error_free(err);
> +                fprintf(stderr, "error getting memslot device id\n");
> +                exit(1);
> +            }
> +            if (!strcmp(dev->id, "initialslot")) continue;
> +            slot = MEMSLOT(dev);
> +            fw_cfg_slots[3 * slot->idx] = cpu_to_le64(slot->start);
> +            fw_cfg_slots[3 * slot->idx + 1] = cpu_to_le64(slot->size);
> +            fw_cfg_slots[3 * slot->idx + 2] = cpu_to_le64(slot->node);
> +            i++;
> +        }
> +    }
> +    assert(i == nb_hp_memslots);
> +}
> +
>   static long get_file_size(FILE *f)
>   {
>       long where, size;
> diff --git a/vl.c b/vl.c
> index ae91a8a..50df453 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -3428,8 +3428,10 @@ int main(int argc, char **argv, char **envp)
>
>       register_savevm_live(NULL, "ram", 0, 4, NULL, ram_save_live, NULL,
>                            ram_load, NULL);
> +    if (!nb_numa_nodes)
> +        nb_numa_nodes = 1;
>
> -    if (nb_numa_nodes>  0) {
> +    {
>           int i;
>
>           if (nb_numa_nodes>  MAX_NODES) {

-- 
-----
  Igor

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 2/9][SeaBIOS] Implement acpi-dsdt functions for memory hotplug.
  2012-04-19 14:08 ` [RFC PATCH 2/9][SeaBIOS] Implement acpi-dsdt functions for memory hotplug Vasilis Liaskovitis
@ 2012-04-20 10:55   ` Igor Mammedov
  2012-04-20 14:11     ` [Qemu-devel] " Vasilis Liaskovitis
  0 siblings, 1 reply; 38+ messages in thread
From: Igor Mammedov @ 2012-04-20 10:55 UTC (permalink / raw)
  To: Vasilis Liaskovitis; +Cc: gleb, seabios, qemu-devel, kvm, avi

On 04/19/2012 04:08 PM, Vasilis Liaskovitis wrote:
> Extend the DSDT to include methods for handling memory hot-add and hot-remove
> notifications and memory device status requests. These functions are called
> from the memory device SSDT methods.
>
> Eject has only been tested with level gpe event, but will be changed to edge gpe
> event soon, according to recent master patch for other ACPI hotplug events.
>
> Signed-off-by: Vasilis Liaskovitis<vasilis.liaskovitis@profitbricks.com>
> ---
>   src/acpi-dsdt.dsl |   68 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>   1 files changed, 66 insertions(+), 2 deletions(-)
>
> diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
> index 4bdc268..184daf0 100644
> --- a/src/acpi-dsdt.dsl
> +++ b/src/acpi-dsdt.dsl
> @@ -709,9 +709,72 @@ DefinitionBlock (
>               }
>               Return(One)
>           }
> -    }
>
> +        /* Objects filled in by run-time generated SSDT */
> +        External(MTFY, MethodObj)
> +        External(MEON, PkgObj)
> +
> +        Method (CMST, 1, NotSerialized) {
> +            // _STA method - return ON status of memdevice
> +            // Local0 = MEON flag for this cpu
> +            Store(DerefOf(Index(MEON, Arg0)), Local0)
> +            If (Local0) { Return(0xF) } Else { Return(0x0) }
> +        }
> +        /* Memory eject notify method */
> +        OperationRegion(MEMJ, SystemIO, 0xaf40, 32)
> +        Field (MEMJ, ByteAcc, NoLock, Preserve)
> +        {
> +            MPE, 256
> +        }
> +
> +        Method (MPEJ, 2, NotSerialized) {
> +            // _EJ0 method - eject callback
> +            Store(ShiftLeft(1,Arg0), MPE)
> +            Sleep(200)
> +        }
MPE is write only and only one memslot is ejected at a time. Why 256 bit-field is here then?
Could we use just 1 byte and write a slot number into it and save some io address space this way?

> +
> +        /* Memory hotplug notify method */
> +        OperationRegion(MEST, SystemIO, 0xaf20, 32)
It's more a suggestion: move it a bit farther to allow maybe 1024 cpus in the future.
That will prevent compatibility a headache, if we decide to expand support to more then
256 cpus.

Or event better to make this address configurable in run-time and build this var along
with SSDT (converting along the way all other hard-coded io ports to the same generic
run-time interface). This wish is out of scope of this patch-set, but what
do you think about the idea?

> +        Field (MEST, ByteAcc, NoLock, Preserve)
> +        {
> +            MES, 256
> +        }
> +
> +        Method(MESC, 0) {
> +            // Local5 = active memdevice bitmap
> +            Store (MES, Local5)
> +            // Local2 = last read byte from bitmap
> +            Store (Zero, Local2)
> +            // Local0 = memory device iterator
> +            Store (Zero, Local0)
> +            While (LLess(Local0, SizeOf(MEON))) {
> +                // Local1 = MEON flag for this memory device
> +                Store(DerefOf(Index(MEON, Local0)), Local1)
> +                If (And(Local0, 0x07)) {
> +                    // Shift down previously read bitmap byte
> +                    ShiftRight(Local2, 1, Local2)
> +                } Else {
> +                    // Read next byte from memdevice bitmap
> +                    Store(DerefOf(Index(Local5, ShiftRight(Local0, 3))), Local2)
> +                }
> +                // Local3 = active state for this memory device
> +                Store(And(Local2, 1), Local3)
>
> +                If (LNotEqual(Local1, Local3)) {
> +                    // State change - update MEON with new state
> +                    Store(Local3, Index(MEON, Local0))
> +                    // Do MEM notify
> +                    If (LEqual(Local3, 1)) {
> +                        MTFY(Local0, 1)
> +                    } Else {
> +                        MTFY(Local0, 3)
> +                    }
> +                }
> +                Increment(Local0)
> +            }
> +            Return(One)
> +        }
> +    }
>   /****************************************************************
>    * General purpose events
>    ****************************************************************/
> @@ -732,7 +795,8 @@ DefinitionBlock (
>               Return(\_SB.PRSC())
>           }
>           Method(_L03) {
> -            Return(0x01)
> +            // Memory hotplug event
> +            Return(\_SB.MESC())
>           }
>           Method(_L04) {
>               Return(0x01)

-- 
-----
  Igor

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 2/9][SeaBIOS] Implement acpi-dsdt functions for memory hotplug.
  2012-04-20 10:55   ` Igor Mammedov
@ 2012-04-20 14:11     ` Vasilis Liaskovitis
  0 siblings, 0 replies; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-20 14:11 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, kvm, seabios, avi, gleb

Hi,

On Fri, Apr 20, 2012 at 12:55:24PM +0200, Igor Mammedov wrote:
> >+        /* Memory eject notify method */
> >+        OperationRegion(MEMJ, SystemIO, 0xaf40, 32)
> >+        Field (MEMJ, ByteAcc, NoLock, Preserve)
> >+        {
> >+            MPE, 256
> >+        }
> >+
> >+        Method (MPEJ, 2, NotSerialized) {
> >+            // _EJ0 method - eject callback
> >+            Store(ShiftLeft(1,Arg0), MPE)
> >+            Sleep(200)
> >+        }
> MPE is write only and only one memslot is ejected at a time. Why 256 bit-field is here then?
> Could we use just 1 byte and write a slot number into it and save some io address space this way?

good point. This was implemented similarly to the hot-add/status register only
for symmetry, but you are right, since only one slot is ejected at a time, this
can be reduced to one byte and save space. I will update for the next version.

> 
> >+
> >+        /* Memory hotplug notify method */
> >+        OperationRegion(MEST, SystemIO, 0xaf20, 32)
> It's more a suggestion: move it a bit farther to allow maybe 1024 cpus in the future.
> That will prevent compatibility a headache, if we decide to expand support to more then
> 256 cpus.

ok, I will move it to 0xaf80 or higher (so cpu-hotplug could be extended to at
least 1024 cpus)

> 
> Or event better to make this address configurable in run-time and build this var along
> with SSDT (converting along the way all other hard-coded io ports to the same generic
> run-time interface). This wish is out of scope of this patch-set, but what
> do you think about the idea?

yes, that would give more flexibility and avoid more compatibility headaches.
As you say it's not a main issue for the series, but I can work on it as we start
converting hardcoded i/o ports to configurable properties.

thanks,

- Vasilis

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 0/9] ACPI memory hotplug
  2012-04-19 14:08 [RFC PATCH 0/9] ACPI memory hotplug Vasilis Liaskovitis
                   ` (9 preceding siblings ...)
  2012-04-19 14:49 ` [Qemu-devel] [RFC PATCH 0/9] ACPI memory hotplug Anthony Liguori
@ 2012-04-20 14:20 ` Vasilis Liaskovitis
  2012-04-22 13:56 ` Gleb Natapov
  11 siblings, 0 replies; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-20 14:20 UTC (permalink / raw)
  To: qemu-devel, kvm, seabios; +Cc: avi, gleb

On Thu, Apr 19, 2012 at 04:08:38PM +0200, Vasilis Liaskovitis wrote:
> 
> series is based on uq/master for qemu-kvm, and master for seabios. Can be found
> also at:
forgot to paste the repo links in the original coverletter, here they are if
someone wants them:

https://github.com/vliaskov/qemu-kvm/commits/memory-hotplug 
https://github.com/vliaskov/seabios/commits/memory-hotplug

thanks,

- Vasilis

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 6/9] pc: pass paravirt info for hotplug memory slots to BIOS
  2012-04-20 10:33   ` Igor Mammedov
@ 2012-04-20 16:35     ` Vasilis Liaskovitis
  0 siblings, 0 replies; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-20 16:35 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, kvm, seabios, avi, gleb

On Fri, Apr 20, 2012 at 12:33:57PM +0200, Igor Mammedov wrote:
> On 04/19/2012 04:08 PM, Vasilis Liaskovitis wrote:
> >-    numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8);
> >+    numa_fw_cfg = g_malloc0((2 + max_cpus + nb_numa_nodes + 3 * nb_hp_memslots) * 8);
> >      numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
> >+    numa_fw_cfg[1] = cpu_to_le64(nb_hp_memslots);
> this will brake compatibility if guest was migrated from old->new qemu
> than on reboot it will use old bios that expects numa_fw_cfg[1] to be something else.
> Could memslots info be moved to the end of an existing interface?

right. The number of memslots can be placed at 1 + max_cpus + nb_numa_nodes,
instead of right after the number of nodes. This way the old layout is preserved,
and all memslot info comes at the end. I will rewrite.

thanks,
- Vasilis

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 0/9] ACPI memory hotplug
  2012-04-19 14:08 [RFC PATCH 0/9] ACPI memory hotplug Vasilis Liaskovitis
                   ` (10 preceding siblings ...)
  2012-04-20 14:20 ` Vasilis Liaskovitis
@ 2012-04-22 13:56 ` Gleb Natapov
  2012-04-22 14:06   ` Avi Kivity
  11 siblings, 1 reply; 38+ messages in thread
From: Gleb Natapov @ 2012-04-22 13:56 UTC (permalink / raw)
  To: Vasilis Liaskovitis; +Cc: seabios, qemu-devel, kvm, avi

On Thu, Apr 19, 2012 at 04:08:38PM +0200, Vasilis Liaskovitis wrote:
> This is a prototype for ACPI memory hotplug on x86_64 target. Based on some
> earlier work and comments from Gleb.
> 
> Memslot devices are modeled with a new qemu command line 
> 
> "-memslot id=name,start=start_addr,size=sz,node=pxm"
> 
> user is responsible for defining memslots with meaningful start/size values,
> e.g. not defining a memory slot over a PCI-hole. Alternatively, the start size
> could also be handled/assigned automatically from the specific emulated hardware
> (for hw/pc.c PCI hole is currently [below_4g_mem_size, 4G), so hotplugged memory
> should start from max(4G, above_4g_mem_size).
> 
> Node is defining numa proximity for this memslot. When not defined it defaults
> to zero.
> 
> e.g. "-memslot id=hot1,start=4294967296,size=536870912,node=0"
> will define a 512M memory slot starting at physical address 4G, belonging to numa node 0.
> 
> Memory slots are added or removed with a new hmp command "memslot":
> Hot-add syntax: "memslot id add"
> Hot-remove syntax: "memslot id delete"
> 
> - All memslots are initially unpopulated. Memslots are currently modeling only
We can add ,populated=true to -memslot to make slot populated from the
start. We will need it for migration anyway.

> hotplug-able memory slots i.e. initial system memory is not modeled with
> memslots. The concept could be generalized to include all memory though, or it
> could more closely follow kvm-memory slots.
OK, I hope final version will allow for memory < 4G to be hot-pluggable.

> 
> - Memslots are abstracted as qdevices attached to the main system bus. However,
> memory hotplugging has its own side channel ignoring main_system_bus's hotplug
> incapability. A cleaner integration would be needed. What's  the preferred
> way of modeling memory devices in qom? Would it be better to attach memory
> devices as children-links of an acpi-capable device (in the pc case acpi_piix4)
> instead of the system bus?
> 
> - Refcounting memory slots has been discussed (but is not included in this 
> series yet). Depopulating a memory region happens on a guestOS _EJ callback,
> which means the guestOS will not be using the region anymore. However, guest
> addresses from the depopulated region need to also be unmapped from the qemu
> address space using cpu_physical_memory_unmap(). Does memory_region_del_subregion()
> or some other memory API call guarantee that a memoryregion has been unmapped
> from qemu's address space?
> 
> - What is the expected behaviour of hotplugged memory after a reboot? Is it
> supposed to be persistent after reboots? The last 2 patches in the series try to
> make hotplugged memslots persistent after reboot by creating and consulting e820
> map entries.  A better solution is needed for hot-remove after a reboot, because
> e820 entries can be merged.
> 
> series is based on uq/master for qemu-kvm, and master for seabios. Can be found
> also at:
> 
> 
> Vasilis Liaskovitis (9):
>   Seabios: Add SSDT memory device support
>   Seabios, acpi: Implement acpi-dsdt functions for memory hotplug.
>   Seabios, acpi: generate hotplug memory devices.
>   Implement memslot device abstraction
>   acpi_piix4: Implement memory device hotplug registers and handlers. 
>   pc: pass paravirt info for hotplug memory slots to BIOS
>   Implement memslot command-line option and memslot hmp monitor command
>   pc: adjust e820 map on hot-add and hot-remove
>   Seabios, acpi: enable memory devices if e820 entry is present
> 
>  Makefile.objs   |    2 +-
>  hmp-commands.hx |   15 ++++
>  hw/acpi_piix4.c |  103 +++++++++++++++++++++++++++-
>  hw/memslot.c    |  201 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/memslot.h    |   44 ++++++++++++
>  hw/pc.c         |   87 ++++++++++++++++++++++--
>  hw/pc.h         |    1 +
>  monitor.c       |    8 ++
>  monitor.h       |    1 +
>  qemu-config.c   |   25 +++++++
>  qemu-options.hx |    8 ++
>  sysemu.h        |    1 +
>  vl.c            |   44 ++++++++++++-
>  13 files changed, 528 insertions(+), 12 deletions(-)
>  create mode 100644 hw/memslot.c
>  create mode 100644 hw/memslot.h
> 
>  create mode 100644 src/ssdt-mem.dsl
>  src/acpi-dsdt.dsl |   68 ++++++++++++++++++++++-
>  src/acpi.c        |  155 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  src/memmap.c      |   15 +++++
>  src/ssdt-mem.dsl  |   66 ++++++++++++++++++++++
>  4 files changed, 298 insertions(+), 6 deletions(-)
>  create mode 100644 src/ssdt-mem.dsl
> 
> -- 
> 1.7.9

--
			Gleb.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 8/9] pc: adjust e820 map on hot-add and hot-remove
  2012-04-19 14:08 ` [RFC PATCH 8/9] pc: adjust e820 map on hot-add and hot-remove Vasilis Liaskovitis
@ 2012-04-22 13:58   ` Gleb Natapov
  2012-04-23 11:27     ` Vasilis Liaskovitis
  0 siblings, 1 reply; 38+ messages in thread
From: Gleb Natapov @ 2012-04-22 13:58 UTC (permalink / raw)
  To: Vasilis Liaskovitis; +Cc: seabios, qemu-devel, kvm, avi

On Thu, Apr 19, 2012 at 04:08:46PM +0200, Vasilis Liaskovitis wrote:
>  Hotplugged memory is not persistent in the e820 memory maps. After hotplugging
>  a memslot and rebooting the VM, the hotplugged device is not present.
> 
>  A possible solution is to add an e820 for the new memslot in the acpi_piix4
>  hot-add handler. On a reset, Seabios (see next patch in series) will enable all
>  memory devices for which it finds an e820 entry that covers the devices's address
>  range.
> 
>  On hot-remove, the acpi_piix4 handler will try to remove the e820 entry
>  corresponding to the device. This will work when no VM reboots happen
>  between hot-add and hot-remove, but it is not a sufficient solution in
>  general: Seabios and GuestOS merge adjacent e820 entries on machine reboot,
>  so the sequence hot-add/ rebootVM / hot-remove will fail to remove a
>  corresponding e820 entry at the hot-remove phase.
> 
Why do you need this path and the next one? Bios can restore the state
of memslots and build e820 map by reading mems_sts.

>  Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> ---
>  hw/acpi_piix4.c |    6 ++++++
>  hw/pc.c         |   28 ++++++++++++++++++++++++++++
>  hw/pc.h         |    1 +
>  3 files changed, 35 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> index 2921d18..2b5fd04 100644
> --- a/hw/acpi_piix4.c
> +++ b/hw/acpi_piix4.c
> @@ -619,6 +619,9 @@ static void piix4_memslot_eject(uint32_t addr, uint32_t val)
>              s = memslot_find_from_idx(start + idx);
>              assert(s != NULL);
>              memslot_depopulate(s);
> +            if (e820_del_entry(s->start, s->size, E820_RAM) == -EBUSY)
> +                PIIX4_DPRINTF("failed to remove e820 entry for memslot %u\n",
> +                       s->idx);
>          }
>          val = val >> 1;
>          idx++;
> @@ -634,6 +637,9 @@ static int piix4_memslot_hotplug(DeviceState *qdev, SysBusDevice *dev, int
>  
>      if (add) {
>          enable_mem_device(s, slot->idx);
> +        if (e820_add_entry(slot->start, slot->size, E820_RAM) == -EBUSY)
> +            PIIX4_DPRINTF("failed to add e820 entry for memslot %u\n",
> +                    slot->idx);
>      }
>      else {
>          disable_mem_device(s, slot->idx);
> diff --git a/hw/pc.c b/hw/pc.c
> index f1f550a..04d243f 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -593,6 +593,34 @@ int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
>      return index;
>  }
>  
> +int e820_del_entry(uint64_t address, uint64_t length, uint32_t type)
> +{
> +    int index = le32_to_cpu(e820_table.count);
> +    int search;
> +    struct e820_entry *entry;
> +
> +    if (index == 0)
> +        return -EBUSY;
> +    search = index - 1;
> +    entry = &e820_table.entry[search];
> +    while (search >= 0) {
> +        if ((entry->address == cpu_to_le64(address)) &&
> +                (entry->length == cpu_to_le64(length)) &&
> +                (entry->type == cpu_to_le32(type))){
> +            if (search != index - 1) {
> +                memcpy(&e820_table.entry[search], &e820_table.entry[search + 1],
> +                        sizeof(struct e820_entry) * (index - search));
> +            }
> +            index--;
> +            e820_table.count = cpu_to_le32(index);
> +            return 1;
> +        }
> +        search--;
> +        entry = &e820_table.entry[search];
> +    }
> +    return -EBUSY;
> +}
> +
>  static void bochs_bios_setup_hp_memslots(uint64_t *fw_cfg_slots);
>  
>  static void *bochs_bios_init(void)
> diff --git a/hw/pc.h b/hw/pc.h
> index 74d3369..4925e8c 100644
> --- a/hw/pc.h
> +++ b/hw/pc.h
> @@ -226,5 +226,6 @@ void pc_system_firmware_init(MemoryRegion *rom_memory);
>  #define E820_UNUSABLE   5
>  
>  int e820_add_entry(uint64_t, uint64_t, uint32_t);
> +int e820_del_entry(uint64_t, uint64_t, uint32_t);
>  
>  #endif
> -- 
> 1.7.9

--
			Gleb.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 0/9] ACPI memory hotplug
  2012-04-22 13:56 ` Gleb Natapov
@ 2012-04-22 14:06   ` Avi Kivity
  2012-04-22 14:09     ` Gleb Natapov
  0 siblings, 1 reply; 38+ messages in thread
From: Avi Kivity @ 2012-04-22 14:06 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Vasilis Liaskovitis, seabios, qemu-devel, kvm

On 04/22/2012 04:56 PM, Gleb Natapov wrote:
> start. We will need it for migration anyway.
>
> > hotplug-able memory slots i.e. initial system memory is not modeled with
> > memslots. The concept could be generalized to include all memory though, or it
> > could more closely follow kvm-memory slots.
> OK, I hope final version will allow for memory < 4G to be hot-pluggable.

Why is that important?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 0/9] ACPI memory hotplug
  2012-04-22 14:06   ` Avi Kivity
@ 2012-04-22 14:09     ` Gleb Natapov
  2012-04-22 14:13       ` Avi Kivity
  0 siblings, 1 reply; 38+ messages in thread
From: Gleb Natapov @ 2012-04-22 14:09 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Vasilis Liaskovitis, seabios, qemu-devel, kvm

On Sun, Apr 22, 2012 at 05:06:43PM +0300, Avi Kivity wrote:
> On 04/22/2012 04:56 PM, Gleb Natapov wrote:
> > start. We will need it for migration anyway.
> >
> > > hotplug-able memory slots i.e. initial system memory is not modeled with
> > > memslots. The concept could be generalized to include all memory though, or it
> > > could more closely follow kvm-memory slots.
> > OK, I hope final version will allow for memory < 4G to be hot-pluggable.
> 
> Why is that important?
> 
Because my feeling is that people that want to use this kind of feature
what to start using it with VMs smaller than 4G. Of course not all
memory have to be hot unpluggable. Making first 1M or event first 128M not
unpluggable make perfect sense.

--
			Gleb.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 0/9] ACPI memory hotplug
  2012-04-22 14:09     ` Gleb Natapov
@ 2012-04-22 14:13       ` Avi Kivity
  2012-04-22 14:20         ` Gleb Natapov
  0 siblings, 1 reply; 38+ messages in thread
From: Avi Kivity @ 2012-04-22 14:13 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Vasilis Liaskovitis, seabios, qemu-devel, kvm

On 04/22/2012 05:09 PM, Gleb Natapov wrote:
> On Sun, Apr 22, 2012 at 05:06:43PM +0300, Avi Kivity wrote:
> > On 04/22/2012 04:56 PM, Gleb Natapov wrote:
> > > start. We will need it for migration anyway.
> > >
> > > > hotplug-able memory slots i.e. initial system memory is not modeled with
> > > > memslots. The concept could be generalized to include all memory though, or it
> > > > could more closely follow kvm-memory slots.
> > > OK, I hope final version will allow for memory < 4G to be hot-pluggable.
> > 
> > Why is that important?
> > 
> Because my feeling is that people that want to use this kind of feature
> what to start using it with VMs smaller than 4G. Of course not all
> memory have to be hot unpluggable. Making first 1M or event first 128M not
> unpluggable make perfect sense.

Can't you achieve this with -m 1G, -device dimm,size=1G,populated=true
-device dimm,size=1G,populated=false?

(I don't think hotplugging below 512MB is needed, but I don't have any
real data on this).

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 0/9] ACPI memory hotplug
  2012-04-22 14:13       ` Avi Kivity
@ 2012-04-22 14:20         ` Gleb Natapov
  2012-04-23 12:31           ` Vasilis Liaskovitis
  2012-04-23 13:31           ` Avi Kivity
  0 siblings, 2 replies; 38+ messages in thread
From: Gleb Natapov @ 2012-04-22 14:20 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Vasilis Liaskovitis, seabios, qemu-devel, kvm

On Sun, Apr 22, 2012 at 05:13:27PM +0300, Avi Kivity wrote:
> On 04/22/2012 05:09 PM, Gleb Natapov wrote:
> > On Sun, Apr 22, 2012 at 05:06:43PM +0300, Avi Kivity wrote:
> > > On 04/22/2012 04:56 PM, Gleb Natapov wrote:
> > > > start. We will need it for migration anyway.
> > > >
> > > > > hotplug-able memory slots i.e. initial system memory is not modeled with
> > > > > memslots. The concept could be generalized to include all memory though, or it
> > > > > could more closely follow kvm-memory slots.
> > > > OK, I hope final version will allow for memory < 4G to be hot-pluggable.
> > > 
> > > Why is that important?
> > > 
> > Because my feeling is that people that want to use this kind of feature
> > what to start using it with VMs smaller than 4G. Of course not all
> > memory have to be hot unpluggable. Making first 1M or event first 128M not
> > unpluggable make perfect sense.
> 
> Can't you achieve this with -m 1G, -device dimm,size=1G,populated=true
> -device dimm,size=1G,populated=false?
> 
>From this:

(for hw/pc.c PCI hole is currently [below_4g_mem_size, 4G), so
hotplugged memory should start from max(4G, above_4g_mem_size).

I understand that hotpluggable memory can start from above 4G only. With
the config above we will have memory hole from 1G to PCI memory hole.
May be not a big problem, but I do not see technical reason for the constrain.
 
> (I don't think hotplugging below 512MB is needed, but I don't have any
> real data on this).
> 
512MB looks like a reasonable limitation too, but again if there is not
technical reason for having the limitation why have it?

--
			Gleb.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 8/9] pc: adjust e820 map on hot-add and hot-remove
  2012-04-22 13:58   ` Gleb Natapov
@ 2012-04-23 11:27     ` Vasilis Liaskovitis
  2012-04-23 11:30       ` Gleb Natapov
  0 siblings, 1 reply; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-23 11:27 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: qemu-devel, kvm, seabios, avi

On Sun, Apr 22, 2012 at 04:58:47PM +0300, Gleb Natapov wrote:
> On Thu, Apr 19, 2012 at 04:08:46PM +0200, Vasilis Liaskovitis wrote:
> >  Hotplugged memory is not persistent in the e820 memory maps. After hotplugging
> >  a memslot and rebooting the VM, the hotplugged device is not present.
> > 
> >  A possible solution is to add an e820 for the new memslot in the acpi_piix4
> >  hot-add handler. On a reset, Seabios (see next patch in series) will enable all
> >  memory devices for which it finds an e820 entry that covers the devices's address
> >  range.
> > 
> >  On hot-remove, the acpi_piix4 handler will try to remove the e820 entry
> >  corresponding to the device. This will work when no VM reboots happen
> >  between hot-add and hot-remove, but it is not a sufficient solution in
> >  general: Seabios and GuestOS merge adjacent e820 entries on machine reboot,
> >  so the sequence hot-add/ rebootVM / hot-remove will fail to remove a
> >  corresponding e820 entry at the hot-remove phase.
> > 
> Why do you need this path and the next one? Bios can restore the state
> of memslots and build e820 map by reading mems_sts.

i see, that is a simpler solution. Since qemu currently creates most ram e820map
entries and passes them to seabios, I tried to follow the same approach. But
your suggestion makes things easier and we don't have to worry about merged e820
entries on hot-remove.  I 'll rework it.
thanks,

 Vasilis

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 8/9] pc: adjust e820 map on hot-add and hot-remove
  2012-04-23 11:27     ` Vasilis Liaskovitis
@ 2012-04-23 11:30       ` Gleb Natapov
  0 siblings, 0 replies; 38+ messages in thread
From: Gleb Natapov @ 2012-04-23 11:30 UTC (permalink / raw)
  To: Vasilis Liaskovitis; +Cc: seabios, qemu-devel, kvm, avi

On Mon, Apr 23, 2012 at 01:27:40PM +0200, Vasilis Liaskovitis wrote:
> On Sun, Apr 22, 2012 at 04:58:47PM +0300, Gleb Natapov wrote:
> > On Thu, Apr 19, 2012 at 04:08:46PM +0200, Vasilis Liaskovitis wrote:
> > >  Hotplugged memory is not persistent in the e820 memory maps. After hotplugging
> > >  a memslot and rebooting the VM, the hotplugged device is not present.
> > > 
> > >  A possible solution is to add an e820 for the new memslot in the acpi_piix4
> > >  hot-add handler. On a reset, Seabios (see next patch in series) will enable all
> > >  memory devices for which it finds an e820 entry that covers the devices's address
> > >  range.
> > > 
> > >  On hot-remove, the acpi_piix4 handler will try to remove the e820 entry
> > >  corresponding to the device. This will work when no VM reboots happen
> > >  between hot-add and hot-remove, but it is not a sufficient solution in
> > >  general: Seabios and GuestOS merge adjacent e820 entries on machine reboot,
> > >  so the sequence hot-add/ rebootVM / hot-remove will fail to remove a
> > >  corresponding e820 entry at the hot-remove phase.
> > > 
> > Why do you need this path and the next one? Bios can restore the state
> > of memslots and build e820 map by reading mems_sts.
> 
> i see, that is a simpler solution. Since qemu currently creates most ram e820map
Quite the contrary. Qemu creates only one entry that Seabios can't
figure by itself.

> entries and passes them to seabios, I tried to follow the same approach. But
> your suggestion makes things easier and we don't have to worry about merged e820
> entries on hot-remove.  I 'll rework it.
> thanks,
> 
>  Vasilis

--
			Gleb.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 0/9] ACPI memory hotplug
  2012-04-22 14:20         ` Gleb Natapov
@ 2012-04-23 12:31           ` Vasilis Liaskovitis
  2012-04-24  7:52             ` Gleb Natapov
  2012-04-23 13:31           ` Avi Kivity
  1 sibling, 1 reply; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-23 12:31 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Avi Kivity, qemu-devel, kvm, seabios

Hi,

On Sun, Apr 22, 2012 at 05:20:59PM +0300, Gleb Natapov wrote:
> On Sun, Apr 22, 2012 at 05:13:27PM +0300, Avi Kivity wrote:
> > On 04/22/2012 05:09 PM, Gleb Natapov wrote:
> > > On Sun, Apr 22, 2012 at 05:06:43PM +0300, Avi Kivity wrote:
> > > > On 04/22/2012 04:56 PM, Gleb Natapov wrote:
> > > > > start. We will need it for migration anyway.
> > > > >
> > > > > > hotplug-able memory slots i.e. initial system memory is not modeled with
> > > > > > memslots. The concept could be generalized to include all memory though, or it
> > > > > > could more closely follow kvm-memory slots.
> > > > > OK, I hope final version will allow for memory < 4G to be hot-pluggable.
> > > > 
> > > > Why is that important?
> > > > 
> > > Because my feeling is that people that want to use this kind of feature
> > > what to start using it with VMs smaller than 4G. Of course not all
> > > memory have to be hot unpluggable. Making first 1M or event first 128M not
> > > unpluggable make perfect sense.
> > 
> > Can't you achieve this with -m 1G, -device dimm,size=1G,populated=true
> > -device dimm,size=1G,populated=false?
> > 
> From this:
> 
> (for hw/pc.c PCI hole is currently [below_4g_mem_size, 4G), so
> hotplugged memory should start from max(4G, above_4g_mem_size).
> 
> I understand that hotpluggable memory can start from above 4G only. With
> the config above we will have memory hole from 1G to PCI memory hole.
> May be not a big problem, but I do not see technical reason for the constrain.
>  
The 440fx spec mentions: "The address range from the top of main DRAM to 4
Gbytes (top of physical memory space supported by the 440FX PCIset) is normally
mapped to PCI. The PMC forwards all accesses within this address range to PCI."

What we probably want is that the initial memory map creation takes into account
all dimms specified (both populated/unpopulated) 
So "-m 1G, -device dimm,size=1G,populated=true -device dimm,size=1G,populated=false"
would create a system map with top of memory and start of PCI-hole at 2G. 

This may require some shifting of physical address offsets around
3.5GB-4GB - is this the minimum PCI hole allowed?

E.g. if we specify 4x1GB DIMMs (onlt the first initially populated)   
-m 1G, -device dimm,size=1G,populated=true -device dimm,size=1G,populated=false
-device dimm,size=1G,populated=false -device dimm,size=1G,populated=false

we create the following memory map:
dimm0: [0,1G)
dimm1: [1G, 2G)
dimm2: [2G, 3G)
dimm3: [4G, 5G) or dimm3 is split into [3G, 3.5G) and [4G, 4.5G)

does either of these options sound reasonable?

thanks,

- Vasilis

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 0/9] ACPI memory hotplug
  2012-04-22 14:20         ` Gleb Natapov
  2012-04-23 12:31           ` Vasilis Liaskovitis
@ 2012-04-23 13:31           ` Avi Kivity
  2012-04-24  7:21             ` Gleb Natapov
  1 sibling, 1 reply; 38+ messages in thread
From: Avi Kivity @ 2012-04-23 13:31 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: seabios, qemu-devel, kvm

On 04/22/2012 05:20 PM, Gleb Natapov wrote:
> On Sun, Apr 22, 2012 at 05:13:27PM +0300, Avi Kivity wrote:
> > On 04/22/2012 05:09 PM, Gleb Natapov wrote:
> > > On Sun, Apr 22, 2012 at 05:06:43PM +0300, Avi Kivity wrote:
> > > > On 04/22/2012 04:56 PM, Gleb Natapov wrote:
> > > > > start. We will need it for migration anyway.
> > > > >
> > > > > > hotplug-able memory slots i.e. initial system memory is not modeled with
> > > > > > memslots. The concept could be generalized to include all memory though, or it
> > > > > > could more closely follow kvm-memory slots.
> > > > > OK, I hope final version will allow for memory < 4G to be hot-pluggable.
> > > > 
> > > > Why is that important?
> > > > 
> > > Because my feeling is that people that want to use this kind of feature
> > > what to start using it with VMs smaller than 4G. Of course not all
> > > memory have to be hot unpluggable. Making first 1M or event first 128M not
> > > unpluggable make perfect sense.
> > 
> > Can't you achieve this with -m 1G, -device dimm,size=1G,populated=true
> > -device dimm,size=1G,populated=false?
> > 
> From this:
>
> (for hw/pc.c PCI hole is currently [below_4g_mem_size, 4G), so
> hotplugged memory should start from max(4G, above_4g_mem_size).
>
> I understand that hotpluggable memory can start from above 4G only. With
> the config above we will have memory hole from 1G to PCI memory hole.
> May be not a big problem, but I do not see technical reason for the constrain.
>  
> > (I don't think hotplugging below 512MB is needed, but I don't have any
> > real data on this).
> > 
> 512MB looks like a reasonable limitation too, but again if there is not
> technical reason for having the limitation why have it?
>

I was thinking about not having tons of 128MB slots, so we don't have a
configuration that is far from reality.  But maybe this thinking is too
conservative.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 3/9][SeaBIOS] acpi: generate hotplug memory devices.
  2012-04-19 14:08 ` [RFC PATCH 3/9][SeaBIOS] acpi: generate hotplug memory devices Vasilis Liaskovitis
@ 2012-04-23 23:37   ` Kevin O'Connor
  2012-04-24  8:27     ` Vasilis Liaskovitis
  0 siblings, 1 reply; 38+ messages in thread
From: Kevin O'Connor @ 2012-04-23 23:37 UTC (permalink / raw)
  To: Vasilis Liaskovitis; +Cc: gleb, seabios, qemu-devel, kvm, avi

On Thu, Apr 19, 2012 at 04:08:41PM +0200, Vasilis Liaskovitis wrote:
>  The memory device generation is guided by qemu paravirt info. Seabios
>  first uses the info to setup SRAT entries for the hotplug-able memory slots.
>  Afterwards, build_memssdt uses the created SRAT entries to generate
>  appropriate memory device objects. One memory device (and corresponding SRAT
>  entry) is generated for each hotplug-able qemu memslot. Currently no SSDT
>  memory device is created for initial system memory (the method can be
>  generalized to all memory though).
> 
>  Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> ---
>  src/acpi.c |  151 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 files changed, 147 insertions(+), 4 deletions(-)
> 
> diff --git a/src/acpi.c b/src/acpi.c
> index 30888b9..5580099 100644
> --- a/src/acpi.c
> +++ b/src/acpi.c
> @@ -484,6 +484,131 @@ build_ssdt(void)
>      return ssdt;
>  }
>  
> +static unsigned char ssdt_mem[] = {
> +    0x5b,0x82,0x47,0x07,0x4d,0x50,0x41,0x41,

This patch looks like it uses the SSDT generation mechanism that was
present in SeaBIOS v1.6.3.  Since then, however, the runtime AML code
generation has been improved to be more dynamic.  Any runtime
generated AML code should be updated to use the newer mechanisms.

-Kevin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 0/9] ACPI memory hotplug
  2012-04-23 13:31           ` Avi Kivity
@ 2012-04-24  7:21             ` Gleb Natapov
  2012-04-24  9:09               ` Avi Kivity
  0 siblings, 1 reply; 38+ messages in thread
From: Gleb Natapov @ 2012-04-24  7:21 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Vasilis Liaskovitis, qemu-devel, kvm, seabios

On Mon, Apr 23, 2012 at 04:31:04PM +0300, Avi Kivity wrote:
> On 04/22/2012 05:20 PM, Gleb Natapov wrote:
> > On Sun, Apr 22, 2012 at 05:13:27PM +0300, Avi Kivity wrote:
> > > On 04/22/2012 05:09 PM, Gleb Natapov wrote:
> > > > On Sun, Apr 22, 2012 at 05:06:43PM +0300, Avi Kivity wrote:
> > > > > On 04/22/2012 04:56 PM, Gleb Natapov wrote:
> > > > > > start. We will need it for migration anyway.
> > > > > >
> > > > > > > hotplug-able memory slots i.e. initial system memory is not modeled with
> > > > > > > memslots. The concept could be generalized to include all memory though, or it
> > > > > > > could more closely follow kvm-memory slots.
> > > > > > OK, I hope final version will allow for memory < 4G to be hot-pluggable.
> > > > > 
> > > > > Why is that important?
> > > > > 
> > > > Because my feeling is that people that want to use this kind of feature
> > > > what to start using it with VMs smaller than 4G. Of course not all
> > > > memory have to be hot unpluggable. Making first 1M or event first 128M not
> > > > unpluggable make perfect sense.
> > > 
> > > Can't you achieve this with -m 1G, -device dimm,size=1G,populated=true
> > > -device dimm,size=1G,populated=false?
> > > 
> > From this:
> >
> > (for hw/pc.c PCI hole is currently [below_4g_mem_size, 4G), so
> > hotplugged memory should start from max(4G, above_4g_mem_size).
> >
> > I understand that hotpluggable memory can start from above 4G only. With
> > the config above we will have memory hole from 1G to PCI memory hole.
> > May be not a big problem, but I do not see technical reason for the constrain.
> >  
> > > (I don't think hotplugging below 512MB is needed, but I don't have any
> > > real data on this).
> > > 
> > 512MB looks like a reasonable limitation too, but again if there is not
> > technical reason for having the limitation why have it?
> >
> 
> I was thinking about not having tons of 128MB slots, so we don't have a
> configuration that is far from reality.  But maybe this thinking is too
> conservative.
> 
I think it is good interface to make memory that is specified with -m to
be one big unpluggable slot, but slots defined with -device should start
just above what -m specifies (after proper alignment). Memory hot-plug
granularity is controlled by slot's size parameter.

--
			Gleb.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 0/9] ACPI memory hotplug
  2012-04-23 12:31           ` Vasilis Liaskovitis
@ 2012-04-24  7:52             ` Gleb Natapov
  2012-04-24  8:24               ` Vasilis Liaskovitis
  0 siblings, 1 reply; 38+ messages in thread
From: Gleb Natapov @ 2012-04-24  7:52 UTC (permalink / raw)
  To: Vasilis Liaskovitis; +Cc: Avi Kivity, qemu-devel, kvm, seabios

On Mon, Apr 23, 2012 at 02:31:15PM +0200, Vasilis Liaskovitis wrote:
> Hi,
> 
> On Sun, Apr 22, 2012 at 05:20:59PM +0300, Gleb Natapov wrote:
> > On Sun, Apr 22, 2012 at 05:13:27PM +0300, Avi Kivity wrote:
> > > On 04/22/2012 05:09 PM, Gleb Natapov wrote:
> > > > On Sun, Apr 22, 2012 at 05:06:43PM +0300, Avi Kivity wrote:
> > > > > On 04/22/2012 04:56 PM, Gleb Natapov wrote:
> > > > > > start. We will need it for migration anyway.
> > > > > >
> > > > > > > hotplug-able memory slots i.e. initial system memory is not modeled with
> > > > > > > memslots. The concept could be generalized to include all memory though, or it
> > > > > > > could more closely follow kvm-memory slots.
> > > > > > OK, I hope final version will allow for memory < 4G to be hot-pluggable.
> > > > > 
> > > > > Why is that important?
> > > > > 
> > > > Because my feeling is that people that want to use this kind of feature
> > > > what to start using it with VMs smaller than 4G. Of course not all
> > > > memory have to be hot unpluggable. Making first 1M or event first 128M not
> > > > unpluggable make perfect sense.
> > > 
> > > Can't you achieve this with -m 1G, -device dimm,size=1G,populated=true
> > > -device dimm,size=1G,populated=false?
> > > 
> > From this:
> > 
> > (for hw/pc.c PCI hole is currently [below_4g_mem_size, 4G), so
> > hotplugged memory should start from max(4G, above_4g_mem_size).
> > 
> > I understand that hotpluggable memory can start from above 4G only. With
> > the config above we will have memory hole from 1G to PCI memory hole.
> > May be not a big problem, but I do not see technical reason for the constrain.
> >  
> The 440fx spec mentions: "The address range from the top of main DRAM to 4
> Gbytes (top of physical memory space supported by the 440FX PCIset) is normally
> mapped to PCI. The PMC forwards all accesses within this address range to PCI."
> 
> What we probably want is that the initial memory map creation takes into account
> all dimms specified (both populated/unpopulated) 
Yes.

> So "-m 1G, -device dimm,size=1G,populated=true -device dimm,size=1G,populated=false"
> would create a system map with top of memory and start of PCI-hole at 2G. 
> 
What -m 1G means on this command line? Isn't it redundant?
May be we should make -m create non unplaggable, populated slot starting
at address 0. Ten you config above will specify 3G memory with 2G
populated (first of which is not removable) and 1G unpopulated. PCI hole
starts above 3G.

> This may require some shifting of physical address offsets around
> 3.5GB-4GB - is this the minimum PCI hole allowed?
Currently it is 1G in QEMU code.
> 
> E.g. if we specify 4x1GB DIMMs (onlt the first initially populated)   
> -m 1G, -device dimm,size=1G,populated=true -device dimm,size=1G,populated=false
> -device dimm,size=1G,populated=false -device dimm,size=1G,populated=false
> 
> we create the following memory map:
> dimm0: [0,1G)
> dimm1: [1G, 2G)
> dimm2: [2G, 3G)
> dimm3: [4G, 5G) or dimm3 is split into [3G, 3.5G) and [4G, 4.5G)
> 
> does either of these options sound reasonable?
> 
We shouldn't split dimms IMO. Just unnecessary complication. Better make
bigger PCI hole.

--
			Gleb.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 0/9] ACPI memory hotplug
  2012-04-24  7:52             ` Gleb Natapov
@ 2012-04-24  8:24               ` Vasilis Liaskovitis
  2012-04-24  8:34                 ` Gleb Natapov
  0 siblings, 1 reply; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-24  8:24 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Avi Kivity, qemu-devel, kvm, seabios

Hi,
On Tue, Apr 24, 2012 at 10:52:24AM +0300, Gleb Natapov wrote:
> On Mon, Apr 23, 2012 at 02:31:15PM +0200, Vasilis Liaskovitis wrote:
> > The 440fx spec mentions: "The address range from the top of main DRAM to 4
> > Gbytes (top of physical memory space supported by the 440FX PCIset) is normally
> > mapped to PCI. The PMC forwards all accesses within this address range to PCI."
> > 
> > What we probably want is that the initial memory map creation takes into account
> > all dimms specified (both populated/unpopulated) 
> Yes.
> 
> > So "-m 1G, -device dimm,size=1G,populated=true -device dimm,size=1G,populated=false"
> > would create a system map with top of memory and start of PCI-hole at 2G. 
> > 
> What -m 1G means on this command line? Isn't it redundant?
yes, this was redundant with the original concept.

> May be we should make -m create non unplaggable, populated slot starting
> at address 0. Ten you config above will specify 3G memory with 2G
> populated (first of which is not removable) and 1G unpopulated. PCI hole
> starts above 3G.

I agree -m should mean one big unpluggable slot.

So in the new proposal,"-device dimm populated=true" means a hot-removable dimm
that has already been hotplugged. 

A question here is when exactly should the initial hot-add event for this dimm
be played? If the relevant OSPM has not yet been initialized (e.g. acpi_memhotplug
module in a linux guest needs to be loaded), the guest may not see the event. 
This is a general issue of course, but with initially populated hot-removable
dimms it may be a bigger issue. Can ospm acpi initialization be detected?

Or maybe you are suggesting "populated=true" is part of initial memory (i.e. not
hot-added, but still hot-removable). Though in that case guestOS may use it for
bootmem allocations, making hot-remove more likely to fail at the memory
offlining stage.

> 
> > This may require some shifting of physical address offsets around
> > 3.5GB-4GB - is this the minimum PCI hole allowed?
> Currently it is 1G in QEMU code.
ok

> > 
> > E.g. if we specify 4x1GB DIMMs (onlt the first initially populated)   
> > -m 1G, -device dimm,size=1G,populated=true -device dimm,size=1G,populated=false
> > -device dimm,size=1G,populated=false -device dimm,size=1G,populated=false
> > 
> > we create the following memory map:
> > dimm0: [0,1G)
> > dimm1: [1G, 2G)
> > dimm2: [2G, 3G)
> > dimm3: [4G, 5G) or dimm3 is split into [3G, 3.5G) and [4G, 4.5G)
> > 
> > does either of these options sound reasonable?
> > 
> We shouldn't split dimms IMO. Just unnecessary complication. Better make
> bigger PCI hole.

ok

thanks,

- Vasilis

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 3/9][SeaBIOS] acpi: generate hotplug memory devices.
  2012-04-23 23:37   ` Kevin O'Connor
@ 2012-04-24  8:27     ` Vasilis Liaskovitis
  0 siblings, 0 replies; 38+ messages in thread
From: Vasilis Liaskovitis @ 2012-04-24  8:27 UTC (permalink / raw)
  To: Kevin O'Connor; +Cc: gleb, seabios, qemu-devel, kvm, avi

Hi,

On Mon, Apr 23, 2012 at 07:37:51PM -0400, Kevin O'Connor wrote:
> On Thu, Apr 19, 2012 at 04:08:41PM +0200, Vasilis Liaskovitis wrote:
> >  The memory device generation is guided by qemu paravirt info. Seabios
> >  first uses the info to setup SRAT entries for the hotplug-able memory slots.
> >  Afterwards, build_memssdt uses the created SRAT entries to generate
> >  appropriate memory device objects. One memory device (and corresponding SRAT
> >  entry) is generated for each hotplug-able qemu memslot. Currently no SSDT
> >  memory device is created for initial system memory (the method can be
> >  generalized to all memory though).
> > 
> >  Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> > ---
> >  src/acpi.c |  151 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
> >  1 files changed, 147 insertions(+), 4 deletions(-)
> > 
> > diff --git a/src/acpi.c b/src/acpi.c
> > index 30888b9..5580099 100644
> > --- a/src/acpi.c
> > +++ b/src/acpi.c
> > @@ -484,6 +484,131 @@ build_ssdt(void)
> >      return ssdt;
> >  }
> >  
> > +static unsigned char ssdt_mem[] = {
> > +    0x5b,0x82,0x47,0x07,0x4d,0x50,0x41,0x41,
> 
> This patch looks like it uses the SSDT generation mechanism that was
> present in SeaBIOS v1.6.3.  Since then, however, the runtime AML code
> generation has been improved to be more dynamic.  Any runtime
> generated AML code should be updated to use the newer mechanisms.

thanks, I will look into the new mechanism and rewrite.

- Vasilis

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 0/9] ACPI memory hotplug
  2012-04-24  8:24               ` Vasilis Liaskovitis
@ 2012-04-24  8:34                 ` Gleb Natapov
  0 siblings, 0 replies; 38+ messages in thread
From: Gleb Natapov @ 2012-04-24  8:34 UTC (permalink / raw)
  To: Vasilis Liaskovitis; +Cc: seabios, Avi Kivity, kvm, qemu-devel

On Tue, Apr 24, 2012 at 10:24:51AM +0200, Vasilis Liaskovitis wrote:
> Hi,
> On Tue, Apr 24, 2012 at 10:52:24AM +0300, Gleb Natapov wrote:
> > On Mon, Apr 23, 2012 at 02:31:15PM +0200, Vasilis Liaskovitis wrote:
> > > The 440fx spec mentions: "The address range from the top of main DRAM to 4
> > > Gbytes (top of physical memory space supported by the 440FX PCIset) is normally
> > > mapped to PCI. The PMC forwards all accesses within this address range to PCI."
> > > 
> > > What we probably want is that the initial memory map creation takes into account
> > > all dimms specified (both populated/unpopulated) 
> > Yes.
> > 
> > > So "-m 1G, -device dimm,size=1G,populated=true -device dimm,size=1G,populated=false"
> > > would create a system map with top of memory and start of PCI-hole at 2G. 
> > > 
> > What -m 1G means on this command line? Isn't it redundant?
> yes, this was redundant with the original concept.
> 
> > May be we should make -m create non unplaggable, populated slot starting
> > at address 0. Ten you config above will specify 3G memory with 2G
> > populated (first of which is not removable) and 1G unpopulated. PCI hole
> > starts above 3G.
> 
> I agree -m should mean one big unpluggable slot.
> 
> So in the new proposal,"-device dimm populated=true" means a hot-removable dimm
> that has already been hotplugged. 
> 
Yes.

> A question here is when exactly should the initial hot-add event for this dimm
> be played? If the relevant OSPM has not yet been initialized (e.g. acpi_memhotplug
> module in a linux guest needs to be loaded), the guest may not see the event. 
> This is a general issue of course, but with initially populated hot-removable
> dimms it may be a bigger issue. Can ospm acpi initialization be detected?
> 
> Or maybe you are suggesting "populated=true" is part of initial memory (i.e. not
> hot-added, but still hot-removable). Though in that case guestOS may use it for
> bootmem allocations, making hot-remove more likely to fail at the memory
> offlining stage.
>
If memory slot is populated even before OSPM is started BIOS will detect
that by reading mem_sts and will create e820 map appropriately. OSPM
will detect it by evaluating _STA during boot. This is not unique for
memory hot-plug. Any hotpluggable device have the same issue.

--
			Gleb.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 0/9] ACPI memory hotplug
  2012-04-24  7:21             ` Gleb Natapov
@ 2012-04-24  9:09               ` Avi Kivity
  0 siblings, 0 replies; 38+ messages in thread
From: Avi Kivity @ 2012-04-24  9:09 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Vasilis Liaskovitis, qemu-devel, kvm, seabios

On 04/24/2012 10:21 AM, Gleb Natapov wrote:
> > 
> > I was thinking about not having tons of 128MB slots, so we don't have a
> > configuration that is far from reality.  But maybe this thinking is too
> > conservative.
> > 
> I think it is good interface to make memory that is specified with -m to
> be one big unpluggable slot, but slots defined with -device should start
> just above what -m specifies (after proper alignment). Memory hot-plug
> granularity is controlled by slot's size parameter.
>
>

Agree.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [SeaBIOS] [RFC PATCH 9/9] enable memory devices if e820 entry is present
  2012-04-19 14:08 ` [RFC PATCH 9/9][SeaBIOS] enable memory devices if e820 entry is present Vasilis Liaskovitis
@ 2012-04-26  0:58   ` Wen Congyang
  0 siblings, 0 replies; 38+ messages in thread
From: Wen Congyang @ 2012-04-26  0:58 UTC (permalink / raw)
  To: Vasilis Liaskovitis; +Cc: qemu-devel, kvm, seabios, avi

At 04/19/2012 10:08 PM, Vasilis Liaskovitis Wrote:
>  On a reboot, seabios regenerates srat/ssdt objects. If a valid e820 entry is
>  found spanning the whole address range of a hotplug memory device, the  device
>  will be enabled. This ensures persistency of hotplugged memory slots across VM
>  reboots.
> 
>  Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> ---
>  src/acpi.c   |    6 +++++-
>  src/memmap.c |   15 +++++++++++++++
>  2 files changed, 20 insertions(+), 1 deletions(-)
> 
> diff --git a/src/acpi.c b/src/acpi.c
> index 5580099..2ebed2e 100644
> --- a/src/acpi.c
> +++ b/src/acpi.c
> @@ -601,7 +601,11 @@ build_memssdt(void)
>      for (i = 0; i < nb_memdevs; i++) {
>          mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low);
>          mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
> -        *(ssdt_ptr++) = 0x00;
> +        if (find_e820(mem_base, mem_len, E820_RAM)) {

This line will break the building:
Compilation complete. 0 Errors, 16 Warnings, 0 Remarks, 259 Optimizations
  Compiling whole program out/ccode32flat.o
src/acpi.c: In function ‘build_memssdt’:
src/acpi.c:604: warning: implicit declaration of function ‘find_e820’
src/memmap.c:137: note: previous definition of ‘find_e820’ was here
src/acpi.c:604: error: ‘E820_RAM’ undeclared (first use in this function)
src/acpi.c:604: error: (Each undeclared identifier is reported only once
src/acpi.c:604: error: for each function it appears in.)
make: *** [out/ccode32flat.o] Error 1

You should declare the function find_e820() in src/memmap.h and include
this header file in src/acpi.c.

Thanks
Wen Congyang

> +            *(ssdt_ptr++) = 0x01;
> +        }
> +        else
> +            *(ssdt_ptr++) = 0x00;
>          entry++;
>      }
>      build_header((void*)ssdt, SSDT_SIGNATURE, ssdt_ptr - ssdt, 1);
> diff --git a/src/memmap.c b/src/memmap.c
> index 56865b4..9790da1 100644
> --- a/src/memmap.c
> +++ b/src/memmap.c
> @@ -131,6 +131,21 @@ add_e820(u64 start, u64 size, u32 type)
>      //dump_map();
>  }
>  
> +// Check if an e820 entry exists that covers the memory range
> +// [start, start+size) with same type as type.
> +int
> +find_e820(u64 start, u64 size, u32 type)
> +{
> +    int i;
> +    for (i=0; i<e820_count; i++) {
> +        struct e820entry *e = &e820_list[i];
> +        if ((e->start <= start) && (e->size >= (size + start - e->start)) &&
> +            (e->type == type))
> +            return 1;
> +    }
> +    return 0;
> +}
> +
>  // Report on final memory locations.
>  void
>  memmap_finalize(void)


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2012-04-26  0:54 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-19 14:08 [RFC PATCH 0/9] ACPI memory hotplug Vasilis Liaskovitis
2012-04-19 14:08 ` [RFC PATCH 1/9][SeaBIOS] Add SSDT memory device support Vasilis Liaskovitis
2012-04-19 14:08 ` [RFC PATCH 2/9][SeaBIOS] Implement acpi-dsdt functions for memory hotplug Vasilis Liaskovitis
2012-04-20 10:55   ` Igor Mammedov
2012-04-20 14:11     ` [Qemu-devel] " Vasilis Liaskovitis
2012-04-19 14:08 ` [RFC PATCH 3/9][SeaBIOS] acpi: generate hotplug memory devices Vasilis Liaskovitis
2012-04-23 23:37   ` Kevin O'Connor
2012-04-24  8:27     ` Vasilis Liaskovitis
2012-04-19 14:08 ` [RFC PATCH 4/9] Implement memslot device abstraction Vasilis Liaskovitis
2012-04-19 14:08 ` [RFC PATCH 5/9] acpi_piix4: Implement memory device hotplug registers Vasilis Liaskovitis
2012-04-19 14:08 ` [RFC PATCH 6/9] pc: pass paravirt info for hotplug memory slots to BIOS Vasilis Liaskovitis
2012-04-19 14:21   ` Avi Kivity
2012-04-20 10:33   ` Igor Mammedov
2012-04-20 16:35     ` [Qemu-devel] " Vasilis Liaskovitis
2012-04-19 14:08 ` [RFC PATCH 7/9] Implement memslot command-line option and memslot hmp command Vasilis Liaskovitis
2012-04-19 14:22   ` Avi Kivity
2012-04-19 18:10     ` Vasilis Liaskovitis
2012-04-19 14:08 ` [RFC PATCH 8/9] pc: adjust e820 map on hot-add and hot-remove Vasilis Liaskovitis
2012-04-22 13:58   ` Gleb Natapov
2012-04-23 11:27     ` Vasilis Liaskovitis
2012-04-23 11:30       ` Gleb Natapov
2012-04-19 14:08 ` [RFC PATCH 9/9][SeaBIOS] enable memory devices if e820 entry is present Vasilis Liaskovitis
2012-04-26  0:58   ` [SeaBIOS] [RFC PATCH 9/9] " Wen Congyang
2012-04-19 14:49 ` [Qemu-devel] [RFC PATCH 0/9] ACPI memory hotplug Anthony Liguori
2012-04-19 18:09   ` Vasilis Liaskovitis
2012-04-20 14:20 ` Vasilis Liaskovitis
2012-04-22 13:56 ` Gleb Natapov
2012-04-22 14:06   ` Avi Kivity
2012-04-22 14:09     ` Gleb Natapov
2012-04-22 14:13       ` Avi Kivity
2012-04-22 14:20         ` Gleb Natapov
2012-04-23 12:31           ` Vasilis Liaskovitis
2012-04-24  7:52             ` Gleb Natapov
2012-04-24  8:24               ` Vasilis Liaskovitis
2012-04-24  8:34                 ` Gleb Natapov
2012-04-23 13:31           ` Avi Kivity
2012-04-24  7:21             ` Gleb Natapov
2012-04-24  9:09               ` Avi Kivity

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.