* [Qemu-devel] [PATCH 0/2] pc: fix /etc/acpi/tables size in fw_cfg for -M pc-2.0
@ 2014-07-23 16:37 Paolo Bonzini
  2014-07-23 16:37 ` [Qemu-devel] [PATCH 1/2] acpi-dsdt: procedurally generate _PRT Paolo Bonzini
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Paolo Bonzini @ 2014-07-23 16:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, mst, dgilbert, amit.shah, imammedo, lersek

Changing the ACPI table size causes migration to break, and the memory
hotplug work opened our eyes to how horribly we were breaking things in
2.0 already.

Unfortunately when reviewing the design I assumed incorrectly that all
tables would be placed in separate fw_cfg files.  This would have been
better, because you can always move stuff to a new SSDT (and thus a new
file), keeping the sizes under control.

Hard-code 64k as the maximum ACPI table size; for -M pc-i440fx-2.0
and -M pc-i440fx-1.7 compute the payload size of QEMU 2.0 and always
use that one.  This always works for QEMU 2.0, and also for 1.7
except for a few values of "-smp maxcpus".

The first patch is needed to shrink the ACPI tables and make them
smaller than they used to be in 2.0.

Please test and ack.  I'll do more testing tomorrow.

Paolo


Paolo Bonzini (2):
  acpi-dsdt: procedurally generate _PRT
  pc: hack for migration compatibility from QEMU 2.0

 hw/i386/acpi-build.c  | 61 +++++++++++++++++++++++++++++++---
 hw/i386/acpi-dsdt.dsl | 90 ++++++++++++++++++++++-----------------------------
 hw/i386/pc_piix.c     | 20 ++++++++++++
 hw/i386/pc_q35.c      |  5 +++
 include/hw/i386/pc.h  |  1 +
 5 files changed, 122 insertions(+), 55 deletions(-)

-- 
1.8.3.1

* [Qemu-devel] [PATCH 1/2] acpi-dsdt: procedurally generate _PRT
  2014-07-23 16:37 [Qemu-devel] [PATCH 0/2] pc: fix /etc/acpi/tables size in fw_cfg for -M pc-2.0 Paolo Bonzini
@ 2014-07-23 16:37 ` Paolo Bonzini
  2014-07-23 19:27   ` Laszlo Ersek
  2014-07-24  8:22   ` Igor Mammedov
  2014-07-23 16:37 ` [Qemu-devel] [PATCH 2/2] pc: hack for migration compatibility from QEMU 2.0 Paolo Bonzini
  2014-07-24 15:22 ` [Qemu-devel] [PATCH 0/2] pc: fix /etc/acpi/tables size in fw_cfg for -M pc-2.0 Igor Mammedov
  2 siblings, 2 replies; 9+ messages in thread
From: Paolo Bonzini @ 2014-07-23 16:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, mst, dgilbert, amit.shah, imammedo, lersek

This replaces the _PRT constant with a method that computes it.

The problem is that the DSDT+SSDT have grown from 2.0 to 2.1,
enough to cross the 8k barrier (we align the ACPI tables to 4k
before putting them in fw_cfg).  This causes problems with
migration and the pc-2.0 machine type.

The solution to the problem is to hardcode 64k as the limit,
but this doesn't solve the bug with pc-2.0.  The fix will be
for QEMU 2.1 to use exactly the same size as QEMU 2.0 for the
ACPI tables.  First, however, we must make the actual AML size
equal or smaller; to do this, rewrite _PRT in a way that saves
over 1k of bytecode.
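
For illustration, the computation done by the new _PRT method can be
modelled in C roughly as follows.  This is only a sketch of the loop in
the patch below, not QEMU code; the enum, struct and function names are
made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Interrupt link devices named in the DSDT. */
enum prt_link { LNKA, LNKB, LNKC, LNKD, LNKS };

/* Hypothetical C model of one _PRT entry: { 0x[slot]FFFF, pin, link, 0 }. */
struct prt_entry {
    uint32_t address;      /* slot number in the high word, 0xFFFF below */
    uint8_t pin;           /* 0..3 for INTA..INTD */
    enum prt_link source;  /* link device that routes this pin */
};

/* Compute entry i (0..127) the way the While loop in the method does. */
static struct prt_entry prt_entry_for(unsigned i)
{
    struct prt_entry e;
    unsigned slot = i >> 2;            /* four entries per slot */

    assert(i < 128);
    e.address = ((uint32_t)slot << 16) | 0xFFFF;
    e.pin = (uint8_t)(i & 3);

    switch ((slot + i) & 3) {          /* rotate links from slot to slot */
    case 0:
        e.source = LNKD;
        break;
    case 1:
        /* slot 1, pin 0 is the power-management device and needs SCI */
        e.source = (i == 4) ? LNKS : LNKA;
        break;
    case 2:
        e.source = LNKB;
        break;
    default:
        e.source = LNKC;
        break;
    }
    return e;
}

/* Fill all 128 entries: 32 slots times 4 pins. */
static void build_prt(struct prt_entry table[128])
{
    for (unsigned i = 0; i < 128; i++) {
        table[i] = prt_entry_for(i);
    }
}
```

The old macro table lists LNKS, LNKB, LNKC, LNKD for slot 1; entries 4
to 7 of the model give the same sequence.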

Tested on Windows XP.  Q35 already uses a method for _PRT
so most guests should be okay.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/i386/acpi-dsdt.dsl | 90 ++++++++++++++++++++++-----------------------------
 1 file changed, 39 insertions(+), 51 deletions(-)

diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
index 3cc0ea0..6ba0170 100644
--- a/hw/i386/acpi-dsdt.dsl
+++ b/hw/i386/acpi-dsdt.dsl
@@ -181,57 +181,45 @@ DefinitionBlock (
 
     Scope(\_SB) {
         Scope(PCI0) {
-            Name(_PRT, Package() {
-                /* PCI IRQ routing table, example from ACPI 2.0a specification,
-                   section 6.2.8.1 */
-                /* Note: we provide the same info as the PCI routing
-                   table of the Bochs BIOS */
-
-#define prt_slot(nr, lnk0, lnk1, lnk2, lnk3) \
-    Package() { nr##ffff, 0, lnk0, 0 }, \
-    Package() { nr##ffff, 1, lnk1, 0 }, \
-    Package() { nr##ffff, 2, lnk2, 0 }, \
-    Package() { nr##ffff, 3, lnk3, 0 }
-
-#define prt_slot0(nr) prt_slot(nr, LNKD, LNKA, LNKB, LNKC)
-#define prt_slot1(nr) prt_slot(nr, LNKA, LNKB, LNKC, LNKD)
-#define prt_slot2(nr) prt_slot(nr, LNKB, LNKC, LNKD, LNKA)
-#define prt_slot3(nr) prt_slot(nr, LNKC, LNKD, LNKA, LNKB)
-
-                prt_slot0(0x0000),
-                /* Device 1 is power mgmt device, and can only use irq 9 */
-                prt_slot(0x0001, LNKS, LNKB, LNKC, LNKD),
-                prt_slot2(0x0002),
-                prt_slot3(0x0003),
-                prt_slot0(0x0004),
-                prt_slot1(0x0005),
-                prt_slot2(0x0006),
-                prt_slot3(0x0007),
-                prt_slot0(0x0008),
-                prt_slot1(0x0009),
-                prt_slot2(0x000a),
-                prt_slot3(0x000b),
-                prt_slot0(0x000c),
-                prt_slot1(0x000d),
-                prt_slot2(0x000e),
-                prt_slot3(0x000f),
-                prt_slot0(0x0010),
-                prt_slot1(0x0011),
-                prt_slot2(0x0012),
-                prt_slot3(0x0013),
-                prt_slot0(0x0014),
-                prt_slot1(0x0015),
-                prt_slot2(0x0016),
-                prt_slot3(0x0017),
-                prt_slot0(0x0018),
-                prt_slot1(0x0019),
-                prt_slot2(0x001a),
-                prt_slot3(0x001b),
-                prt_slot0(0x001c),
-                prt_slot1(0x001d),
-                prt_slot2(0x001e),
-                prt_slot3(0x001f),
-            })
+            Method (_PRT, 0) {
+                Store(Package(128) {}, Local0)
+                Store(Zero, Local1)
+                While(LLess(Local1, 128)) {
+                    // slot = pin >> 2
+                    Store(ShiftRight(Local1, 2), Local2)
+
+                    // lnk = (slot + pin) & 3
+                    Store(And(Add(Local1, Local2), 3), Local3)
+                    If (LEqual(Local3, 0)) {
+                        Store(Package(4) { Zero, Zero, LNKD, Zero }, Local4)
+                    }
+                    If (LEqual(Local3, 1)) {
+                        // device 1 is the power-management device, needs SCI
+                        If (LEqual(Local1, 4)) {
+                            Store(Package(4) { Zero, Zero, LNKS, Zero }, Local4)
+                        } Else {
+                            Store(Package(4) { Zero, Zero, LNKA, Zero }, Local4)
+                        }
+                    }
+                    If (LEqual(Local3, 2)) {
+                        Store(Package(4) { Zero, Zero, LNKB, Zero }, Local4)
+                    }
+                    If (LEqual(Local3, 3)) {
+                        Store(Package(4) { Zero, Zero, LNKC, Zero }, Local4)
+                    }
+
+                    // Complete the interrupt routing entry:
+                    //    Package(4) { 0x[slot]FFFF, [pin], [link], 0 }
+
+                    Store(Or(ShiftLeft(Local2, 16), 0xFFFF), Index(Local4, 0))
+                    Store(And(Local1, 3),                    Index(Local4, 1))
+                    Store(Local4,                            Index(Local0, Local1))
+
+                    Increment(Local1)
+                }
+
+                Return(Local0)
+            }
         }
 
         Field(PCI0.ISA.P40C, ByteAcc, NoLock, Preserve) {
-- 
1.8.3.1

* [Qemu-devel] [PATCH 2/2] pc: hack for migration compatibility from QEMU 2.0
  2014-07-23 16:37 [Qemu-devel] [PATCH 0/2] pc: fix /etc/acpi/tables size in fw_cfg for -M pc-2.0 Paolo Bonzini
  2014-07-23 16:37 ` [Qemu-devel] [PATCH 1/2] acpi-dsdt: procedurally generate _PRT Paolo Bonzini
@ 2014-07-23 16:37 ` Paolo Bonzini
  2014-07-23 19:34   ` Laszlo Ersek
  2014-07-24  8:59   ` Igor Mammedov
  2014-07-24 15:22 ` [Qemu-devel] [PATCH 0/2] pc: fix /etc/acpi/tables size in fw_cfg for -M pc-2.0 Igor Mammedov
  2 siblings, 2 replies; 9+ messages in thread
From: Paolo Bonzini @ 2014-07-23 16:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, mst, dgilbert, amit.shah, imammedo, lersek

Changing the ACPI table size causes migration to break, and the memory
hotplug work opened our eyes to how horribly we were breaking things in
2.0 already.

The ACPI table size is rounded to the next 4k, which one would think
gives some headroom.  In practice this is not the case, because the user
can control the ACPI table size (each CPU adds 105 bytes) and so some
"-smp" values will break the 4k boundary and fail to migrate.  Similarly,
PCI bridges add ~1870 bytes to the SSDT.
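
A minimal sketch of why the 4k rounding is not enough, assuming the
payload is a fixed base plus a constant number of bytes per CPU (105 is
the figure quoted above).  The helper names are hypothetical and the
base size used in the note after the block is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ACPI_ALIGN 0x1000u  /* 4k alignment applied before fw_cfg export */

/* Round len up to the next multiple of align (a power of two). */
static size_t round_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/* Hypothetical payload model: fixed base plus per_cpu bytes per CPU. */
static size_t payload_size(size_t base, size_t per_cpu, unsigned cpus)
{
    return base + per_cpu * cpus;
}

/* Migration needs source and destination to expose the same rounded size. */
static bool sizes_match(size_t src_len, size_t dst_len)
{
    return round_up(src_len, ACPI_ALIGN) == round_up(dst_len, ACPI_ALIGN);
}
```

With an assumed base of 8000 bytes, one CPU gives 8105 bytes (rounded to
8192) while two CPUs give 8210 bytes (rounded to 12288), so a single
extra CPU is enough to cross the boundary.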

To fix this, hard-code 64k as the maximum ACPI table size, which
(despite being an order of magnitude smaller than 640k) should be enough
for everyone.

To fix migration from QEMU 2.0, compute the payload size of QEMU 2.0
and always use that one.  The previous patch shrank the ACPI tables
enough that the QEMU 2.0 size should always be enough.

Non-AML tables can change depending on the configuration (especially
MADT, SRAT, HPET) but they remain the same between QEMU 2.0 and 2.1,
so we only compute our padding based on the sizes of the SSDT and DSDT.
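
The sizing rule can be sketched as standalone C.  This mirrors the
arithmetic in the acpi_build() hunk below (the constants 97, 1875,
0x1000 and 0x10000 come from there), but the function names are
hypothetical and the sketch returns 0 where the patch reports an error
and exits.

```c
#include <assert.h>
#include <stddef.h>

#define ACPI_ALIGN    0x1000u   /* 4k rounding used by QEMU 2.0 */
#define ACPI_MAX_SIZE 0x10000u  /* fixed 64k size for newer machine types */

/* Round len up to the next multiple of align (a power of two). */
static size_t round_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/*
 * Legacy machine types: replace the size of the current DSDT+SSDT
 * (aml_len) with the size QEMU 2.0 would have produced, then round to 4k.
 * Returns the padded size, or 0 if the current tables do not fit in it.
 */
static size_t legacy_table_size(size_t table_len, size_t aml_len,
                                size_t legacy_base, unsigned max_cpus,
                                unsigned bsel_alloc)
{
    /* Same value as MAX(bsel_alloc, 1) - 1 in the patch. */
    size_t bridges = bsel_alloc > 1 ? bsel_alloc - 1 : 0;
    size_t legacy_aml_len = legacy_base + (size_t)97 * max_cpus
                            + (size_t)1875 * bridges;
    size_t padded;

    assert(aml_len <= table_len);
    padded = round_up(table_len - aml_len + legacy_aml_len, ACPI_ALIGN);
    return table_len > padded ? 0 : padded;
}

/* Newer machine types: always 64k, or 0 if the tables are too large. */
static size_t fixed_table_size(size_t table_len)
{
    return table_len > ACPI_MAX_SIZE ? 0 : ACPI_MAX_SIZE;
}
```

Because the non-AML tables are taken from the current build
(table_len - aml_len), only the DSDT and SSDT sizes need a hard-coded
legacy value.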

Migration from QEMU 1.7 should work for guests that have a number of CPUs
other than 12, 13, 14, 54, 55, 56, 97, 98, 139, 140, and that have no
PCI bridges.  It was already broken from QEMU 1.7 to QEMU 2.0 in the
same way, though.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/i386/acpi-build.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++----
 hw/i386/pc_piix.c    | 20 +++++++++++++++++
 hw/i386/pc_q35.c     |  5 +++++
 include/hw/i386/pc.h |  1 +
 4 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index ebc5f03..7373d93 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -25,7 +25,9 @@
 #include <glib.h>
 #include "qemu-common.h"
 #include "qemu/bitmap.h"
+#include "qemu/osdep.h"
 #include "qemu/range.h"
+#include "qemu/error-report.h"
 #include "hw/pci/pci.h"
 #include "qom/cpu.h"
 #include "hw/i386/pc.h"
@@ -87,6 +89,8 @@ typedef struct AcpiBuildPciBusHotplugState {
     struct AcpiBuildPciBusHotplugState *parent;
 } AcpiBuildPciBusHotplugState;
 
+unsigned bsel_alloc;
+
 static void acpi_get_dsdt(AcpiMiscInfo *info)
 {
     uint16_t *applesmc_sta;
@@ -759,8 +763,8 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
 static void acpi_set_pci_info(void)
 {
     PCIBus *bus = find_i440fx(); /* TODO: Q35 support */
-    unsigned bsel_alloc = 0;
 
+    assert(bsel_alloc == 0);
     if (bus) {
         /* Scan all PCI buses. Set property to enable acpi based hotplug. */
         pci_for_each_bus_depth_first(bus, acpi_set_bsel, NULL, &bsel_alloc);
@@ -1440,13 +1444,14 @@ static
 void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
 {
     GArray *table_offsets;
-    unsigned facs, dsdt, rsdt;
+    unsigned facs, ssdt, dsdt, rsdt;
     AcpiCpuInfo cpu;
     AcpiPmInfo pm;
     AcpiMiscInfo misc;
     AcpiMcfgInfo mcfg;
     PcPciInfo pci;
     uint8_t *u;
+    size_t aml_len = 0;
 
     acpi_get_cpu_info(&cpu);
     acpi_get_pm_info(&pm);
@@ -1474,13 +1479,20 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
     dsdt = tables->table_data->len;
     build_dsdt(tables->table_data, tables->linker, &misc);
 
+    /* Count the size of the DSDT and SSDT, we will need it for legacy
+     * sizing of ACPI tables.
+     */
+    aml_len += tables->table_data->len - dsdt;
+
     /* ACPI tables pointed to by RSDT */
     acpi_add_table(table_offsets, tables->table_data);
     build_fadt(tables->table_data, tables->linker, &pm, facs, dsdt);
 
+    ssdt = tables->table_data->len;
     acpi_add_table(table_offsets, tables->table_data);
     build_ssdt(tables->table_data, tables->linker, &cpu, &pm, &misc, &pci,
                guest_info);
+    aml_len += tables->table_data->len - ssdt;
 
     acpi_add_table(table_offsets, tables->table_data);
     build_madt(tables->table_data, tables->linker, &cpu, guest_info);
@@ -1513,12 +1525,53 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
     /* RSDP is in FSEG memory, so allocate it separately */
     build_rsdp(tables->rsdp, tables->linker, rsdt);
 
-    /* We'll expose it all to Guest so align size to reduce
+    /* We'll expose it all to Guest so we want to reduce
      * chance of size changes.
      * RSDP is small so it's easy to keep it immutable, no need to
      * bother with alignment.
+     *
+     * We used to align the tables to 4k, but of course this would
+     * be too simple to be enough.  4k turned out to be too small an
+     * alignment very soon, and in fact it is almost impossible to
+     * keep the table size stable for all (max_cpus, max_memory_slots)
+     * combinations.  So the table size is always 64k for pc-2.1 and
+     * we give an error if the table grows beyond that limit.
+     *
+     * We still have the problem of migrating from "-M pc-2.0".  For that,
+     * we exploit the fact that QEMU 2.1 generates _smaller_ tables than 2.0
+     * and we can always pad the smaller tables with zeros.  We can then use
+     * the exact size of the 2.0 tables.
+     *
+     * All this is for PIIX4, since QEMU 2.0 didn't support Q35 migration.
      */
-    acpi_align_size(tables->table_data, 0x1000);
+    if (guest_info->legacy_acpi_table_size) {
+        /* Subtracting aml_len gives the size of fixed tables.  Then add the
+         * size of the PIIX4 DSDT/SSDT in QEMU 2.0.
+         */
+        int legacy_aml_len =
+            guest_info->legacy_acpi_table_size +
+            97 * max_cpus +
+            1875 * (MAX(bsel_alloc, 1) - 1);
+        int legacy_table_size =
+            ROUND_UP(tables->table_data->len - aml_len + legacy_aml_len, 0x1000);
+        if (tables->table_data->len > legacy_table_size) {
+            /* -M pc-2.0 doesn't support memory hotplug, so this should never
+             * happen.
+             */
+            error_report("This configuration is not supported with -M pc-2.0.");
+            error_report("Please report this to qemu-devel@nongnu.org.");
+            exit(1);
+        }
+        g_array_set_size(tables->table_data, legacy_table_size);
+    } else {
+        if (tables->table_data->len > 65536) {
+            /* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots.  */
+            error_report("Too many maximum CPUs, NUMA nodes or memory slots.");
+            error_report("Please decrease one of these parameters.");
+            exit(1);
+        }
+        g_array_set_size(tables->table_data, 0x10000);
+    }
 
     acpi_align_size(tables->linker, 0x1000);
 
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 7081c08..4d3da20 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -61,6 +61,7 @@ static const int ide_irq[MAX_IDE_BUS] = { 14, 15 };
 
 static bool has_pci_info;
 static bool has_acpi_build = true;
+static int legacy_acpi_table_size;
 static bool smbios_defaults = true;
 static bool smbios_legacy_mode;
 /* Make sure that guest addresses aligned at 1Gbyte boundaries get mapped to
@@ -163,6 +164,7 @@ static void pc_init1(MachineState *machine,
     guest_info = pc_guest_info_init(below_4g_mem_size, above_4g_mem_size);
 
     guest_info->has_acpi_build = has_acpi_build;
+    guest_info->legacy_acpi_table_size = legacy_acpi_table_size;
 
     guest_info->has_pci_info = has_pci_info;
     guest_info->isapc_ram_fw = !pci_enabled;
@@ -297,6 +299,24 @@ static void pc_init_pci(MachineState *machine)
 
 static void pc_compat_2_0(MachineState *machine)
 {
+    /* This value depends on the actual DSDT and SSDT compiled into
+     * the source QEMU; unfortunately it depends on the binary and
+     * not on the machine type, so we cannot make pc-1.7 work on
+     * both QEMU 1.7 and QEMU 2.0.
+     *
+     * Large variations cause migration to fail for more than one
+     * consecutive value of the "-smp" maxcpus option.
+     *
+     * For small variations of the kind caused by different iasl versions,
+     * the 4k rounding usually leaves slack.  However, there could be still
+     * one or two values that break.  For QEMU 1.7 and QEMU 2.0 the
+     * slack is only ~10 bytes before one "-smp maxcpus" value breaks!
+     * on the actual contents of the DSDT and SSDT).
+     *
+     * 6652 is valid for QEMU 2.0; the right value for pc-1.7 on
+     * QEMU 1.7 is 6414.  For RHEL/CentOS 7.0 it is 6418.
+     */
+    legacy_acpi_table_size = 6652;
     smbios_legacy_mode = true;
     has_reserved_memory = false;
 }
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index f551961..c39ee98 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -155,6 +155,11 @@ static void pc_q35_init(MachineState *machine)
     guest_info->has_acpi_build = has_acpi_build;
     guest_info->has_reserved_memory = has_reserved_memory;
 
+    /* Migration was not supported in 2.0 for Q35, so do not bother
+     * with this hack (see hw/i386/acpi-build.c).
+     */
+    guest_info->legacy_acpi_table_size = 0;
+
     if (smbios_defaults) {
         MachineClass *mc = MACHINE_GET_CLASS(machine);
         /* These values are guest ABI, do not change */
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 1c0c382..f4b9b2b 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -94,6 +94,7 @@ struct PcGuestInfo {
     uint64_t *node_mem;
     uint64_t *node_cpu;
     FWCfgState *fw_cfg;
+    int legacy_acpi_table_size;
     bool has_acpi_build;
     bool has_reserved_memory;
 };
-- 
1.8.3.1

* Re: [Qemu-devel] [PATCH 1/2] acpi-dsdt: procedurally generate _PRT
  2014-07-23 16:37 ` [Qemu-devel] [PATCH 1/2] acpi-dsdt: procedurally generate _PRT Paolo Bonzini
@ 2014-07-23 19:27   ` Laszlo Ersek
  2014-07-24  8:22   ` Igor Mammedov
  1 sibling, 0 replies; 9+ messages in thread
From: Laszlo Ersek @ 2014-07-23 19:27 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: amit.shah, imammedo, mst, dgilbert, peter.maydell

On 07/23/14 18:37, Paolo Bonzini wrote:
> This replaces the _PRT constant with a method that computes it.
> 
> The problem is that the DSDT+SSDT have grown from 2.0 to 2.1,
> enough to cross the 8k barrier (we align the ACPI tables to 4k
> before putting them in fw_cfg).  This causes problems with
> migration and the pc-2.0 machine type.
> 
> The solution to the problem is to hardcode 64k as the limit,
> but this doesn't solve the bug with pc-2.0.  The fix will be
> for QEMU 2.1 to use exactly the same size as QEMU 2.0 for the
> ACPI tables.  First, however, we must make the actual AML size
> equal or smaller; to do this, rewrite _PRT in a way that saves
> over 1k of bytecode.
> 
> Tested on Windows XP.  Q35 already uses a method for _PRT
> so most guests should be okay.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/i386/acpi-dsdt.dsl | 90 ++++++++++++++++++++++-----------------------------
>  1 file changed, 39 insertions(+), 51 deletions(-)
> 
> diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
> index 3cc0ea0..6ba0170 100644
> --- a/hw/i386/acpi-dsdt.dsl
> +++ b/hw/i386/acpi-dsdt.dsl
> @@ -181,57 +181,45 @@ DefinitionBlock (
>  
>      Scope(\_SB) {
>          Scope(PCI0) {
> -            Name(_PRT, Package() {
> -                /* PCI IRQ routing table, example from ACPI 2.0a specification,
> -                   section 6.2.8.1 */
> -                /* Note: we provide the same info as the PCI routing
> -                   table of the Bochs BIOS */
> -
> -#define prt_slot(nr, lnk0, lnk1, lnk2, lnk3) \
> -    Package() { nr##ffff, 0, lnk0, 0 }, \
> -    Package() { nr##ffff, 1, lnk1, 0 }, \
> -    Package() { nr##ffff, 2, lnk2, 0 }, \
> -    Package() { nr##ffff, 3, lnk3, 0 }
> -
> -#define prt_slot0(nr) prt_slot(nr, LNKD, LNKA, LNKB, LNKC)
> -#define prt_slot1(nr) prt_slot(nr, LNKA, LNKB, LNKC, LNKD)
> -#define prt_slot2(nr) prt_slot(nr, LNKB, LNKC, LNKD, LNKA)
> -#define prt_slot3(nr) prt_slot(nr, LNKC, LNKD, LNKA, LNKB)
> -
> -                prt_slot0(0x0000),
> -                /* Device 1 is power mgmt device, and can only use irq 9 */
> -                prt_slot(0x0001, LNKS, LNKB, LNKC, LNKD),
> -                prt_slot2(0x0002),
> -                prt_slot3(0x0003),
> -                prt_slot0(0x0004),
> -                prt_slot1(0x0005),
> -                prt_slot2(0x0006),
> -                prt_slot3(0x0007),
> -                prt_slot0(0x0008),
> -                prt_slot1(0x0009),
> -                prt_slot2(0x000a),
> -                prt_slot3(0x000b),
> -                prt_slot0(0x000c),
> -                prt_slot1(0x000d),
> -                prt_slot2(0x000e),
> -                prt_slot3(0x000f),
> -                prt_slot0(0x0010),
> -                prt_slot1(0x0011),
> -                prt_slot2(0x0012),
> -                prt_slot3(0x0013),
> -                prt_slot0(0x0014),
> -                prt_slot1(0x0015),
> -                prt_slot2(0x0016),
> -                prt_slot3(0x0017),
> -                prt_slot0(0x0018),
> -                prt_slot1(0x0019),
> -                prt_slot2(0x001a),
> -                prt_slot3(0x001b),
> -                prt_slot0(0x001c),
> -                prt_slot1(0x001d),
> -                prt_slot2(0x001e),
> -                prt_slot3(0x001f),
> -            })
> +            Method (_PRT, 0) {
> +                Store(Package(128) {}, Local0)
> +                Store(Zero, Local1)
> +                While(LLess(Local1, 128)) {
> +                    // slot = pin >> 2
> +                    Store(ShiftRight(Local1, 2), Local2)
> +
> +                    // lnk = (slot + pin) & 3
> +                    Store(And(Add(Local1, Local2), 3), Local3)
> +                    If (LEqual(Local3, 0)) {
> +                        Store(Package(4) { Zero, Zero, LNKD, Zero }, Local4)
> +                    }
> +                    If (LEqual(Local3, 1)) {
> +                        // device 1 is the power-management device, needs SCI
> +                        If (LEqual(Local1, 4)) {
> +                            Store(Package(4) { Zero, Zero, LNKS, Zero }, Local4)
> +                        } Else {
> +                            Store(Package(4) { Zero, Zero, LNKA, Zero }, Local4)
> +                        }
> +                    }
> +                    If (LEqual(Local3, 2)) {
> +                        Store(Package(4) { Zero, Zero, LNKB, Zero }, Local4)
> +                    }
> +                    If (LEqual(Local3, 3)) {
> +                        Store(Package(4) { Zero, Zero, LNKC, Zero }, Local4)
> +                    }
> +
> +                    // Complete the interrupt routing entry:
> +                    //    Package(4) { 0x[slot]FFFF, [pin], [link], 0 }
> +
> +                    Store(Or(ShiftLeft(Local2, 16), 0xFFFF), Index(Local4, 0))
> +                    Store(And(Local1, 3),                    Index(Local4, 1))
> +                    Store(Local4,                            Index(Local0, Local1))
> +
> +                    Increment(Local1)
> +                }
> +
> +                Return(Local0)
> +            }
>          }
>  
>          Field(PCI0.ISA.P40C, ByteAcc, NoLock, Preserve) {
> 

Awesome!

Reviewed-by: Laszlo Ersek <lersek@redhat.com>

(In this case you might consider a "tested-by" more useful, but I can't
promise to help in that regard.)

* Re: [Qemu-devel] [PATCH 2/2] pc: hack for migration compatibility from QEMU 2.0
  2014-07-23 16:37 ` [Qemu-devel] [PATCH 2/2] pc: hack for migration compatibility from QEMU 2.0 Paolo Bonzini
@ 2014-07-23 19:34   ` Laszlo Ersek
  2014-07-24  8:59   ` Igor Mammedov
  1 sibling, 0 replies; 9+ messages in thread
From: Laszlo Ersek @ 2014-07-23 19:34 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: amit.shah, imammedo, mst, dgilbert, peter.maydell

On 07/23/14 18:37, Paolo Bonzini wrote:
> Changing the ACPI table size causes migration to break, and the memory
> hotplug work opened our eyes to how horribly we were breaking things in
> 2.0 already.
> 
> The ACPI table size is rounded to the next 4k, which one would think
> gives some headroom.  In practice this is not the case, because the user
> can control the ACPI table size (each CPU adds 105 bytes) and so some
> "-smp" values will break the 4k boundary and fail to migrate.  Similarly,
> PCI bridges add ~1870 bytes to the SSDT.
> 
> To fix this, hard-code 64k as the maximum ACPI table size, which
> (despite being an order of magnitude smaller than 640k) should be enough
> for everyone.
> 
> To fix migration from QEMU 2.0, compute the payload size of QEMU 2.0
> and always use that one.  The previous patch shrank the ACPI tables
> enough that the QEMU 2.0 size should always be enough.
> 
> Non-AML tables can change depending on the configuration (especially
> MADT, SRAT, HPET) but they remain the same between QEMU 2.0 and 2.1,
> so we only compute our padding based on the sizes of the SSDT and DSDT.
> 
> Migration from QEMU 1.7 should work for guests that have a number of CPUs
> other than 12, 13, 14, 54, 55, 56, 97, 98, 139, 140, and that have no
> PCI bridges.  It was already broken from QEMU 1.7 to QEMU 2.0 in the
> same way, though.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/i386/acpi-build.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++----
>  hw/i386/pc_piix.c    | 20 +++++++++++++++++
>  hw/i386/pc_q35.c     |  5 +++++
>  include/hw/i386/pc.h |  1 +
>  4 files changed, 83 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index ebc5f03..7373d93 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -25,7 +25,9 @@
>  #include <glib.h>
>  #include "qemu-common.h"
>  #include "qemu/bitmap.h"
> +#include "qemu/osdep.h"
>  #include "qemu/range.h"
> +#include "qemu/error-report.h"
>  #include "hw/pci/pci.h"
>  #include "qom/cpu.h"
>  #include "hw/i386/pc.h"
> @@ -87,6 +89,8 @@ typedef struct AcpiBuildPciBusHotplugState {
>      struct AcpiBuildPciBusHotplugState *parent;
>  } AcpiBuildPciBusHotplugState;
>  
> +unsigned bsel_alloc;
> +
>  static void acpi_get_dsdt(AcpiMiscInfo *info)
>  {
>      uint16_t *applesmc_sta;
> @@ -759,8 +763,8 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
>  static void acpi_set_pci_info(void)
>  {
>      PCIBus *bus = find_i440fx(); /* TODO: Q35 support */
> -    unsigned bsel_alloc = 0;
>  
> +    assert(bsel_alloc == 0);
>      if (bus) {
>          /* Scan all PCI buses. Set property to enable acpi based hotplug. */
>          pci_for_each_bus_depth_first(bus, acpi_set_bsel, NULL, &bsel_alloc);
> @@ -1440,13 +1444,14 @@ static
>  void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
>  {
>      GArray *table_offsets;
> -    unsigned facs, dsdt, rsdt;
> +    unsigned facs, ssdt, dsdt, rsdt;
>      AcpiCpuInfo cpu;
>      AcpiPmInfo pm;
>      AcpiMiscInfo misc;
>      AcpiMcfgInfo mcfg;
>      PcPciInfo pci;
>      uint8_t *u;
> +    size_t aml_len = 0;
>  
>      acpi_get_cpu_info(&cpu);
>      acpi_get_pm_info(&pm);
> @@ -1474,13 +1479,20 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
>      dsdt = tables->table_data->len;
>      build_dsdt(tables->table_data, tables->linker, &misc);
>  
> +    /* Count the size of the DSDT and SSDT, we will need it for legacy
> +     * sizing of ACPI tables.
> +     */
> +    aml_len += tables->table_data->len - dsdt;
> +
>      /* ACPI tables pointed to by RSDT */
>      acpi_add_table(table_offsets, tables->table_data);
>      build_fadt(tables->table_data, tables->linker, &pm, facs, dsdt);
>  
> +    ssdt = tables->table_data->len;
>      acpi_add_table(table_offsets, tables->table_data);
>      build_ssdt(tables->table_data, tables->linker, &cpu, &pm, &misc, &pci,
>                 guest_info);
> +    aml_len += tables->table_data->len - ssdt;
>  
>      acpi_add_table(table_offsets, tables->table_data);
>      build_madt(tables->table_data, tables->linker, &cpu, guest_info);
> @@ -1513,12 +1525,53 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
>      /* RSDP is in FSEG memory, so allocate it separately */
>      build_rsdp(tables->rsdp, tables->linker, rsdt);
>  
> -    /* We'll expose it all to Guest so align size to reduce
> +    /* We'll expose it all to Guest so we want to reduce
>       * chance of size changes.
>       * RSDP is small so it's easy to keep it immutable, no need to
>       * bother with alignment.
> +     *
> +     * We used to align the tables to 4k, but of course this would
> +     * be too simple to be enough.  4k turned out to be too small an
> +     * alignment very soon, and in fact it is almost impossible to
> +     * keep the table size stable for all (max_cpus, max_memory_slots)
> +     * combinations.  So the table size is always 64k for pc-2.1 and
> +     * we give an error if the table grows beyond that limit.
> +     *
> +     * We still have the problem of migrating from "-M pc-2.0".  For that,
> +     * we exploit the fact that QEMU 2.1 generates _smaller_ tables than 2.0
> +     * and we can always pad the smaller tables with zeros.  We can then use
> +     * the exact size of the 2.0 tables.
> +     *
> +     * All this is for PIIX4, since QEMU 2.0 didn't support Q35 migration.
>       */
> -    acpi_align_size(tables->table_data, 0x1000);
> +    if (guest_info->legacy_acpi_table_size) {
> +        /* Subtracting aml_len gives the size of fixed tables.  Then add the
> +         * size of the PIIX4 DSDT/SSDT in QEMU 2.0.
> +         */
> +        int legacy_aml_len =
> +            guest_info->legacy_acpi_table_size +
> +            97 * max_cpus +
> +            1875 * (MAX(bsel_alloc, 1) - 1);
> +        int legacy_table_size =
> +            ROUND_UP(tables->table_data->len - aml_len + legacy_aml_len, 0x1000);
> +        if (tables->table_data->len > legacy_table_size) {
> +            /* -M pc-2.0 doesn't support memory hotplug, so this should never
> +             * happen.
> +             */
> +            error_report("This configuration is not supported with -M pc-2.0.");
> +            error_report("Please report this to qemu-devel@nongnu.org.");
> +            exit(1);
> +        }
> +        g_array_set_size(tables->table_data, legacy_table_size);
> +    } else {
> +        if (tables->table_data->len > 65536) {
> +            /* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots.  */
> +            error_report("Too many maximum CPUs, NUMA nodes or memory slots.");
> +            error_report("Please decrease one of these parameters.");
> +            exit(1);
> +        }
> +        g_array_set_size(tables->table_data, 0x10000);
> +    }
>  
>      acpi_align_size(tables->linker, 0x1000);
>  
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index 7081c08..4d3da20 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -61,6 +61,7 @@ static const int ide_irq[MAX_IDE_BUS] = { 14, 15 };
>  
>  static bool has_pci_info;
>  static bool has_acpi_build = true;
> +static int legacy_acpi_table_size;
>  static bool smbios_defaults = true;
>  static bool smbios_legacy_mode;
>  /* Make sure that guest addresses aligned at 1Gbyte boundaries get mapped to
> @@ -163,6 +164,7 @@ static void pc_init1(MachineState *machine,
>      guest_info = pc_guest_info_init(below_4g_mem_size, above_4g_mem_size);
>  
>      guest_info->has_acpi_build = has_acpi_build;
> +    guest_info->legacy_acpi_table_size = legacy_acpi_table_size;
>  
>      guest_info->has_pci_info = has_pci_info;
>      guest_info->isapc_ram_fw = !pci_enabled;
> @@ -297,6 +299,24 @@ static void pc_init_pci(MachineState *machine)
>  
>  static void pc_compat_2_0(MachineState *machine)
>  {
> +    /* This value depends on the actual DSDT and SSDT compiled into
> +     * the source QEMU; unfortunately it depends on the binary and
> +     * not on the machine type, so we cannot make pc-1.7 work on
> +     * both QEMU 1.7 and QEMU 2.0.
> +     *
> +     * Large variations cause migration to fail for more than one
> +     * consecutive value of the "-smp" maxcpus option.
> +     *
> +     * For small variations of the kind caused by different iasl versions,
> +     * the 4k rounding usually leaves slack.  However, there could be still
> +     * one or two values that break.  For QEMU 1.7 and QEMU 2.0 the
> +     * slack is only ~10 bytes before one "-smp maxcpus" value breaks!
> +     * on the actual contents of the DSDT and SSDT).

I think the last line of this comment paragraph:

    on the actual contents of the DSDT and SSDT).

is a remnant / earlier version of the very beginning of the first
comment paragraph:

    This value depends on the actual DSDT and SSDT compiled into

but I don't think this would warrant a respin, if we're pressed for time.

Reviewed-by: Laszlo Ersek <lersek@redhat.com>

> +     *
> +     * 6652 is valid for QEMU 2.0, the right value for pc-1.7 on
> +     * QEMU 1.7 it is 6414.  For RHEL/CentOS 7.0 it is 6418.
> +     */
> +    legacy_acpi_table_size = 6652;
>      smbios_legacy_mode = true;
>      has_reserved_memory = false;
>  }
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index f551961..c39ee98 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -155,6 +155,11 @@ static void pc_q35_init(MachineState *machine)
>      guest_info->has_acpi_build = has_acpi_build;
>      guest_info->has_reserved_memory = has_reserved_memory;
>  
> +    /* Migration was not supported in 2.0 for Q35, so do not bother
> +     * with this hack (see hw/i386/acpi-build.c).
> +     */
> +    guest_info->legacy_acpi_table_size = 0;
> +
>      if (smbios_defaults) {
>          MachineClass *mc = MACHINE_GET_CLASS(machine);
>          /* These values are guest ABI, do not change */
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 1c0c382..f4b9b2b 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -94,6 +94,7 @@ struct PcGuestInfo {
>      uint64_t *node_mem;
>      uint64_t *node_cpu;
>      FWCfgState *fw_cfg;
> +    int legacy_acpi_table_size;
>      bool has_acpi_build;
>      bool has_reserved_memory;
>  };
> 

Thanks
Laszlo


* Re: [Qemu-devel] [PATCH 1/2] acpi-dsdt: procedurally generate _PRT
  2014-07-23 16:37 ` [Qemu-devel] [PATCH 1/2] acpi-dsdt: procedurally generate _PRT Paolo Bonzini
  2014-07-23 19:27   ` Laszlo Ersek
@ 2014-07-24  8:22   ` Igor Mammedov
  1 sibling, 0 replies; 9+ messages in thread
From: Igor Mammedov @ 2014-07-24  8:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: peter.maydell, mst, qemu-devel, dgilbert, amit.shah, lersek

On Wed, 23 Jul 2014 18:37:45 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> This replaces the _PRT constant with a method that computes it.
> 
> The problem is that the DSDT+SSDT have grown from 2.0 to 2.1,
> enough to cross the 8k barrier (we align the ACPI tables to 4k
> before putting them in fw_cfg).  This causes problems with
> migration and the pc-2.0 machine type.
> 
> The solution to the problem is to hardcode 64k as the limit,
> but this doesn't solve the bug with pc-2.0.  The fix will be
> for QEMU 2.1 to use exactly the same size as QEMU 2.0 for the
> ACPI tables.  First, however, we must make the actual AML size
> equal or smaller; to do this, rewrite _PRT in a way that saves
> over 1k of bytecode.
> 
> Tested on Windows XP.  Q35 already uses a method for _PRT
> so most guests should be okay.
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/i386/acpi-dsdt.dsl | 90 ++++++++++++++++++++++-----------------------------
After changing this file, its precompiled counterpart also needs to be updated:

hw/i386/acpi-dsdt.hex.generated

>  1 file changed, 39 insertions(+), 51 deletions(-)
> 
> diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
> index 3cc0ea0..6ba0170 100644
> --- a/hw/i386/acpi-dsdt.dsl
> +++ b/hw/i386/acpi-dsdt.dsl
> @@ -181,57 +181,45 @@ DefinitionBlock (
>  
>      Scope(\_SB) {
>          Scope(PCI0) {
> -            Name(_PRT, Package() {
> -                /* PCI IRQ routing table, example from ACPI 2.0a specification,
> -                   section 6.2.8.1 */
> -                /* Note: we provide the same info as the PCI routing
> -                   table of the Bochs BIOS */
> -
> -#define prt_slot(nr, lnk0, lnk1, lnk2, lnk3) \
> -    Package() { nr##ffff, 0, lnk0, 0 }, \
> -    Package() { nr##ffff, 1, lnk1, 0 }, \
> -    Package() { nr##ffff, 2, lnk2, 0 }, \
> -    Package() { nr##ffff, 3, lnk3, 0 }
> -
> -#define prt_slot0(nr) prt_slot(nr, LNKD, LNKA, LNKB, LNKC)
> -#define prt_slot1(nr) prt_slot(nr, LNKA, LNKB, LNKC, LNKD)
> -#define prt_slot2(nr) prt_slot(nr, LNKB, LNKC, LNKD, LNKA)
> -#define prt_slot3(nr) prt_slot(nr, LNKC, LNKD, LNKA, LNKB)
> -
> -                prt_slot0(0x0000),
> -                /* Device 1 is power mgmt device, and can only use irq 9 */
> -                prt_slot(0x0001, LNKS, LNKB, LNKC, LNKD),
> -                prt_slot2(0x0002),
> -                prt_slot3(0x0003),
> -                prt_slot0(0x0004),
> -                prt_slot1(0x0005),
> -                prt_slot2(0x0006),
> -                prt_slot3(0x0007),
> -                prt_slot0(0x0008),
> -                prt_slot1(0x0009),
> -                prt_slot2(0x000a),
> -                prt_slot3(0x000b),
> -                prt_slot0(0x000c),
> -                prt_slot1(0x000d),
> -                prt_slot2(0x000e),
> -                prt_slot3(0x000f),
> -                prt_slot0(0x0010),
> -                prt_slot1(0x0011),
> -                prt_slot2(0x0012),
> -                prt_slot3(0x0013),
> -                prt_slot0(0x0014),
> -                prt_slot1(0x0015),
> -                prt_slot2(0x0016),
> -                prt_slot3(0x0017),
> -                prt_slot0(0x0018),
> -                prt_slot1(0x0019),
> -                prt_slot2(0x001a),
> -                prt_slot3(0x001b),
> -                prt_slot0(0x001c),
> -                prt_slot1(0x001d),
> -                prt_slot2(0x001e),
> -                prt_slot3(0x001f),
> -            })
> +            Method (_PRT, 0) {
> +                Store(Package(128) {}, Local0)
> +                Store(Zero, Local1)
> +                While(LLess(Local1, 128)) {
> +                    // slot = pin >> 2
> +                    Store(ShiftRight(Local1, 2), Local2)
> +
> +                    // lnk = (slot + pin) & 3
> +                    Store(And(Add(Local1, Local2), 3), Local3)
> +                    If (LEqual(Local3, 0)) {
> +                        Store(Package(4) { Zero, Zero, LNKD, Zero }, Local4)
> +                    }
> +                    If (LEqual(Local3, 1)) {
> +                        // device 1 is the power-management device, needs SCI
> +                        If (LEqual(Local1, 4)) {
> +                            Store(Package(4) { Zero, Zero, LNKS, Zero }, Local4)
> +                        } Else {
> +                            Store(Package(4) { Zero, Zero, LNKA, Zero }, Local4)
> +                        }
> +                    }
> +                    If (LEqual(Local3, 2)) {
> +                        Store(Package(4) { Zero, Zero, LNKB, Zero }, Local4)
> +                    }
> +                    If (LEqual(Local3, 3)) {
> +                        Store(Package(4) { Zero, Zero, LNKC, Zero }, Local4)
> +                    }
> +
> +                    // Complete the interrupt routing entry:
> +                    //    Package(4) { 0x[slot]FFFF, [pin], [link], 0) }
> +
> +                    Store(Or(ShiftLeft(Local2, 16), 0xFFFF), Index(Local4, 0))
> +                    Store(And(Local1, 3),                    Index(Local4, 1))
> +                    Store(Local4,                            Index(Local0, Local1))
> +
> +                    Increment(Local1)
> +                }
> +
> +                Return(Local0)
> +            }
>          }
>  
>          Field(PCI0.ISA.P40C, ByteAcc, NoLock, Preserve) {
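
For readers who don't speak ASL, here is a rough C model of what the new _PRT method computes. It is only a sketch for checking the logic against the old macro table; the names prt_link, prt_entry and build_prt are made up for illustration and appear nowhere in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical types, for illustration only. */
enum prt_link { LNKA, LNKB, LNKC, LNKD, LNKS };

struct prt_entry {
    uint32_t address;       /* 0x[slot]FFFF */
    uint8_t pin;            /* 0..3 */
    enum prt_link link;     /* interrupt link device */
    uint32_t source_index;  /* always 0 */
};

/* Fill the 128 routing entries (32 slots x 4 pins) the way the ASL loop does. */
static void build_prt(struct prt_entry prt[128])
{
    /* Indexed by (slot + pin) & 3, same order as the If() chain in the ASL. */
    static const enum prt_link rotation[4] = { LNKD, LNKA, LNKB, LNKC };
    unsigned i;

    for (i = 0; i < 128; i++) {
        unsigned slot = i >> 2;         /* Local2 */
        unsigned lnk = (i + slot) & 3;  /* Local3, equal to (slot + pin) & 3 */

        assert(slot < 32);
        prt[i].address = ((uint32_t)slot << 16) | 0xFFFF;
        prt[i].pin = i & 3;
        prt[i].link = rotation[lnk];
        prt[i].source_index = 0;

        /* Slot 1, pin 0 is the power-management device and needs the SCI. */
        if (i == 4) {
            prt[i].link = LNKS;
        }
    }
}
```

Slot 1 comes out as LNKS, LNKB, LNKC, LNKD, which matches the old prt_slot(0x0001, ...) line, and every other slot matches the prt_slot0..prt_slot3 rotation.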


* Re: [Qemu-devel] [PATCH 2/2] pc: hack for migration compatibility from QEMU 2.0
  2014-07-23 16:37 ` [Qemu-devel] [PATCH 2/2] pc: hack for migration compatibility from QEMU 2.0 Paolo Bonzini
  2014-07-23 19:34   ` Laszlo Ersek
@ 2014-07-24  8:59   ` Igor Mammedov
  2014-07-24 14:28     ` Paolo Bonzini
  1 sibling, 1 reply; 9+ messages in thread
From: Igor Mammedov @ 2014-07-24  8:59 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: peter.maydell, mst, qemu-devel, dgilbert, amit.shah, lersek

On Wed, 23 Jul 2014 18:37:46 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Changing the ACPI table size causes migration to break, and the memory
> hotplug work opened our eyes on how horribly we were breaking things in
> 2.0 already.
> 
> The ACPI table size is rounded to the next 4k, which one would think
> gives some headroom.  In practice this is not the case, because the user
> can control the ACPI table size (each CPU adds 105 bytes) and so some
> "-smp" values will break the 4k boundary and fail to migrate.  Similarly,
> PCI bridges add ~1870 bytes to the SSDT.
> 
> To fix this, hard-code 64k as the maximum ACPI table size, which
> (despite being an order of magnitude smaller than 640k) should be enough
> for everyone.
> 
> To fix migration from QEMU 2.0, compute the payload size of QEMU 2.0
> and always use that one.  The previous patch shrunk the ACPI tables
> enough that the QEMU 2.0 size should always be enough.
> 
> Non-AML tables can change depending on the configuration (especially
> MADT, SRAT, HPET) but they remain the same between QEMU 2.0 and 2.1,
> so we only compute our padding based on the sizes of the SSDT and DSDT.
> 
> Migration from QEMU 1.7 should work for guests that have a number of CPUs
> other than 12, 13, 14, 54, 55, 56, 97, 98, 139, 140, and that have no
> PCI bridges.  It was already broken from QEMU 1.7 to QEMU 2.0 in the
> same way, though.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/i386/acpi-build.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++----
>  hw/i386/pc_piix.c    | 20 +++++++++++++++++
>  hw/i386/pc_q35.c     |  5 +++++
>  include/hw/i386/pc.h |  1 +
>  4 files changed, 83 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index ebc5f03..7373d93 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -25,7 +25,9 @@
>  #include <glib.h>
>  #include "qemu-common.h"
>  #include "qemu/bitmap.h"
> +#include "qemu/osdep.h"
>  #include "qemu/range.h"
> +#include "qemu/error-report.h"
>  #include "hw/pci/pci.h"
>  #include "qom/cpu.h"
>  #include "hw/i386/pc.h"
> @@ -87,6 +89,8 @@ typedef struct AcpiBuildPciBusHotplugState {
>      struct AcpiBuildPciBusHotplugState *parent;
>  } AcpiBuildPciBusHotplugState;
>  
> +unsigned bsel_alloc;
> +
>  static void acpi_get_dsdt(AcpiMiscInfo *info)
>  {
>      uint16_t *applesmc_sta;
> @@ -759,8 +763,8 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
>  static void acpi_set_pci_info(void)
>  {
>      PCIBus *bus = find_i440fx(); /* TODO: Q35 support */
> -    unsigned bsel_alloc = 0;
>  
> +    assert(bsel_alloc == 0);
>      if (bus) {
>          /* Scan all PCI buses. Set property to enable acpi based hotplug. */
>          pci_for_each_bus_depth_first(bus, acpi_set_bsel, NULL, &bsel_alloc);
> @@ -1440,13 +1444,14 @@ static
>  void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
>  {
>      GArray *table_offsets;
> -    unsigned facs, dsdt, rsdt;
> +    unsigned facs, ssdt, dsdt, rsdt;
>      AcpiCpuInfo cpu;
>      AcpiPmInfo pm;
>      AcpiMiscInfo misc;
>      AcpiMcfgInfo mcfg;
>      PcPciInfo pci;
>      uint8_t *u;
> +    size_t aml_len = 0;
>  
>      acpi_get_cpu_info(&cpu);
>      acpi_get_pm_info(&pm);
> @@ -1474,13 +1479,20 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
>      dsdt = tables->table_data->len;
>      build_dsdt(tables->table_data, tables->linker, &misc);
>  
> +    /* Count the size of the DSDT and SSDT, we will need it for legacy
> +     * sizing of ACPI tables.
> +     */
> +    aml_len += tables->table_data->len - dsdt;
> +
>      /* ACPI tables pointed to by RSDT */
>      acpi_add_table(table_offsets, tables->table_data);
>      build_fadt(tables->table_data, tables->linker, &pm, facs, dsdt);
>  
> +    ssdt = tables->table_data->len;
>      acpi_add_table(table_offsets, tables->table_data);
>      build_ssdt(tables->table_data, tables->linker, &cpu, &pm, &misc, &pci,
>                 guest_info);
> +    aml_len += tables->table_data->len - ssdt;
>  
>      acpi_add_table(table_offsets, tables->table_data);
>      build_madt(tables->table_data, tables->linker, &cpu, guest_info);
> @@ -1513,12 +1525,53 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
>      /* RSDP is in FSEG memory, so allocate it separately */
>      build_rsdp(tables->rsdp, tables->linker, rsdt);
>  
> -    /* We'll expose it all to Guest so align size to reduce
> +    /* We'll expose it all to Guest so we want to reduce
>       * chance of size changes.
>       * RSDP is small so it's easy to keep it immutable, no need to
>       * bother with alignment.
> +     *
> +     * We used to align the tables to 4k, but of course this would
> +     * too simple to be enough.  4k turned out to be too small an
> +     * alignment very soon, and in fact it is almost impossible to
> +     * keep the table size stable for all (max_cpus, max_memory_slots)
> +     * combinations.  So the table size is always 64k for pc-2.1 and
> +     * we give an error if the table grows beyond that limit.
> +     *
> +     * We still have the problem of migrating from "-M pc-2.0".  For that,
> +     * we exploit the fact that QEMU 2.1 generates _smaller_ tables than 2.0
> +     * and we can always pad the smaller tables with zeros.  We can then use
> +     * the exact size of the 2.0 tables.
> +     *
> +     * All this is for PIIX4, since QEMU 2.0 didn't support Q35 migration.
>       */
> -    acpi_align_size(tables->table_data, 0x1000);
> +    if (guest_info->legacy_acpi_table_size) {
> +        /* Subtracting aml_len gives the size of fixed tables.  Then add the
> +         * size of the PIIX4 DSDT/SSDT in QEMU 2.0.
> +         */
> +        int legacy_aml_len =
> +            guest_info->legacy_acpi_table_size +
> +            97 * max_cpus +
Commit message says it's 105 and not 97 so one of them should be fixed.
Also please replace magic numbers (above and below) with defines so that
it would be clear what they mean in the future.

> +            1875 * (MAX(bsel_alloc, 1) - 1);
> +        int legacy_table_size =
> +            ROUND_UP(tables->table_data->len - aml_len + legacy_aml_len, 0x1000);
line over 80 characters

> +        if (tables->table_data->len > legacy_table_size) {
> +            /* -M pc-2.0 doesn't support memory hotplug, so this should never
> +             * happen.
It supports hotplug on PCI bridges, which could lead to this branch;
just dropping this comment is fine.

> +             */
> +            error_report("This configuration is not supported with -M pc-2.0.");
For the user this leaves open questions: why? What is wrong?

> +            error_report("Please report this to qemu-devel@nongnu.org.");
> +            exit(1);
> +        }
> +        g_array_set_size(tables->table_data, legacy_table_size);
> +    } else {
> +        if (tables->table_data->len > 65536) {
Looking to the future, if we expand the number of supported VCPUs to 1024,
the SSDT will quickly grow to 100K; perhaps 128K or 256K would be better?


> +            /* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots.  */
Isn't the VCPU maximum 256 for 2.1, or even for 2.0?

line over 80 characters

> +            error_report("Too many maximum CPUs, NUMA nodes or memory slots.");
Add PCI bridges here since they affect the size greatly: even if the user removes
all CPUs and turns off memory hotplug, they will still get this error if the bridge
devices present at startup exceed the above limit.

> +            error_report("Please decrease one of these parameters.");
> +            exit(1);
> +        }
> +        g_array_set_size(tables->table_data, 0x10000);
Maybe use a define for the size here and above?

> +    }
>  
>      acpi_align_size(tables->linker, 0x1000);
>  
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index 7081c08..4d3da20 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -61,6 +61,7 @@ static const int ide_irq[MAX_IDE_BUS] = { 14, 15 };
>  
>  static bool has_pci_info;
>  static bool has_acpi_build = true;
> +static int legacy_acpi_table_size;
>  static bool smbios_defaults = true;
>  static bool smbios_legacy_mode;
>  /* Make sure that guest addresses aligned at 1Gbyte boundaries get mapped to
> @@ -163,6 +164,7 @@ static void pc_init1(MachineState *machine,
>      guest_info = pc_guest_info_init(below_4g_mem_size, above_4g_mem_size);
>  
>      guest_info->has_acpi_build = has_acpi_build;
> +    guest_info->legacy_acpi_table_size = legacy_acpi_table_size;
>  
>      guest_info->has_pci_info = has_pci_info;
>      guest_info->isapc_ram_fw = !pci_enabled;
> @@ -297,6 +299,24 @@ static void pc_init_pci(MachineState *machine)
>  
>  static void pc_compat_2_0(MachineState *machine)
>  {
> +    /* This value depends on the actual DSDT and SSDT compiled into
> +     * the source QEMU; unfortunately it depends on the binary and
> +     * not on the machine type, so we cannot make pc-1.7 work on
> +     * both QEMU 1.7 and QEMU 2.0.
> +     *
> +     * Large variations cause migration to fail for more than one
> +     * consecutive value of the "-smp" maxcpus option.
> +     *
> +     * For small variations of the kind caused by different iasl versions,
> +     * the 4k rounding usually leaves slack.  However, there could be still
> +     * one or two values that break.  For QEMU 1.7 and QEMU 2.0 the
> +     * slack is only ~10 bytes before one "-smp maxcpus" value breaks!
> +     * on the actual contents of the DSDT and SSDT).
> +     *
> +     * 6652 is valid for QEMU 2.0, the right value for pc-1.7 on
> +     * QEMU 1.7 it is 6414.  For RHEL/CentOS 7.0 it is 6418.
> +     */
> +    legacy_acpi_table_size = 6652;
>      smbios_legacy_mode = true;
>      has_reserved_memory = false;
>  }
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index f551961..c39ee98 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -155,6 +155,11 @@ static void pc_q35_init(MachineState *machine)
>      guest_info->has_acpi_build = has_acpi_build;
>      guest_info->has_reserved_memory = has_reserved_memory;
>  
> +    /* Migration was not supported in 2.0 for Q35, so do not bother
> +     * with this hack (see hw/i386/acpi-build.c).
> +     */
> +    guest_info->legacy_acpi_table_size = 0;
> +
>      if (smbios_defaults) {
>          MachineClass *mc = MACHINE_GET_CLASS(machine);
>          /* These values are guest ABI, do not change */
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 1c0c382..f4b9b2b 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -94,6 +94,7 @@ struct PcGuestInfo {
>      uint64_t *node_mem;
>      uint64_t *node_cpu;
>      FWCfgState *fw_cfg;
> +    int legacy_acpi_table_size;
>      bool has_acpi_build;
>      bool has_reserved_memory;
>  };


* Re: [Qemu-devel] [PATCH 2/2] pc: hack for migration compatibility from QEMU 2.0
  2014-07-24  8:59   ` Igor Mammedov
@ 2014-07-24 14:28     ` Paolo Bonzini
  0 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2014-07-24 14:28 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: peter.maydell, mst, qemu-devel, dgilbert, amit.shah, lersek

Il 24/07/2014 10:59, Igor Mammedov ha scritto:
> On Wed, 23 Jul 2014 18:37:46 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> Changing the ACPI table size causes migration to break, and the memory
>> hotplug work opened our eyes on how horribly we were breaking things in
>> 2.0 already.
>>
>> The ACPI table size is rounded to the next 4k, which one would think
>> gives some headroom.  In practice this is not the case, because the user
>> can control the ACPI table size (each CPU adds 105 bytes) and so some
>> "-smp" values will break the 4k boundary and fail to migrate.  Similarly,
>> PCI bridges add ~1870 bytes to the SSDT.
>>
>> To fix this, hard-code 64k as the maximum ACPI table size, which
>> (despite being an order of magnitude smaller than 640k) should be enough
>> for everyone.
>>
>> To fix migration from QEMU 2.0, compute the payload size of QEMU 2.0
>> and always use that one.  The previous patch shrunk the ACPI tables
>> enough that the QEMU 2.0 size should always be enough.
>>
>> Non-AML tables can change depending on the configuration (especially
>> MADT, SRAT, HPET) but they remain the same between QEMU 2.0 and 2.1,
>> so we only compute our padding based on the sizes of the SSDT and DSDT.
>>
>> Migration from QEMU 1.7 should work for guests that have a number of CPUs
>> other than 12, 13, 14, 54, 55, 56, 97, 98, 139, 140, and that have no
>> PCI bridges.  It was already broken from QEMU 1.7 to QEMU 2.0 in the
>> same way, though.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  hw/i386/acpi-build.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++----
>>  hw/i386/pc_piix.c    | 20 +++++++++++++++++
>>  hw/i386/pc_q35.c     |  5 +++++
>>  include/hw/i386/pc.h |  1 +
>>  4 files changed, 83 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index ebc5f03..7373d93 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -25,7 +25,9 @@
>>  #include <glib.h>
>>  #include "qemu-common.h"
>>  #include "qemu/bitmap.h"
>> +#include "qemu/osdep.h"
>>  #include "qemu/range.h"
>> +#include "qemu/error-report.h"
>>  #include "hw/pci/pci.h"
>>  #include "qom/cpu.h"
>>  #include "hw/i386/pc.h"
>> @@ -87,6 +89,8 @@ typedef struct AcpiBuildPciBusHotplugState {
>>      struct AcpiBuildPciBusHotplugState *parent;
>>  } AcpiBuildPciBusHotplugState;
>>  
>> +unsigned bsel_alloc;
>> +
>>  static void acpi_get_dsdt(AcpiMiscInfo *info)
>>  {
>>      uint16_t *applesmc_sta;
>> @@ -759,8 +763,8 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
>>  static void acpi_set_pci_info(void)
>>  {
>>      PCIBus *bus = find_i440fx(); /* TODO: Q35 support */
>> -    unsigned bsel_alloc = 0;
>>  
>> +    assert(bsel_alloc == 0);
>>      if (bus) {
>>          /* Scan all PCI buses. Set property to enable acpi based hotplug. */
>>          pci_for_each_bus_depth_first(bus, acpi_set_bsel, NULL, &bsel_alloc);
>> @@ -1440,13 +1444,14 @@ static
>>  void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
>>  {
>>      GArray *table_offsets;
>> -    unsigned facs, dsdt, rsdt;
>> +    unsigned facs, ssdt, dsdt, rsdt;
>>      AcpiCpuInfo cpu;
>>      AcpiPmInfo pm;
>>      AcpiMiscInfo misc;
>>      AcpiMcfgInfo mcfg;
>>      PcPciInfo pci;
>>      uint8_t *u;
>> +    size_t aml_len = 0;
>>  
>>      acpi_get_cpu_info(&cpu);
>>      acpi_get_pm_info(&pm);
>> @@ -1474,13 +1479,20 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
>>      dsdt = tables->table_data->len;
>>      build_dsdt(tables->table_data, tables->linker, &misc);
>>  
>> +    /* Count the size of the DSDT and SSDT, we will need it for legacy
>> +     * sizing of ACPI tables.
>> +     */
>> +    aml_len += tables->table_data->len - dsdt;
>> +
>>      /* ACPI tables pointed to by RSDT */
>>      acpi_add_table(table_offsets, tables->table_data);
>>      build_fadt(tables->table_data, tables->linker, &pm, facs, dsdt);
>>  
>> +    ssdt = tables->table_data->len;
>>      acpi_add_table(table_offsets, tables->table_data);
>>      build_ssdt(tables->table_data, tables->linker, &cpu, &pm, &misc, &pci,
>>                 guest_info);
>> +    aml_len += tables->table_data->len - ssdt;
>>  
>>      acpi_add_table(table_offsets, tables->table_data);
>>      build_madt(tables->table_data, tables->linker, &cpu, guest_info);
>> @@ -1513,12 +1525,53 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
>>      /* RSDP is in FSEG memory, so allocate it separately */
>>      build_rsdp(tables->rsdp, tables->linker, rsdt);
>>  
>> -    /* We'll expose it all to Guest so align size to reduce
>> +    /* We'll expose it all to Guest so we want to reduce
>>       * chance of size changes.
>>       * RSDP is small so it's easy to keep it immutable, no need to
>>       * bother with alignment.
>> +     *
>> +     * We used to align the tables to 4k, but of course this would
>> +     * too simple to be enough.  4k turned out to be too small an
>> +     * alignment very soon, and in fact it is almost impossible to
>> +     * keep the table size stable for all (max_cpus, max_memory_slots)
>> +     * combinations.  So the table size is always 64k for pc-2.1 and
>> +     * we give an error if the table grows beyond that limit.
>> +     *
>> +     * We still have the problem of migrating from "-M pc-2.0".  For that,
>> +     * we exploit the fact that QEMU 2.1 generates _smaller_ tables than 2.0
>> +     * and we can always pad the smaller tables with zeros.  We can then use
>> +     * the exact size of the 2.0 tables.
>> +     *
>> +     * All this is for PIIX4, since QEMU 2.0 didn't support Q35 migration.
>>       */
>> -    acpi_align_size(tables->table_data, 0x1000);
>> +    if (guest_info->legacy_acpi_table_size) {
>> +        /* Subtracting aml_len gives the size of fixed tables.  Then add the
>> +         * size of the PIIX4 DSDT/SSDT in QEMU 2.0.
>> +         */
>> +        int legacy_aml_len =
>> +            guest_info->legacy_acpi_table_size +
>> +            97 * max_cpus +
> Commit message says it's 105 and not 97 so one of them should be fixed.
> Also please replace magic numbers (above and below) with defines so that
> it would be clear what they mean in the future.

Right, it's 97 in the SSDT and 8 in the MADT.
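
To spell out the arithmetic, this is roughly how the legacy sizing could look with defines instead of magic numbers. It is only a sketch: the macro names, round_up() and legacy_table_size() are hypothetical and not part of the patch; the constants are the ones quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the values are the ones from the patch and this thread. */
#define LEGACY_SSDT_BYTES_PER_CPU     97    /* AML, counted in legacy_aml_len */
#define LEGACY_MADT_BYTES_PER_CPU     8     /* non-AML, same in 2.0 and 2.1 */
#define LEGACY_SSDT_BYTES_PER_BRIDGE  1875
#define LEGACY_TABLE_ALIGN            0x1000

/* 97 (SSDT) + 8 (MADT) is the 105 bytes per CPU from the commit message. */
_Static_assert(LEGACY_SSDT_BYTES_PER_CPU + LEGACY_MADT_BYTES_PER_CPU == 105,
               "per-CPU size must match the commit message");

static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/*
 * table_len:   size of all tables built by the current QEMU
 * aml_len:     size of the DSDT + SSDT within table_len
 * legacy_base: DSDT + SSDT base size of the source QEMU (6652 for 2.0)
 */
static size_t legacy_table_size(size_t table_len, size_t aml_len,
                                size_t legacy_base, unsigned max_cpus,
                                unsigned bsel_alloc)
{
    /* Same as MAX(bsel_alloc, 1) - 1 in the patch. */
    unsigned bridges = bsel_alloc > 1 ? bsel_alloc - 1 : 0;
    size_t legacy_aml_len = legacy_base +
        (size_t)LEGACY_SSDT_BYTES_PER_CPU * max_cpus +
        (size_t)LEGACY_SSDT_BYTES_PER_BRIDGE * bridges;

    assert(aml_len <= table_len);
    /* Subtracting aml_len leaves the fixed tables; add the 2.0 AML size. */
    return round_up(table_len - aml_len + legacy_aml_len, LEGACY_TABLE_ALIGN);
}
```

With made-up inputs of 10000 bytes of tables, 7000 of them AML, one CPU and no bridges, this gives 3000 + 6652 + 97 = 9749, rounded up to 12288.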

>> +            1875 * (MAX(bsel_alloc, 1) - 1);
>> +        int legacy_table_size =
>> +            ROUND_UP(tables->table_data->len - aml_len + legacy_aml_len, 0x1000);
> line over 80 characters
> 
>> +        if (tables->table_data->len > legacy_table_size) {
>> +            /* -M pc-2.0 doesn't support memory hotplug, so this should never
>> +             * happen.
> It supports hotplug on PCI bridges, which could lead to this branch;
> just dropping this comment is fine.

Hotplug on PCI bridges is accounted for, see the 1875 above.

> Looking to the future, if we expand the number of supported VCPUs to 1024,
> the SSDT will quickly grow to 100K; perhaps 128K or 256K would be better?

This memory is allocated by the BIOS (including all the unused space at
the end), so I'd rather not have exaggerated padding.

> 
>> +            /* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots.  */
> Isn't the VCPU maximum 256 for 2.1, or even for 2.0?

Yeah, this is just an example.  The limit is really just what the kernel
reports.

> line over 80 characters
> 
>> +            error_report("Too many maximum CPUs, NUMA nodes or memory slots.");
> Add PCI bridges here since they affect the size greatly: even if the user removes
> all CPUs and turns off memory hotplug, they will still get this error if the bridge
> devices present at startup exceed the above limit.

Ok.

>> +            error_report("Please decrease one of these parameters.");
>> +            exit(1);
>> +        }
>> +        g_array_set_size(tables->table_data, 0x10000);
> Maybe use a define for the size here and above?

Oops, of course. :)
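
Something along these lines, perhaps. This is a standalone sketch of the non-legacy branch, not the respin: ACPI_BUILD_TABLE_SIZE and acpi_pad_tables() are hypothetical names, and a plain buffer with memset() stands in for the GArray.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name; 64k is the limit hard-coded by the patch. */
#define ACPI_BUILD_TABLE_SIZE 0x10000

/*
 * Pad the table blob in buf (len bytes used, cap bytes available) with
 * zeros up to the fixed size.  Returns 0 on success, or -1 if the tables
 * are already larger than the limit, where the patch reports an error
 * and exits.
 */
static int acpi_pad_tables(unsigned char *buf, size_t cap, size_t len)
{
    assert(cap >= ACPI_BUILD_TABLE_SIZE && len <= cap);

    if (len > ACPI_BUILD_TABLE_SIZE) {
        return -1;
    }
    memset(buf + len, 0, ACPI_BUILD_TABLE_SIZE - len);
    return 0;
}
```

The point of a single define is that the comparison and the padding cannot drift apart, unlike the 65536 and 0x10000 in the patch.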

Paolo


* Re: [Qemu-devel] [PATCH 0/2] pc: fix /etc/acpi/tables size in fw_cfg for -M pc-2.0
  2014-07-23 16:37 [Qemu-devel] [PATCH 0/2] pc: fix /etc/acpi/tables size in fw_cfg for -M pc-2.0 Paolo Bonzini
  2014-07-23 16:37 ` [Qemu-devel] [PATCH 1/2] acpi-dsdt: procedurally generate _PRT Paolo Bonzini
  2014-07-23 16:37 ` [Qemu-devel] [PATCH 2/2] pc: hack for migration compatibility from QEMU 2.0 Paolo Bonzini
@ 2014-07-24 15:22 ` Igor Mammedov
  2 siblings, 0 replies; 9+ messages in thread
From: Igor Mammedov @ 2014-07-24 15:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: peter.maydell, mst, qemu-devel, dgilbert, amit.shah, lersek

On Wed, 23 Jul 2014 18:37:44 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Changing the ACPI table size causes migration to break, and the memory
> hotplug work opened our eyes on how horribly we were breaking things in
> 2.0 already.
> 
> Unfortunately when reviewing the design I assumed incorrectly that all
> tables would be placed in separate fw_cfg files.  This would have been
> better, because you can always move stuff to a new SSDT (and thus a new
> file), keeping the sizes under control.
> 
> Hard-code 64k as the maximum ACPI table size; for -M pc-i440fx-2.0
> and -M pc-i440fx-1.7 compute the payload size of QEMU 2.0 and always
> use that one.  This works always for QEMU 2.0, and also for 1.7
> except for a few values of "-smp maxcpus".
> 
> The first patch is needed to shrink the ACPI tables and make them
> smaller than they used to be in 2.0.
> 
> Please test and ack.  I'll do more testing tomorrow.
> 
> Paolo
> 
> 
> Paolo Bonzini (2):
>   acpi-dsdt: procedurally generate _PRT
>   pc: hack for migration compatibility from QEMU 2.0
> 
>  hw/i386/acpi-build.c  | 61 +++++++++++++++++++++++++++++++---
>  hw/i386/acpi-dsdt.dsl | 90 ++++++++++++++++++++++-----------------------------
>  hw/i386/pc_piix.c     | 20 ++++++++++++
>  hw/i386/pc_q35.c      |  5 +++
>  include/hw/i386/pc.h  |  1 +
>  5 files changed, 122 insertions(+), 55 deletions(-)
> 

Aside from my cosmetic per-patch comments,

I've tested the series by booting a guest in QEMU 1.7, migrating it to QEMU 2.1
and rebooting the guest there, with WS2003Ex64, WS2008DCx32, WS2012DCx64 and
WS2012RC2x64 guest OSes, so on respin you can use my:

Tested-by: Igor Mammedov <imammedo@redhat.com>

