qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] i386/acpi: restore device paths for pre-5.1 vms
@ 2021-03-01 19:59 Vitaly Cheptsov
  2021-03-01 19:59 ` [PATCH] target/ppc: fix icount support on Book-e vms accessing SPRs Vitaly Cheptsov
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Vitaly Cheptsov @ 2021-03-01 19:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Vitaly Cheptsov, Michael S . Tsirkin, qemu-stable, Thomas Lamprecht

After fixing the _UID value for the primary PCI root bridge in
af1b80ae it was discovered that this change updates Windows
configuration in an incompatible way causing network configuration
failure unless DHCP is used. More details provided on the list:

https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg08484.html

This change reverts the _UID update from 1 to 0 for q35 and i440fx
VMs before version 5.2 to maintain the original behaviour when
upgrading.

Cc: qemu-stable@nongnu.org
Cc: qemu-devel@nongnu.org
Reported-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vitaly Cheptsov <cheptsov@ispras.ru>
---
 hw/i386/acpi-build.c | 4 ++--
 hw/i386/pc_piix.c    | 2 ++
 hw/i386/pc_q35.c     | 2 ++
 include/hw/i386/pc.h | 1 +
 4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 31a5f6f4a5..442b4629a9 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1277,7 +1277,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
         dev = aml_device("PCI0");
         aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
         aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
-        aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+        aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
         aml_append(sb_scope, dev);
         aml_append(dsdt, sb_scope);
 
@@ -1296,7 +1296,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
         aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
         aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
         aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
-        aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+        aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
         aml_append(dev, build_q35_osc_method());
         aml_append(sb_scope, dev);
         if (mcfg_valid) {
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 2904b40163..46cc951073 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -405,6 +405,7 @@ static void pc_i440fx_machine_options(MachineClass *m)
 {
     PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
     pcmc->default_nic_model = "e1000";
+    pcmc->pci_root_uid = 0;
 
     m->family = "pc_piix";
     m->desc = "Standard PC (i440FX + PIIX, 1996)";
@@ -448,6 +449,7 @@ static void pc_i440fx_5_1_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
     compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
     pcmc->kvmclock_create_always = false;
+    pcmc->pci_root_uid = 1;
 }
 
 DEFINE_I440FX_MACHINE(v5_1, "pc-i440fx-5.1", NULL,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 0a212443aa..53450190f5 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -329,6 +329,7 @@ static void pc_q35_machine_options(MachineClass *m)
 {
     PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
     pcmc->default_nic_model = "e1000e";
+    pcmc->pci_root_uid = 0;
 
     m->family = "pc_q35";
     m->desc = "Standard PC (Q35 + ICH9, 2009)";
@@ -375,6 +376,7 @@ static void pc_q35_5_1_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
     compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
     pcmc->kvmclock_create_always = false;
+    pcmc->pci_root_uid = 1;
 }
 
 DEFINE_Q35_MACHINE(v5_1, "pc-q35-5.1", NULL,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index c9d194a5e7..d4c3d73c11 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -105,6 +105,7 @@ struct PCMachineClass {
     int legacy_acpi_table_size;
     unsigned acpi_data_size;
     bool do_not_add_smb_acpi;
+    int pci_root_uid;
 
     /* SMBIOS compat: */
     bool smbios_defaults;
-- 
2.24.3 (Apple Git-128)



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH] target/ppc: fix icount support on Book-e vms accessing SPRs
  2021-03-01 19:59 [PATCH] i386/acpi: restore device paths for pre-5.1 vms Vitaly Cheptsov
@ 2021-03-01 19:59 ` Vitaly Cheptsov
  2021-03-02  7:05 ` [PATCH] i386/acpi: restore device paths for pre-5.1 vms Thomas Lamprecht
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Vitaly Cheptsov @ 2021-03-01 19:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vitaly Cheptsov

Failing to guard SPR access with gen_io_start/gen_stop_exception
causes "Bad icount read" exceptions when running VMs with
e500mc and e500v2 CPUs with an icount parameter.

Cc: qemu-devel@nongnu.org
Signed-off-by: Vitaly Cheptsov <cheptsov@ispras.ru>
---
 target/ppc/translate_init.c.inc | 36 +++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/target/ppc/translate_init.c.inc b/target/ppc/translate_init.c.inc
index e7324e85cd..09c9ae2c98 100644
--- a/target/ppc/translate_init.c.inc
+++ b/target/ppc/translate_init.c.inc
@@ -567,35 +567,71 @@ static void spr_write_601_ubatl(DisasContext *ctx, int sprn, int gprn)
 #if !defined(CONFIG_USER_ONLY)
 static void spr_read_40x_pit(DisasContext *ctx, int gprn, int sprn)
 {
+    if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+        gen_io_start();
+    }
     gen_helper_load_40x_pit(cpu_gpr[gprn], cpu_env);
+    if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+        gen_stop_exception(ctx);
+    }
 }
 
 static void spr_write_40x_pit(DisasContext *ctx, int sprn, int gprn)
 {
+    if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+        gen_io_start();
+    }
     gen_helper_store_40x_pit(cpu_env, cpu_gpr[gprn]);
+    if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+        gen_stop_exception(ctx);
+    }
 }
 
 static void spr_write_40x_dbcr0(DisasContext *ctx, int sprn, int gprn)
 {
+    if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+        gen_io_start();
+    }
     gen_store_spr(sprn, cpu_gpr[gprn]);
     gen_helper_store_40x_dbcr0(cpu_env, cpu_gpr[gprn]);
     /* We must stop translation as we may have rebooted */
     gen_stop_exception(ctx);
+    if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+        gen_stop_exception(ctx);
+    }
 }
 
 static void spr_write_40x_sler(DisasContext *ctx, int sprn, int gprn)
 {
+    if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+        gen_io_start();
+    }
     gen_helper_store_40x_sler(cpu_env, cpu_gpr[gprn]);
+    if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+        gen_stop_exception(ctx);
+    }
 }
 
 static void spr_write_booke_tcr(DisasContext *ctx, int sprn, int gprn)
 {
+    if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+        gen_io_start();
+    }
     gen_helper_store_booke_tcr(cpu_env, cpu_gpr[gprn]);
+    if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+        gen_stop_exception(ctx);
+    }
 }
 
 static void spr_write_booke_tsr(DisasContext *ctx, int sprn, int gprn)
 {
+    if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+        gen_io_start();
+    }
     gen_helper_store_booke_tsr(cpu_env, cpu_gpr[gprn]);
+    if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+        gen_stop_exception(ctx);
+    }
 }
 #endif
 
-- 
2.24.3 (Apple Git-128)



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] i386/acpi: restore device paths for pre-5.1 vms
  2021-03-01 19:59 [PATCH] i386/acpi: restore device paths for pre-5.1 vms Vitaly Cheptsov
  2021-03-01 19:59 ` [PATCH] target/ppc: fix icount support on Book-e vms accessing SPRs Vitaly Cheptsov
@ 2021-03-02  7:05 ` Thomas Lamprecht
  2021-03-02  8:48 ` Igor Mammedov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Thomas Lamprecht @ 2021-03-02  7:05 UTC (permalink / raw)
  To: Vitaly Cheptsov, qemu-devel; +Cc: qemu-stable, Michael S . Tsirkin

On 01.03.21 20:59, Vitaly Cheptsov wrote:
> After fixing the _UID value for the primary PCI root bridge in
> af1b80ae it was discovered that this change updates Windows
> configuration in an incompatible way causing network configuration
> failure unless DHCP is used. More details provided on the list:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg08484.html
> 
> This change reverts the _UID update from 1 to 0 for q35 and i440fx
> VMs before version 5.2 to maintain the original behaviour when
> upgrading.
> 
> Cc: qemu-stable@nongnu.org
> Cc: qemu-devel@nongnu.org
> Reported-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Vitaly Cheptsov <cheptsov@ispras.ru>

Thanks for sending this! Works as advertised and can be cleanly cherry-picked
on top of the v5.2.0 tag.

Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i386/acpi: restore device paths for pre-5.1 vms
  2021-03-01 19:59 [PATCH] i386/acpi: restore device paths for pre-5.1 vms Vitaly Cheptsov
  2021-03-01 19:59 ` [PATCH] target/ppc: fix icount support on Book-e vms accessing SPRs Vitaly Cheptsov
  2021-03-02  7:05 ` [PATCH] i386/acpi: restore device paths for pre-5.1 vms Thomas Lamprecht
@ 2021-03-02  8:48 ` Igor Mammedov
  2021-03-02 10:14 ` Michael S. Tsirkin
  2021-03-22 15:43 ` Michael S. Tsirkin
  4 siblings, 0 replies; 20+ messages in thread
From: Igor Mammedov @ 2021-03-02  8:48 UTC (permalink / raw)
  To: Vitaly Cheptsov
  Cc: qemu-stable, Thomas Lamprecht, qemu-devel, Michael S . Tsirkin

On Mon,  1 Mar 2021 22:59:18 +0300
Vitaly Cheptsov <cheptsov@ispras.ru> wrote:

> After fixing the _UID value for the primary PCI root bridge in
> af1b80ae it was discovered that this change updates Windows
> configuration in an incompatible way causing network configuration
> failure unless DHCP is used. More details provided on the list:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg08484.html
> 
> This change reverts the _UID update from 1 to 0 for q35 and i440fx
> VMs before version 5.2 to maintain the original behaviour when
> upgrading.
> 
> Cc: qemu-stable@nongnu.org
> Cc: qemu-devel@nongnu.org
> Reported-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Vitaly Cheptsov <cheptsov@ispras.ru>

Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> ---
>  hw/i386/acpi-build.c | 4 ++--
>  hw/i386/pc_piix.c    | 2 ++
>  hw/i386/pc_q35.c     | 2 ++
>  include/hw/i386/pc.h | 1 +
>  4 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 31a5f6f4a5..442b4629a9 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1277,7 +1277,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>          dev = aml_device("PCI0");
>          aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
>          aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
> -        aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> +        aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
>          aml_append(sb_scope, dev);
>          aml_append(dsdt, sb_scope);
>  
> @@ -1296,7 +1296,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>          aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
>          aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
>          aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
> -        aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> +        aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
>          aml_append(dev, build_q35_osc_method());
>          aml_append(sb_scope, dev);
>          if (mcfg_valid) {
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index 2904b40163..46cc951073 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -405,6 +405,7 @@ static void pc_i440fx_machine_options(MachineClass *m)
>  {
>      PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
>      pcmc->default_nic_model = "e1000";
> +    pcmc->pci_root_uid = 0;
>  
>      m->family = "pc_piix";
>      m->desc = "Standard PC (i440FX + PIIX, 1996)";
> @@ -448,6 +449,7 @@ static void pc_i440fx_5_1_machine_options(MachineClass *m)
>      compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
>      compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
>      pcmc->kvmclock_create_always = false;
> +    pcmc->pci_root_uid = 1;
>  }
>  
>  DEFINE_I440FX_MACHINE(v5_1, "pc-i440fx-5.1", NULL,
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 0a212443aa..53450190f5 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -329,6 +329,7 @@ static void pc_q35_machine_options(MachineClass *m)
>  {
>      PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
>      pcmc->default_nic_model = "e1000e";
> +    pcmc->pci_root_uid = 0;
>  
>      m->family = "pc_q35";
>      m->desc = "Standard PC (Q35 + ICH9, 2009)";
> @@ -375,6 +376,7 @@ static void pc_q35_5_1_machine_options(MachineClass *m)
>      compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
>      compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
>      pcmc->kvmclock_create_always = false;
> +    pcmc->pci_root_uid = 1;
>  }
>  
>  DEFINE_Q35_MACHINE(v5_1, "pc-q35-5.1", NULL,
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index c9d194a5e7..d4c3d73c11 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -105,6 +105,7 @@ struct PCMachineClass {
>      int legacy_acpi_table_size;
>      unsigned acpi_data_size;
>      bool do_not_add_smb_acpi;
> +    int pci_root_uid;
>  
>      /* SMBIOS compat: */
>      bool smbios_defaults;



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i386/acpi: restore device paths for pre-5.1 vms
  2021-03-01 19:59 [PATCH] i386/acpi: restore device paths for pre-5.1 vms Vitaly Cheptsov
                   ` (2 preceding siblings ...)
  2021-03-02  8:48 ` Igor Mammedov
@ 2021-03-02 10:14 ` Michael S. Tsirkin
  2021-03-22 15:43 ` Michael S. Tsirkin
  4 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2021-03-02 10:14 UTC (permalink / raw)
  To: Vitaly Cheptsov; +Cc: Igor Mammedov, Thomas Lamprecht, qemu-devel, qemu-stable

On Mon, Mar 01, 2021 at 10:59:18PM +0300, Vitaly Cheptsov wrote:
> After fixing the _UID value for the primary PCI root bridge in
> af1b80ae it was discovered that this change updates Windows
> configuration in an incompatible way causing network configuration
> failure unless DHCP is used. More details provided on the list:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg08484.html
> 
> This change reverts the _UID update from 1 to 0 for q35 and i440fx
> VMs before version 5.2 to maintain the original behaviour when
> upgrading.
> 
> Cc: qemu-stable@nongnu.org
> Cc: qemu-devel@nongnu.org
> Reported-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Vitaly Cheptsov <cheptsov@ispras.ru>

Fixes: af1b80ae56c9 ("i386/acpi: fix inconsistent QEMU/OVMF device paths")

I'll pick it up now.


> ---
>  hw/i386/acpi-build.c | 4 ++--
>  hw/i386/pc_piix.c    | 2 ++
>  hw/i386/pc_q35.c     | 2 ++
>  include/hw/i386/pc.h | 1 +
>  4 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 31a5f6f4a5..442b4629a9 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1277,7 +1277,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>          dev = aml_device("PCI0");
>          aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
>          aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
> -        aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> +        aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
>          aml_append(sb_scope, dev);
>          aml_append(dsdt, sb_scope);
>  
> @@ -1296,7 +1296,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>          aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
>          aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
>          aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
> -        aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> +        aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
>          aml_append(dev, build_q35_osc_method());
>          aml_append(sb_scope, dev);
>          if (mcfg_valid) {
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index 2904b40163..46cc951073 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -405,6 +405,7 @@ static void pc_i440fx_machine_options(MachineClass *m)
>  {
>      PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
>      pcmc->default_nic_model = "e1000";
> +    pcmc->pci_root_uid = 0;
>  
>      m->family = "pc_piix";
>      m->desc = "Standard PC (i440FX + PIIX, 1996)";
> @@ -448,6 +449,7 @@ static void pc_i440fx_5_1_machine_options(MachineClass *m)
>      compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
>      compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
>      pcmc->kvmclock_create_always = false;
> +    pcmc->pci_root_uid = 1;
>  }
>  
>  DEFINE_I440FX_MACHINE(v5_1, "pc-i440fx-5.1", NULL,
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 0a212443aa..53450190f5 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -329,6 +329,7 @@ static void pc_q35_machine_options(MachineClass *m)
>  {
>      PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
>      pcmc->default_nic_model = "e1000e";
> +    pcmc->pci_root_uid = 0;
>  
>      m->family = "pc_q35";
>      m->desc = "Standard PC (Q35 + ICH9, 2009)";
> @@ -375,6 +376,7 @@ static void pc_q35_5_1_machine_options(MachineClass *m)
>      compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
>      compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
>      pcmc->kvmclock_create_always = false;
> +    pcmc->pci_root_uid = 1;
>  }
>  
>  DEFINE_Q35_MACHINE(v5_1, "pc-q35-5.1", NULL,
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index c9d194a5e7..d4c3d73c11 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -105,6 +105,7 @@ struct PCMachineClass {
>      int legacy_acpi_table_size;
>      unsigned acpi_data_size;
>      bool do_not_add_smb_acpi;
> +    int pci_root_uid;
>  
>      /* SMBIOS compat: */
>      bool smbios_defaults;
> -- 
> 2.24.3 (Apple Git-128)



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i386/acpi: restore device paths for pre-5.1 vms
  2021-03-01 19:59 [PATCH] i386/acpi: restore device paths for pre-5.1 vms Vitaly Cheptsov
                   ` (3 preceding siblings ...)
  2021-03-02 10:14 ` Michael S. Tsirkin
@ 2021-03-22 15:43 ` Michael S. Tsirkin
  2021-03-22 15:49   ` Vitaly Cheptsov
  4 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2021-03-22 15:43 UTC (permalink / raw)
  To: Vitaly Cheptsov; +Cc: Thomas Lamprecht, qemu-devel, qemu-stable

On Mon, Mar 01, 2021 at 10:59:18PM +0300, Vitaly Cheptsov wrote:
> After fixing the _UID value for the primary PCI root bridge in
> af1b80ae it was discovered that this change updates Windows
> configuration in an incompatible way causing network configuration
> failure unless DHCP is used. More details provided on the list:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg08484.html
> 
> This change reverts the _UID update from 1 to 0 for q35 and i440fx
> VMs before version 5.2 to maintain the original behaviour when
> upgrading.
> 
> Cc: qemu-stable@nongnu.org
> Cc: qemu-devel@nongnu.org
> Reported-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Vitaly Cheptsov <cheptsov@ispras.ru>
> ---
>  hw/i386/acpi-build.c | 4 ++--
>  hw/i386/pc_piix.c    | 2 ++
>  hw/i386/pc_q35.c     | 2 ++
>  include/hw/i386/pc.h | 1 +
>  4 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 31a5f6f4a5..442b4629a9 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1277,7 +1277,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>          dev = aml_device("PCI0");
>          aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
>          aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
> -        aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> +        aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
>          aml_append(sb_scope, dev);
>          aml_append(dsdt, sb_scope);
>  
> @@ -1296,7 +1296,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>          aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
>          aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
>          aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
> -        aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> +        aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
>          aml_append(dev, build_q35_osc_method());
>          aml_append(sb_scope, dev);
>          if (mcfg_valid) {
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index 2904b40163..46cc951073 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -405,6 +405,7 @@ static void pc_i440fx_machine_options(MachineClass *m)
>  {
>      PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
>      pcmc->default_nic_model = "e1000";
> +    pcmc->pci_root_uid = 0;
>  
>      m->family = "pc_piix";
>      m->desc = "Standard PC (i440FX + PIIX, 1996)";
> @@ -448,6 +449,7 @@ static void pc_i440fx_5_1_machine_options(MachineClass *m)
>      compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
>      compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
>      pcmc->kvmclock_create_always = false;
> +    pcmc->pci_root_uid = 1;
>  }
>  
>  DEFINE_I440FX_MACHINE(v5_1, "pc-i440fx-5.1", NULL,
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 0a212443aa..53450190f5 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -329,6 +329,7 @@ static void pc_q35_machine_options(MachineClass *m)
>  {
>      PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
>      pcmc->default_nic_model = "e1000e";
> +    pcmc->pci_root_uid = 0;
>  
>      m->family = "pc_q35";
>      m->desc = "Standard PC (Q35 + ICH9, 2009)";
> @@ -375,6 +376,7 @@ static void pc_q35_5_1_machine_options(MachineClass *m)
>      compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
>      compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
>      pcmc->kvmclock_create_always = false;
> +    pcmc->pci_root_uid = 1;
>  }
>  
>  DEFINE_Q35_MACHINE(v5_1, "pc-q35-5.1", NULL,
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index c9d194a5e7..d4c3d73c11 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -105,6 +105,7 @@ struct PCMachineClass {
>      int legacy_acpi_table_size;
>      unsigned acpi_data_size;
>      bool do_not_add_smb_acpi;
> +    int pci_root_uid;
>  
>      /* SMBIOS compat: */
>      bool smbios_defaults;

So this upstream, but I think we should also have a property
that allows people to have VMs installed with the old
machine type booted with a new machine type.

E.g.  "compat-pci-root-uid".

Vitaly could you take a look?


> -- 
> 2.24.3 (Apple Git-128)



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i386/acpi: restore device paths for pre-5.1 vms
  2021-03-22 15:43 ` Michael S. Tsirkin
@ 2021-03-22 15:49   ` Vitaly Cheptsov
  2021-03-23 14:48     ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Vitaly Cheptsov @ 2021-03-22 15:49 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Thomas Lamprecht, qemu-devel, qemu-stable

[-- Attachment #1: Type: text/plain, Size: 4854 bytes --]

Hi Michael,

That makes little sense in my opinion, these people can simply upgrade the VM version, which apparently looks the right way to do it in my opinion.

Best regards,
Vitaly

> 22 марта 2021 г., в 18:43, Michael S. Tsirkin <mst@redhat.com> написал(а):
> 
> On Mon, Mar 01, 2021 at 10:59:18PM +0300, Vitaly Cheptsov wrote:
>> After fixing the _UID value for the primary PCI root bridge in
>> af1b80ae it was discovered that this change updates Windows
>> configuration in an incompatible way causing network configuration
>> failure unless DHCP is used. More details provided on the list:
>> 
>> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg08484.html
>> 
>> This change reverts the _UID update from 1 to 0 for q35 and i440fx
>> VMs before version 5.2 to maintain the original behaviour when
>> upgrading.
>> 
>> Cc: qemu-stable@nongnu.org
>> Cc: qemu-devel@nongnu.org
>> Reported-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
>> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Vitaly Cheptsov <cheptsov@ispras.ru>
>> ---
>> hw/i386/acpi-build.c | 4 ++--
>> hw/i386/pc_piix.c    | 2 ++
>> hw/i386/pc_q35.c     | 2 ++
>> include/hw/i386/pc.h | 1 +
>> 4 files changed, 7 insertions(+), 2 deletions(-)
>> 
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index 31a5f6f4a5..442b4629a9 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -1277,7 +1277,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>         dev = aml_device("PCI0");
>>         aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
>>         aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
>> -        aml_append(dev, aml_name_decl("_UID", aml_int(0)));
>> +        aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
>>         aml_append(sb_scope, dev);
>>         aml_append(dsdt, sb_scope);
>> 
>> @@ -1296,7 +1296,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>         aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
>>         aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
>>         aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
>> -        aml_append(dev, aml_name_decl("_UID", aml_int(0)));
>> +        aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
>>         aml_append(dev, build_q35_osc_method());
>>         aml_append(sb_scope, dev);
>>         if (mcfg_valid) {
>> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
>> index 2904b40163..46cc951073 100644
>> --- a/hw/i386/pc_piix.c
>> +++ b/hw/i386/pc_piix.c
>> @@ -405,6 +405,7 @@ static void pc_i440fx_machine_options(MachineClass *m)
>> {
>>     PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
>>     pcmc->default_nic_model = "e1000";
>> +    pcmc->pci_root_uid = 0;
>> 
>>     m->family = "pc_piix";
>>     m->desc = "Standard PC (i440FX + PIIX, 1996)";
>> @@ -448,6 +449,7 @@ static void pc_i440fx_5_1_machine_options(MachineClass *m)
>>     compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
>>     compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
>>     pcmc->kvmclock_create_always = false;
>> +    pcmc->pci_root_uid = 1;
>> }
>> 
>> DEFINE_I440FX_MACHINE(v5_1, "pc-i440fx-5.1", NULL,
>> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
>> index 0a212443aa..53450190f5 100644
>> --- a/hw/i386/pc_q35.c
>> +++ b/hw/i386/pc_q35.c
>> @@ -329,6 +329,7 @@ static void pc_q35_machine_options(MachineClass *m)
>> {
>>     PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
>>     pcmc->default_nic_model = "e1000e";
>> +    pcmc->pci_root_uid = 0;
>> 
>>     m->family = "pc_q35";
>>     m->desc = "Standard PC (Q35 + ICH9, 2009)";
>> @@ -375,6 +376,7 @@ static void pc_q35_5_1_machine_options(MachineClass *m)
>>     compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
>>     compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
>>     pcmc->kvmclock_create_always = false;
>> +    pcmc->pci_root_uid = 1;
>> }
>> 
>> DEFINE_Q35_MACHINE(v5_1, "pc-q35-5.1", NULL,
>> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
>> index c9d194a5e7..d4c3d73c11 100644
>> --- a/include/hw/i386/pc.h
>> +++ b/include/hw/i386/pc.h
>> @@ -105,6 +105,7 @@ struct PCMachineClass {
>>     int legacy_acpi_table_size;
>>     unsigned acpi_data_size;
>>     bool do_not_add_smb_acpi;
>> +    int pci_root_uid;
>> 
>>     /* SMBIOS compat: */
>>     bool smbios_defaults;
> 
> So this upstream, but I think we should also have a property
> that allows people to have VMs installed with the old
> machine type booted with a new machine type.
> 
> E.g.  "compat-pci-root-uid".
> 
> Vitaly could you take a look?
> 
> 
>> --
>> 2.24.3 (Apple Git-128)
> 


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i386/acpi: restore device paths for pre-5.1 vms
  2021-03-22 15:49   ` Vitaly Cheptsov
@ 2021-03-23 14:48     ` Michael S. Tsirkin
  2021-03-23 14:55       ` Vitaly Cheptsov
  0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2021-03-23 14:48 UTC (permalink / raw)
  To: Vitaly Cheptsov; +Cc: Thomas Lamprecht, qemu-devel, qemu-stable

The issue is with people who installed a VM using 5.1 qemu,
migrated to 5.2, booted there and set a config on a device
e.g. IP on a NIC.
They now have a 5.1 machine type but changing uid back
like we do will break these VMs.

Unlikley to be common but let's at least create a way for these people
to used these VMs.


On Mon, Mar 22, 2021 at 06:49:09PM +0300, Vitaly Cheptsov wrote:
> Hi Michael,
> 
> That makes little sense in my opinion, these people can simply upgrade the VM version, which apparently looks the right way to do it in my opinion.
> 
> Best regards,
> Vitaly
> 
> > 22 марта 2021 г., в 18:43, Michael S. Tsirkin <mst@redhat.com> написал(а):
> > 
> > On Mon, Mar 01, 2021 at 10:59:18PM +0300, Vitaly Cheptsov wrote:
> >> After fixing the _UID value for the primary PCI root bridge in
> >> af1b80ae it was discovered that this change updates Windows
> >> configuration in an incompatible way causing network configuration
> >> failure unless DHCP is used. More details provided on the list:
> >> 
> >> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg08484.html
> >> 
> >> This change reverts the _UID update from 1 to 0 for q35 and i440fx
> >> VMs before version 5.2 to maintain the original behaviour when
> >> upgrading.
> >> 
> >> Cc: qemu-stable@nongnu.org
> >> Cc: qemu-devel@nongnu.org
> >> Reported-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
> >> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> >> Signed-off-by: Vitaly Cheptsov <cheptsov@ispras.ru>
> >> ---
> >> hw/i386/acpi-build.c | 4 ++--
> >> hw/i386/pc_piix.c    | 2 ++
> >> hw/i386/pc_q35.c     | 2 ++
> >> include/hw/i386/pc.h | 1 +
> >> 4 files changed, 7 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >> index 31a5f6f4a5..442b4629a9 100644
> >> --- a/hw/i386/acpi-build.c
> >> +++ b/hw/i386/acpi-build.c
> >> @@ -1277,7 +1277,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> >>         dev = aml_device("PCI0");
> >>         aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
> >>         aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
> >> -        aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> >> +        aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
> >>         aml_append(sb_scope, dev);
> >>         aml_append(dsdt, sb_scope);
> >> 
> >> @@ -1296,7 +1296,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> >>         aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
> >>         aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
> >>         aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
> >> -        aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> >> +        aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
> >>         aml_append(dev, build_q35_osc_method());
> >>         aml_append(sb_scope, dev);
> >>         if (mcfg_valid) {
> >> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> >> index 2904b40163..46cc951073 100644
> >> --- a/hw/i386/pc_piix.c
> >> +++ b/hw/i386/pc_piix.c
> >> @@ -405,6 +405,7 @@ static void pc_i440fx_machine_options(MachineClass *m)
> >> {
> >>     PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
> >>     pcmc->default_nic_model = "e1000";
> >> +    pcmc->pci_root_uid = 0;
> >> 
> >>     m->family = "pc_piix";
> >>     m->desc = "Standard PC (i440FX + PIIX, 1996)";
> >> @@ -448,6 +449,7 @@ static void pc_i440fx_5_1_machine_options(MachineClass *m)
> >>     compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
> >>     compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
> >>     pcmc->kvmclock_create_always = false;
> >> +    pcmc->pci_root_uid = 1;
> >> }
> >> 
> >> DEFINE_I440FX_MACHINE(v5_1, "pc-i440fx-5.1", NULL,
> >> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> >> index 0a212443aa..53450190f5 100644
> >> --- a/hw/i386/pc_q35.c
> >> +++ b/hw/i386/pc_q35.c
> >> @@ -329,6 +329,7 @@ static void pc_q35_machine_options(MachineClass *m)
> >> {
> >>     PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
> >>     pcmc->default_nic_model = "e1000e";
> >> +    pcmc->pci_root_uid = 0;
> >> 
> >>     m->family = "pc_q35";
> >>     m->desc = "Standard PC (Q35 + ICH9, 2009)";
> >> @@ -375,6 +376,7 @@ static void pc_q35_5_1_machine_options(MachineClass *m)
> >>     compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
> >>     compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
> >>     pcmc->kvmclock_create_always = false;
> >> +    pcmc->pci_root_uid = 1;
> >> }
> >> 
> >> DEFINE_Q35_MACHINE(v5_1, "pc-q35-5.1", NULL,
> >> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> >> index c9d194a5e7..d4c3d73c11 100644
> >> --- a/include/hw/i386/pc.h
> >> +++ b/include/hw/i386/pc.h
> >> @@ -105,6 +105,7 @@ struct PCMachineClass {
> >>     int legacy_acpi_table_size;
> >>     unsigned acpi_data_size;
> >>     bool do_not_add_smb_acpi;
> >> +    int pci_root_uid;
> >> 
> >>     /* SMBIOS compat: */
> >>     bool smbios_defaults;
> > 
> > So this upstream, but I think we should also have a property
> > that allows people to have VMs installed with the old
> > machine type booted with a new machine type.
> > 
> > E.g.  "compat-pci-root-uid".
> > 
> > Vitaly could you take a look?
> > 
> > 
> >> --
> >> 2.24.3 (Apple Git-128)
> > 
> 




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i386/acpi: restore device paths for pre-5.1 vms
  2021-03-23 14:48     ` Michael S. Tsirkin
@ 2021-03-23 14:55       ` Vitaly Cheptsov
  2021-03-23 15:04         ` Thomas Lamprecht
  0 siblings, 1 reply; 20+ messages in thread
From: Vitaly Cheptsov @ 2021-03-23 14:55 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Thomas Lamprecht, qemu devel list, qemu-stable

[-- Attachment #1: Type: text/plain, Size: 5873 bytes --]

They can simply set the 5.2 VM version in such a case. I do not want to let this legacy hack to be enabled in any modern QEMU VM version, as it violates ACPI specification and makes the life more difficult for various other software like bootloaders and operating systems.

> 23 марта 2021 г., в 17:48, Michael S. Tsirkin <mst@redhat.com> написал(а):
> 
> The issue is with people who installed a VM using 5.1 qemu,
> migrated to 5.2, booted there and set a config on a device
> e.g. IP on a NIC.
> They now have a 5.1 machine type but changing uid back
> like we do will break these VMs.
> 
> Unlikley to be common but let's at least create a way for these people
> to used these VMs.
> 
> 
> On Mon, Mar 22, 2021 at 06:49:09PM +0300, Vitaly Cheptsov wrote:
>> Hi Michael,
>> 
>> That makes little sense in my opinion, these people can simply upgrade the VM version, which apparently looks the right way to do it in my opinion.
>> 
>> Best regards,
>> Vitaly
>> 
>>> 22 марта 2021 г., в 18:43, Michael S. Tsirkin <mst@redhat.com> написал(а):
>>> 
>>> On Mon, Mar 01, 2021 at 10:59:18PM +0300, Vitaly Cheptsov wrote:
>>>> After fixing the _UID value for the primary PCI root bridge in
>>>> af1b80ae it was discovered that this change updates Windows
>>>> configuration in an incompatible way causing network configuration
>>>> failure unless DHCP is used. More details provided on the list:
>>>> 
>>>> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg08484.html
>>>> 
>>>> This change reverts the _UID update from 1 to 0 for q35 and i440fx
>>>> VMs before version 5.2 to maintain the original behaviour when
>>>> upgrading.
>>>> 
>>>> Cc: qemu-stable@nongnu.org
>>>> Cc: qemu-devel@nongnu.org
>>>> Reported-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
>>>> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
>>>> Signed-off-by: Vitaly Cheptsov <cheptsov@ispras.ru>
>>>> ---
>>>> hw/i386/acpi-build.c | 4 ++--
>>>> hw/i386/pc_piix.c    | 2 ++
>>>> hw/i386/pc_q35.c     | 2 ++
>>>> include/hw/i386/pc.h | 1 +
>>>> 4 files changed, 7 insertions(+), 2 deletions(-)
>>>> 
>>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>>> index 31a5f6f4a5..442b4629a9 100644
>>>> --- a/hw/i386/acpi-build.c
>>>> +++ b/hw/i386/acpi-build.c
>>>> @@ -1277,7 +1277,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>>>        dev = aml_device("PCI0");
>>>>        aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
>>>>        aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
>>>> -        aml_append(dev, aml_name_decl("_UID", aml_int(0)));
>>>> +        aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
>>>>        aml_append(sb_scope, dev);
>>>>        aml_append(dsdt, sb_scope);
>>>> 
>>>> @@ -1296,7 +1296,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>>>        aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
>>>>        aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
>>>>        aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
>>>> -        aml_append(dev, aml_name_decl("_UID", aml_int(0)));
>>>> +        aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
>>>>        aml_append(dev, build_q35_osc_method());
>>>>        aml_append(sb_scope, dev);
>>>>        if (mcfg_valid) {
>>>> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
>>>> index 2904b40163..46cc951073 100644
>>>> --- a/hw/i386/pc_piix.c
>>>> +++ b/hw/i386/pc_piix.c
>>>> @@ -405,6 +405,7 @@ static void pc_i440fx_machine_options(MachineClass *m)
>>>> {
>>>>    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
>>>>    pcmc->default_nic_model = "e1000";
>>>> +    pcmc->pci_root_uid = 0;
>>>> 
>>>>    m->family = "pc_piix";
>>>>    m->desc = "Standard PC (i440FX + PIIX, 1996)";
>>>> @@ -448,6 +449,7 @@ static void pc_i440fx_5_1_machine_options(MachineClass *m)
>>>>    compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
>>>>    compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
>>>>    pcmc->kvmclock_create_always = false;
>>>> +    pcmc->pci_root_uid = 1;
>>>> }
>>>> 
>>>> DEFINE_I440FX_MACHINE(v5_1, "pc-i440fx-5.1", NULL,
>>>> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
>>>> index 0a212443aa..53450190f5 100644
>>>> --- a/hw/i386/pc_q35.c
>>>> +++ b/hw/i386/pc_q35.c
>>>> @@ -329,6 +329,7 @@ static void pc_q35_machine_options(MachineClass *m)
>>>> {
>>>>    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
>>>>    pcmc->default_nic_model = "e1000e";
>>>> +    pcmc->pci_root_uid = 0;
>>>> 
>>>>    m->family = "pc_q35";
>>>>    m->desc = "Standard PC (Q35 + ICH9, 2009)";
>>>> @@ -375,6 +376,7 @@ static void pc_q35_5_1_machine_options(MachineClass *m)
>>>>    compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
>>>>    compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
>>>>    pcmc->kvmclock_create_always = false;
>>>> +    pcmc->pci_root_uid = 1;
>>>> }
>>>> 
>>>> DEFINE_Q35_MACHINE(v5_1, "pc-q35-5.1", NULL,
>>>> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
>>>> index c9d194a5e7..d4c3d73c11 100644
>>>> --- a/include/hw/i386/pc.h
>>>> +++ b/include/hw/i386/pc.h
>>>> @@ -105,6 +105,7 @@ struct PCMachineClass {
>>>>    int legacy_acpi_table_size;
>>>>    unsigned acpi_data_size;
>>>>    bool do_not_add_smb_acpi;
>>>> +    int pci_root_uid;
>>>> 
>>>>    /* SMBIOS compat: */
>>>>    bool smbios_defaults;
>>> 
>>> So this upstream, but I think we should also have a property
>>> that allows people to have VMs installed with the old
>>> machine type booted with a new machine type.
>>> 
>>> E.g.  "compat-pci-root-uid".
>>> 
>>> Vitaly could you take a look?
>>> 
>>> 
>>>> --
>>>> 2.24.3 (Apple Git-128)
>>> 
>> 
> 
> 


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i386/acpi: restore device paths for pre-5.1 vms
  2021-03-23 14:55       ` Vitaly Cheptsov
@ 2021-03-23 15:04         ` Thomas Lamprecht
  2021-03-23 16:54           ` Ways to deal with broken machine types Igor Mammedov
  2021-03-23 19:32           ` [PATCH] i386/acpi: restore device paths for pre-5.1 vms Michael S. Tsirkin
  0 siblings, 2 replies; 20+ messages in thread
From: Thomas Lamprecht @ 2021-03-23 15:04 UTC (permalink / raw)
  To: Vitaly Cheptsov, Michael S. Tsirkin; +Cc: qemu devel list, qemu-stable

On 23.03.21 15:55, Vitaly Cheptsov wrote:
>> 23 марта 2021 г., в 17:48, Michael S. Tsirkin <mst@redhat.com> написал(а):
>>
>> The issue is with people who installed a VM using 5.1 qemu,
>> migrated to 5.2, booted there and set a config on a device
>> e.g. IP on a NIC.
>> They now have a 5.1 machine type but changing uid back
>> like we do will break these VMs.
>>
>> Unlikley to be common but let's at least create a way for these people
>> to used these VMs.
>>
> They can simply set the 5.2 VM version in such a case. I do not want to 
let this legacy hack to be enabled in any modern QEMU VM version, as it violates ACPI specification and makes the life more difficult for various other software like bootloaders and operating systems.

Yeah here I agree with Vitaly, if they already used 5.2 and made some configurations
for those "new" devices they can just keep using 5.2?

If some of the devices got configured on 5.1 and some on 5.2 there's nothing we can
do anyway, from a QEMU POV - there the user always need to choose one machine version
and fix up the device configured while on the other machine.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Ways to deal with broken machine types
  2021-03-23 15:04         ` Thomas Lamprecht
@ 2021-03-23 16:54           ` Igor Mammedov
  2021-03-23 17:40             ` Daniel P. Berrangé
  2021-03-23 19:32           ` [PATCH] i386/acpi: restore device paths for pre-5.1 vms Michael S. Tsirkin
  1 sibling, 1 reply; 20+ messages in thread
From: Igor Mammedov @ 2021-03-23 16:54 UTC (permalink / raw)
  To: Thomas Lamprecht
  Cc: Peter Maydell, Daniel P. Berrangé,
	Michael S. Tsirkin, libvir-list, qemu devel list,
	Vitaly Cheptsov, Paolo Bonzini

On Tue, 23 Mar 2021 16:04:11 +0100
Thomas Lamprecht <t.lamprecht@proxmox.com> wrote:

> On 23.03.21 15:55, Vitaly Cheptsov wrote:
> >> 23 марта 2021 г., в 17:48, Michael S. Tsirkin <mst@redhat.com> написал(а):
> >>
> >> The issue is with people who installed a VM using 5.1 qemu,
> >> migrated to 5.2, booted there and set a config on a device
> >> e.g. IP on a NIC.
> >> They now have a 5.1 machine type but changing uid back
> >> like we do will break these VMs.
> >>
> >> Unlikley to be common but let's at least create a way for these people
> >> to used these VMs.
> >>  
> > They can simply set the 5.2 VM version in such a case. I do not want to   
> let this legacy hack to be enabled in any modern QEMU VM version, as it violates ACPI specification and makes the life more difficult for various other software like bootloaders and operating systems.
> 
> Yeah here I agree with Vitaly, if they already used 5.2 and made some configurations
> for those "new" devices they can just keep using 5.2?
> 
> If some of the devices got configured on 5.1 and some on 5.2 there's nothing we can
> do anyway, from a QEMU POV - there the user always need to choose one machine version
> and fix up the device configured while on the other machine.

According to testing it appears that issue affects virtio drivers so it may lead to
failure to boot guest (and there was at least 1 report about virtio-scsi being affected).

Let me hijack this thread for beyond this case scope.

I agree that for this particular bug we've done all we could, but
there is broader issue to discuss here.

We have machine versions to deal with hw compatibility issues and that covers most of the cases,
but occasionally we notice problem well after release(s),
so users may be stuck with broken VM and need to manually fix configuration (and/or VM).
Figuring out what's wrong and how to fix it is far from trivial. So lets discuss if we
can help to ease this pain, yes it will be late for first victims but it's still
better than never.

I'll try to sum up idea Michael suggested (here comes my unorganized brain-dump),

1. We can keep in VM's config QEMU version it was created on
   and as minimum warn user with a pointer to known issues if version in
   config mismatches version of actually used QEMU, with a knob to silence
   it for particular mismatch.

When an issue becomes know and resolved we know for sure how and what
changed and embed instructions on what options to use for fixing up VM's
config to preserve old HW config depending on QEMU version VM was installed on.

some more ideas:
   2. let mgmt layer to keep fixup list and apply them to config if available
       (user would need to upgrade mgmt or update fixup list somehow)
   3. let mgmt layer to pass VM's QEMU version to currently used QEMU, so
      that QEMU could maintain and apply fixups based on QEMU version + machine type.
      The user will have to upgrade to newer QEMU to get/use new fixups.

In my opinion both would lead to explosion of 'possibly needed' properties for each
change we introduce in hw/firmware(read ACPI) and very possibly a lot of conditional
branches in QEMU code. And I'm afraid it will become hard to maintain QEMU =>
more bugs in future.
Also it will lead to explosion of test matrix for downstreams who care about testing.

If we proactively gate changes on properties, we can just update fixup lists in mgmt,
without need to update QEMU (aka Insite rules) at a cost of complexity on QMEU side.

Alternatively we can be conservative in spawning new properties, that means creating
them only when issue is fixed and require users to update QEMU, so that fixups could
be applied to VM.

Feel free to shoot the messenger down or suggest ways how we can deal with the problem.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Ways to deal with broken machine types
  2021-03-23 16:54           ` Ways to deal with broken machine types Igor Mammedov
@ 2021-03-23 17:40             ` Daniel P. Berrangé
  2021-03-23 19:40               ` Michael S. Tsirkin
  2021-03-26  0:48               ` Igor Mammedov
  0 siblings, 2 replies; 20+ messages in thread
From: Daniel P. Berrangé @ 2021-03-23 17:40 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Peter Maydell, Michael S. Tsirkin, libvir-list, qemu devel list,
	Vitaly Cheptsov, Paolo Bonzini, Thomas Lamprecht

On Tue, Mar 23, 2021 at 05:54:47PM +0100, Igor Mammedov wrote:
> Let me hijack this thread for beyond this case scope.
> 
> I agree that for this particular bug we've done all we could, but
> there is broader issue to discuss here.
> 
> We have machine versions to deal with hw compatibility issues and that covers most of the cases,
> but occasionally we notice problem well after release(s),
> so users may be stuck with broken VM and need to manually fix configuration (and/or VM).
> Figuring out what's wrong and how to fix it is far from trivial. So lets discuss if we
> can help to ease this pain, yes it will be late for first victims but it's still
> better than never.

To summarize the problem situation

 - We rely on a machine type version to encode a precise guest ABI.
 - Due a bug, we are in a situation where the same machine type
   encodes two distinct guest ABIs due to a mistake introduced
   betwen QEMU N-2 and N-1
 - We want to fix the bug in QEMU N
 - For incoming migration there is no way to distinguish between
   the ABIs used in N-2 and N-1, to pick the right one

So we're left with an unwinnable problem:

  - Not fixing the bug =>

       a) user migrating N-2 to N-1 have ABI change
       b) user migrating N-2 to N have ABI change
       c) user migrating N-1 to N are fine

    No mitigation for (a) or (b)

  - Fixing the bug =>

       a) user migrating N-2 to N-1 have ABI change.
       b) user migrating N-2 to N are fine
       c) user migrating N-1 to N have ABI change

    Bad situations (a) and (c) are mitigated by
    backporting fix to N-1-stable too.

Generally we have preferred to fix the bug, because we have
usually identified them fairly quickly after release, and
backporting the fix to stable has been sufficient mitigation
against ill effects. Basically the people left broken are a
relatively small set out of the total userbase.

The real challenge arises when we are slow to identify the
problem, such that we have a large number of people impacted.


> I'll try to sum up idea Michael suggested (here comes my unorganized brain-dump),
> 
> 1. We can keep in VM's config QEMU version it was created on
>    and as minimum warn user with a pointer to known issues if version in
>    config mismatches version of actually used QEMU, with a knob to silence
>    it for particular mismatch.
> 
> When an issue becomes know and resolved we know for sure how and what
> changed and embed instructions on what options to use for fixing up VM's
> config to preserve old HW config depending on QEMU version VM was installed on.

> some more ideas:
>    2. let mgmt layer to keep fixup list and apply them to config if available
>        (user would need to upgrade mgmt or update fixup list somehow)
>    3. let mgmt layer to pass VM's QEMU version to currently used QEMU, so
>       that QEMU could maintain and apply fixups based on QEMU version + machine type.
>       The user will have to upgrade to newer QEMU to get/use new fixups.

The nice thing about machine type versioning is that we are treating the
versions as opaque strings which represent a specific ABI, regardless of
the QEMU version. This means that even if distros backport fixes for bugs
or even new features, the machine type compatibility check remains a
simple equality comparsion.

As soon as you introduce the QEMU version though, we have created a
large matrix for compatibility. This matrix is expanded if a distro
chooses to backport fixes for any of the machine type bugs to their
stable streams. This can get particularly expensive when there are
multiple streams a distro is maintaining.

*IF* the original N-1 qemu has a property that could be queried by
the mgmt app to identify a machine type bug, then we could potentially
apply a fixup automatically.

eg query-machines command in QEMU version N could report against
"pc-i440fx-5.0", that there was a regression fix that has to be
applied if property "foo" had value "bar".

Now, the mgmt app wants to migrate from QEMU N-2 or N-1 to QEMU N.
It can query the value of "foo" on the source QEMU with qom-get.
It now knows whether it has to override this property "foo" when
spawning QEMU N on the target host.

Of course this doesn't help us if neither N-1 or N-2 QEMU had a
property that can be queried to identify the bug - ie if the
property in question was newly introduced in QEMU N to fix the
bug.

> In my opinion both would lead to explosion of 'possibly needed' properties for each
> change we introduce in hw/firmware(read ACPI) and very possibly a lot of conditional
> branches in QEMU code. And I'm afraid it will become hard to maintain QEMU =>
> more bugs in future.
> Also it will lead to explosion of test matrix for downstreams who care about testing.
> 
> If we proactively gate changes on properties, we can just update fixup lists in mgmt,
> without need to update QEMU (aka Insite rules) at a cost of complexity on QMEU side.
> 
> Alternatively we can be conservative in spawning new properties, that means creating
> them only when issue is fixed and require users to update QEMU, so that fixups could
> be applied to VM.
> 
> Feel free to shoot the messenger down or suggest ways how we can deal with the problem.

The best solution is of course to not have introduced the ABI change in
the first place. We have lots of testing, but upstream at least, I don't
think we have anything that is explicitly recording the ABI associated
with each machine type and validating that it hasn't changed. We rely on
the developers to follow the coding practices wrt setting machine type
defaults for back compat, and while we're good, we inevitably screw up
every now & then.

Downstreams do have some of this ABI testing - several problems like the
one we have there, have been identified when RHEL downstream QE did
migration tests and found a change in RHEL machine types, which then
was traced back to upstream.

I feel like we need some standard tool which can be run inside a VM
that dumps all the possible ABI relevant information about the virtual
machine in a nice data format.

We would have to run this for each machine type, and save the
results to git immediately after release. Then for every change to
master, we would have to run the test again for every historic
machine type version and compare to the recorded ABI record.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i386/acpi: restore device paths for pre-5.1 vms
  2021-03-23 15:04         ` Thomas Lamprecht
  2021-03-23 16:54           ` Ways to deal with broken machine types Igor Mammedov
@ 2021-03-23 19:32           ` Michael S. Tsirkin
  2021-03-26  0:12             ` Igor Mammedov
  1 sibling, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2021-03-23 19:32 UTC (permalink / raw)
  To: Thomas Lamprecht; +Cc: Vitaly Cheptsov, qemu devel list, qemu-stable

On Tue, Mar 23, 2021 at 04:04:11PM +0100, Thomas Lamprecht wrote:
> On 23.03.21 15:55, Vitaly Cheptsov wrote:
> >> 23 марта 2021 г., в 17:48, Michael S. Tsirkin <mst@redhat.com> написал(а):
> >>
> >> The issue is with people who installed a VM using 5.1 qemu,
> >> migrated to 5.2, booted there and set a config on a device
> >> e.g. IP on a NIC.
> >> They now have a 5.1 machine type but changing uid back
> >> like we do will break these VMs.
> >>
> >> Unlikley to be common but let's at least create a way for these people
> >> to used these VMs.
> >>
> > They can simply set the 5.2 VM version in such a case. I do not want to 
> let this legacy hack to be enabled in any modern QEMU VM version, as it violates ACPI specification and makes the life more difficult for various other software like bootloaders and operating systems.
> 
> Yeah here I agree with Vitaly, if they already used 5.2 and made some configurations
> for those "new" devices they can just keep using 5.2?

People just create a VM, the machine type gets set in stone then.

> If some of the devices got configured on 5.1 and some on 5.2 there's nothing we can
> do anyway, from a QEMU POV - there the user always need to choose one machine version
> and fix up the device configured while on the other machine.

I think I did not explain it well. Here is an example.


1. start qemu version 5.1 with machine type 5.1

2. reboot with qemu version 5.2 still machine type 5.1

3. set static IP on NIC

4. reboot with qemu version 6.0 still machine type 5.1

   at this point NIC loses the static IP.



What I suggest is that we use a property to at stage 4,
user can set a flag and avoid losing the static IP.
Yes in theory maybe at stage 2 or 4 user can switch to 5.2 machine type
but we were always telling people not to do that, let's not start.

-- 
MST



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Ways to deal with broken machine types
  2021-03-23 17:40             ` Daniel P. Berrangé
@ 2021-03-23 19:40               ` Michael S. Tsirkin
  2021-03-30 11:21                 ` David Edmondson
  2021-03-26  0:48               ` Igor Mammedov
  1 sibling, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2021-03-23 19:40 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Maydell, libvir-list, qemu devel list, Vitaly Cheptsov,
	Paolo Bonzini, Igor Mammedov, Thomas Lamprecht

On Tue, Mar 23, 2021 at 05:40:36PM +0000, Daniel P. Berrangé wrote:
> On Tue, Mar 23, 2021 at 05:54:47PM +0100, Igor Mammedov wrote:
> > Let me hijack this thread for beyond this case scope.
> > 
> > I agree that for this particular bug we've done all we could, but
> > there is broader issue to discuss here.
> > 
> > We have machine versions to deal with hw compatibility issues and that covers most of the cases,
> > but occasionally we notice problem well after release(s),
> > so users may be stuck with broken VM and need to manually fix configuration (and/or VM).
> > Figuring out what's wrong and how to fix it is far from trivial. So lets discuss if we
> > can help to ease this pain, yes it will be late for first victims but it's still
> > better than never.
> 
> To summarize the problem situation
> 
>  - We rely on a machine type version to encode a precise guest ABI.
>  - Due a bug, we are in a situation where the same machine type
>    encodes two distinct guest ABIs due to a mistake introduced
>    betwen QEMU N-2 and N-1
>  - We want to fix the bug in QEMU N
>  - For incoming migration there is no way to distinguish between
>    the ABIs used in N-2 and N-1, to pick the right one


Not just incoming migration. Same applies to a guest restart.


> So we're left with an unwinnable problem:
> 
>   - Not fixing the bug =>
> 
>        a) user migrating N-2 to N-1 have ABI change
>        b) user migrating N-2 to N have ABI change
>        c) user migrating N-1 to N are fine
> 
>     No mitigation for (a) or (b)
> 
>   - Fixing the bug =>
> 
>        a) user migrating N-2 to N-1 have ABI change.
>        b) user migrating N-2 to N are fine
>        c) user migrating N-1 to N have ABI change
> 
>     Bad situations (a) and (c) are mitigated by
>     backporting fix to N-1-stable too.
> 
> Generally we have preferred to fix the bug, because we have
> usually identified them fairly quickly after release, and
> backporting the fix to stable has been sufficient mitigation
> against ill effects. Basically the people left broken are a
> relatively small set out of the total userbase.
> 
> The real challenge arises when we are slow to identify the
> problem, such that we have a large number of people impacted.
> 
> 
> > I'll try to sum up idea Michael suggested (here comes my unorganized brain-dump),
> > 
> > 1. We can keep in VM's config QEMU version it was created on
> >    and as minimum warn user with a pointer to known issues if version in
> >    config mismatches version of actually used QEMU, with a knob to silence
> >    it for particular mismatch.
> > 
> > When an issue becomes know and resolved we know for sure how and what
> > changed and embed instructions on what options to use for fixing up VM's
> > config to preserve old HW config depending on QEMU version VM was installed on.
> 
> > some more ideas:
> >    2. let mgmt layer to keep fixup list and apply them to config if available
> >        (user would need to upgrade mgmt or update fixup list somehow)
> >    3. let mgmt layer to pass VM's QEMU version to currently used QEMU, so
> >       that QEMU could maintain and apply fixups based on QEMU version + machine type.
> >       The user will have to upgrade to newer QEMU to get/use new fixups.
> 
> The nice thing about machine type versioning is that we are treating the
> versions as opaque strings which represent a specific ABI, regardless of
> the QEMU version. This means that even if distros backport fixes for bugs
> or even new features, the machine type compatibility check remains a
> simple equality comparsion.
> 
> As soon as you introduce the QEMU version though, we have created a
> large matrix for compatibility.


Yes but. If we explicitly handle them all the same then
mechanically testing them all is an overkill.
We just need to test the ones that have bugs which we
care about fixing.


> This matrix is expanded if a distro
> chooses to backport fixes for any of the machine type bugs to their
> stable streams. This can get particularly expensive when there are
> multiple streams a distro is maintaining.
> 
> *IF* the original N-1 qemu has a property that could be queried by
> the mgmt app to identify a machine type bug, then we could potentially
> apply a fixup automatically.
> 
> eg query-machines command in QEMU version N could report against
> "pc-i440fx-5.0", that there was a regression fix that has to be
> applied if property "foo" had value "bar".
> 
> Now, the mgmt app wants to migrate from QEMU N-2 or N-1 to QEMU N.
> It can query the value of "foo" on the source QEMU with qom-get.
> It now knows whether it has to override this property "foo" when
> spawning QEMU N on the target host.
> 
> Of course this doesn't help us if neither N-1 or N-2 QEMU had a
> property that can be queried to identify the bug - ie if the
> property in question was newly introduced in QEMU N to fix the
> bug.
> 
> > In my opinion both would lead to explosion of 'possibly needed' properties for each
> > change we introduce in hw/firmware(read ACPI) and very possibly a lot of conditional
> > branches in QEMU code. And I'm afraid it will become hard to maintain QEMU =>
> > more bugs in future.
> > Also it will lead to explosion of test matrix for downstreams who care about testing.
> > 
> > If we proactively gate changes on properties, we can just update fixup lists in mgmt,
> > without need to update QEMU (aka Insite rules) at a cost of complexity on QMEU side.
> > 
> > Alternatively we can be conservative in spawning new properties, that means creating
> > them only when issue is fixed and require users to update QEMU, so that fixups could
> > be applied to VM.
> > 
> > Feel free to shoot the messenger down or suggest ways how we can deal with the problem.
> 
> The best solution is of course to not have introduced the ABI change in
> the first place. We have lots of testing, but upstream at least, I don't
> think we have anything that is explicitly recording the ABI associated
> with each machine type and validating that it hasn't changed. We rely on
> the developers to follow the coding practices wrt setting machine type
> defaults for back compat, and while we're good, we inevitably screw up
> every now & then.
> 
> Downstreams do have some of this ABI testing - several problems like the
> one we have there, have been identified when RHEL downstream QE did
> migration tests and found a change in RHEL machine types, which then
> was traced back to upstream.
> 
> I feel like we need some standard tool which can be run inside a VM
> that dumps all the possible ABI relevant information about the virtual
> machine in a nice data format.
> 
> We would have to run this for each machine type, and save the
> results to git immediately after release. Then for every change to
> master, we would have to run the test again for every historic
> machine type version and compare to the recorded ABI record.
> 
> Regards,
> Daniel


Unfortunately I do not think this is practical :(.

All examples of breakage I am aware of, we did not
realise some part of interface was part of guest ABI
and unsafe to change. We simply would not know to write a
test for it.

> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i386/acpi: restore device paths for pre-5.1 vms
  2021-03-23 19:32           ` [PATCH] i386/acpi: restore device paths for pre-5.1 vms Michael S. Tsirkin
@ 2021-03-26  0:12             ` Igor Mammedov
  0 siblings, 0 replies; 20+ messages in thread
From: Igor Mammedov @ 2021-03-26  0:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-stable, Vitaly Cheptsov, Thomas Lamprecht, qemu devel list

On Tue, 23 Mar 2021 15:32:23 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Mar 23, 2021 at 04:04:11PM +0100, Thomas Lamprecht wrote:
> > On 23.03.21 15:55, Vitaly Cheptsov wrote:  
> > >> 23 марта 2021 г., в 17:48, Michael S. Tsirkin <mst@redhat.com> написал(а):
> > >>
> > >> The issue is with people who installed a VM using 5.1 qemu,
> > >> migrated to 5.2, booted there and set a config on a device
> > >> e.g. IP on a NIC.
> > >> They now have a 5.1 machine type but changing uid back
> > >> like we do will break these VMs.
> > >>
> > >> Unlikley to be common but let's at least create a way for these people
> > >> to used these VMs.
> > >>  
> > > They can simply set the 5.2 VM version in such a case. I do not want to   
> > let this legacy hack to be enabled in any modern QEMU VM version, as it violates ACPI specification and makes the life more difficult for various other software like bootloaders and operating systems.
> > 
> > Yeah here I agree with Vitaly, if they already used 5.2 and made some configurations
> > for those "new" devices they can just keep using 5.2?  
> 
> People just create a VM, the machine type gets set in stone then.
> 
> > If some of the devices got configured on 5.1 and some on 5.2 there's nothing we can
> > do anyway, from a QEMU POV - there the user always need to choose one machine version
> > and fix up the device configured while on the other machine.  
> 
> I think I did not explain it well. Here is an example.
> 
> 
> 1. start qemu version 5.1 with machine type 5.1
> 
> 2. reboot with qemu version 5.2 still machine type 5.1
> 
> 3. set static IP on NIC
> 
> 4. reboot with qemu version 6.0 still machine type 5.1
> 
>    at this point NIC loses the static IP.
> 
> 
> 
> What I suggest is that we use a property to at stage 4,

Question is how users would know about fixup property, nobody knows about it
beside new QEMU.
Or even better how to make it transparent to user,
so the only action user would have to do is to upgrade to the newest QEMU.

> user can set a flag and avoid losing the static IP.
> Yes in theory maybe at stage 2 or 4 user can switch to 5.2 machine type
> but we were always telling people not to do that, let's not start.
> 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Ways to deal with broken machine types
  2021-03-23 17:40             ` Daniel P. Berrangé
  2021-03-23 19:40               ` Michael S. Tsirkin
@ 2021-03-26  0:48               ` Igor Mammedov
  2021-03-29 14:46                 ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 20+ messages in thread
From: Igor Mammedov @ 2021-03-26  0:48 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Maydell, Michael S. Tsirkin, libvir-list, qemu devel list,
	Vitaly Cheptsov, Paolo Bonzini, Thomas Lamprecht

On Tue, 23 Mar 2021 17:40:36 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Tue, Mar 23, 2021 at 05:54:47PM +0100, Igor Mammedov wrote:
> > Let me hijack this thread for beyond this case scope.
> > 
> > I agree that for this particular bug we've done all we could, but
> > there is broader issue to discuss here.
> > 
> > We have machine versions to deal with hw compatibility issues and that covers most of the cases,
> > but occasionally we notice problem well after release(s),
> > so users may be stuck with broken VM and need to manually fix configuration (and/or VM).
> > Figuring out what's wrong and how to fix it is far from trivial. So lets discuss if we
> > can help to ease this pain, yes it will be late for first victims but it's still
> > better than never.  
> 
> To summarize the problem situation
> 
>  - We rely on a machine type version to encode a precise guest ABI.
>  - Due a bug, we are in a situation where the same machine type
>    encodes two distinct guest ABIs due to a mistake introduced
>    betwen QEMU N-2 and N-1
>  - We want to fix the bug in QEMU N
>  - For incoming migration there is no way to distinguish between
>    the ABIs used in N-2 and N-1, to pick the right one
> 
> So we're left with an unwinnable problem:
> 
>   - Not fixing the bug =>
> 
>        a) user migrating N-2 to N-1 have ABI change
>        b) user migrating N-2 to N have ABI change
>        c) user migrating N-1 to N are fine
> 
>     No mitigation for (a) or (b)
> 
>   - Fixing the bug =>
> 
>        a) user migrating N-2 to N-1 have ABI change.
>        b) user migrating N-2 to N are fine
>        c) user migrating N-1 to N have ABI change
> 
>     Bad situations (a) and (c) are mitigated by
>     backporting fix to N-1-stable too.
> 
> Generally we have preferred to fix the bug, because we have
> usually identified them fairly quickly after release, and
> backporting the fix to stable has been sufficient mitigation
> against ill effects. Basically the people left broken are a
> relatively small set out of the total userbase.
> 
> The real challenge arises when we are slow to identify the
> problem, such that we have a large number of people impacted.
> 
> 
> > I'll try to sum up idea Michael suggested (here comes my unorganized brain-dump),
> > 
> > 1. We can keep in VM's config QEMU version it was created on
> >    and as minimum warn user with a pointer to known issues if version in
> >    config mismatches version of actually used QEMU, with a knob to silence
> >    it for particular mismatch.
> > 
> > When an issue becomes know and resolved we know for sure how and what
> > changed and embed instructions on what options to use for fixing up VM's
> > config to preserve old HW config depending on QEMU version VM was installed on.  
> 
> > some more ideas:
> >    2. let mgmt layer to keep fixup list and apply them to config if available
> >        (user would need to upgrade mgmt or update fixup list somehow)
> >    3. let mgmt layer to pass VM's QEMU version to currently used QEMU, so
> >       that QEMU could maintain and apply fixups based on QEMU version + machine type.
> >       The user will have to upgrade to newer QEMU to get/use new fixups.  
> 
> The nice thing about machine type versioning is that we are treating the
> versions as opaque strings which represent a specific ABI, regardless of
> the QEMU version. This means that even if distros backport fixes for bugs
> or even new features, the machine type compatibility check remains a
> simple equality comparsion.
> 
> As soon as you introduce the QEMU version though, we have created a
> large matrix for compatibility. This matrix is expanded if a distro
> chooses to backport fixes for any of the machine type bugs to their
> stable streams. This can get particularly expensive when there are
> multiple streams a distro is maintaining.
> 
> *IF* the original N-1 qemu has a property that could be queried by
> the mgmt app to identify a machine type bug, then we could potentially
> apply a fixup automatically.
> 
> eg query-machines command in QEMU version N could report against
> "pc-i440fx-5.0", that there was a regression fix that has to be
> applied if property "foo" had value "bar".
> 
> Now, the mgmt app wants to migrate from QEMU N-2 or N-1 to QEMU N.
> It can query the value of "foo" on the source QEMU with qom-get.
> It now knows whether it has to override this property "foo" when
> spawning QEMU N on the target host.
> 
> Of course this doesn't help us if neither N-1 or N-2 QEMU had a
> property that can be queried to identify the bug - ie if the
> property in question was newly introduced in QEMU N to fix the
> bug.
> 
> > In my opinion both would lead to explosion of 'possibly needed' properties for each
> > change we introduce in hw/firmware(read ACPI) and very possibly a lot of conditional
> > branches in QEMU code. And I'm afraid it will become hard to maintain QEMU =>
> > more bugs in future.
> > Also it will lead to explosion of test matrix for downstreams who care about testing.
> > 
> > If we proactively gate changes on properties, we can just update fixup lists in mgmt,
> > without need to update QEMU (aka Insite rules) at a cost of complexity on QMEU side.
> > 
> > Alternatively we can be conservative in spawning new properties, that means creating
> > them only when issue is fixed and require users to update QEMU, so that fixups could
> > be applied to VM.
> > 
> > Feel free to shoot the messenger down or suggest ways how we can deal with the problem.  
> 
> The best solution is of course to not have introduced the ABI change in
> the first place. We have lots of testing, but upstream at least, I don't
> think we have anything that is explicitly recording the ABI associated
> with each machine type and validating that it hasn't changed. We rely on
> the developers to follow the coding practices wrt setting machine type
> defaults for back compat, and while we're good, we inevitably screw up
> every now & then.
> 
> Downstreams do have some of this ABI testing - several problems like the
> one we have there, have been identified when RHEL downstream QE did
> migration tests and found a change in RHEL machine types, which then
> was traced back to upstream.
> 
> I feel like we need some standard tool which can be run inside a VM
> that dumps all the possible ABI relevant information about the virtual
> machine in a nice data format.
> 
> We would have to run this for each machine type, and save the
> results to git immediately after release. Then for every change to
> master, we would have to run the test again for every historic
> machine type version and compare to the recorded ABI record.

Like Michael said we don't know that something is broken until it's
too late and this particular case it's not even broken (strictly speaking
change is correct) and is not even a part of ABI (it's ACPI code, i.e. firmware).

Problem is in the way virtio drivers enumerate devices, which makes the same
device appear as a new one. We can work around issue on hypervisor side so user
won't loose network connectivity or would be able to boot guest after QEMU upgrade.

We can suggest user re-installing their Windows (method that fixes almost all Win issues)
or to try to make it pain-less for user in these rare cases, by upgrading to
new QEMU (or fixed stable) which has workaround, so only the first few has to suffer.

(I think downstreams would even more benefit from this, there were similar problems
there before).

Yes, It surely will expand test matrix, but it should be limited to specific cases
we implemented fixups for.

> 
> Regards,
> Daniel



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Ways to deal with broken machine types
  2021-03-26  0:48               ` Igor Mammedov
@ 2021-03-29 14:46                 ` Dr. David Alan Gilbert
  2021-03-29 20:32                   ` Igor Mammedov
  0 siblings, 1 reply; 20+ messages in thread
From: Dr. David Alan Gilbert @ 2021-03-29 14:46 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Peter Maydell, Daniel P. Berrangé,
	Michael S. Tsirkin, libvir-list, qemu devel list,
	Vitaly Cheptsov, Paolo Bonzini, Thomas Lamprecht

* Igor Mammedov (imammedo@redhat.com) wrote:
> On Tue, 23 Mar 2021 17:40:36 +0000
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Tue, Mar 23, 2021 at 05:54:47PM +0100, Igor Mammedov wrote:
> > > Let me hijack this thread for beyond this case scope.
> > > 
> > > I agree that for this particular bug we've done all we could, but
> > > there is broader issue to discuss here.
> > > 
> > > We have machine versions to deal with hw compatibility issues and that covers most of the cases,
> > > but occasionally we notice problem well after release(s),
> > > so users may be stuck with broken VM and need to manually fix configuration (and/or VM).
> > > Figuring out what's wrong and how to fix it is far from trivial. So lets discuss if we
> > > can help to ease this pain, yes it will be late for first victims but it's still
> > > better than never.  
> > 
> > To summarize the problem situation
> > 
> >  - We rely on a machine type version to encode a precise guest ABI.
> >  - Due a bug, we are in a situation where the same machine type
> >    encodes two distinct guest ABIs due to a mistake introduced
> >    betwen QEMU N-2 and N-1
> >  - We want to fix the bug in QEMU N
> >  - For incoming migration there is no way to distinguish between
> >    the ABIs used in N-2 and N-1, to pick the right one
> > 
> > So we're left with an unwinnable problem:
> > 
> >   - Not fixing the bug =>
> > 
> >        a) user migrating N-2 to N-1 have ABI change
> >        b) user migrating N-2 to N have ABI change
> >        c) user migrating N-1 to N are fine
> > 
> >     No mitigation for (a) or (b)
> > 
> >   - Fixing the bug =>
> > 
> >        a) user migrating N-2 to N-1 have ABI change.
> >        b) user migrating N-2 to N are fine
> >        c) user migrating N-1 to N have ABI change
> > 
> >     Bad situations (a) and (c) are mitigated by
> >     backporting fix to N-1-stable too.
> > 
> > Generally we have preferred to fix the bug, because we have
> > usually identified them fairly quickly after release, and
> > backporting the fix to stable has been sufficient mitigation
> > against ill effects. Basically the people left broken are a
> > relatively small set out of the total userbase.
> > 
> > The real challenge arises when we are slow to identify the
> > problem, such that we have a large number of people impacted.
> > 
> > 
> > > I'll try to sum up idea Michael suggested (here comes my unorganized brain-dump),
> > > 
> > > 1. We can keep in VM's config QEMU version it was created on
> > >    and as minimum warn user with a pointer to known issues if version in
> > >    config mismatches version of actually used QEMU, with a knob to silence
> > >    it for particular mismatch.
> > > 
> > > When an issue becomes know and resolved we know for sure how and what
> > > changed and embed instructions on what options to use for fixing up VM's
> > > config to preserve old HW config depending on QEMU version VM was installed on.  
> > 
> > > some more ideas:
> > >    2. let mgmt layer to keep fixup list and apply them to config if available
> > >        (user would need to upgrade mgmt or update fixup list somehow)
> > >    3. let mgmt layer to pass VM's QEMU version to currently used QEMU, so
> > >       that QEMU could maintain and apply fixups based on QEMU version + machine type.
> > >       The user will have to upgrade to newer QEMU to get/use new fixups.  
> > 
> > The nice thing about machine type versioning is that we are treating the
> > versions as opaque strings which represent a specific ABI, regardless of
> > the QEMU version. This means that even if distros backport fixes for bugs
> > or even new features, the machine type compatibility check remains a
> > simple equality comparsion.
> > 
> > As soon as you introduce the QEMU version though, we have created a
> > large matrix for compatibility. This matrix is expanded if a distro
> > chooses to backport fixes for any of the machine type bugs to their
> > stable streams. This can get particularly expensive when there are
> > multiple streams a distro is maintaining.
> > 
> > *IF* the original N-1 qemu has a property that could be queried by
> > the mgmt app to identify a machine type bug, then we could potentially
> > apply a fixup automatically.
> > 
> > eg query-machines command in QEMU version N could report against
> > "pc-i440fx-5.0", that there was a regression fix that has to be
> > applied if property "foo" had value "bar".
> > 
> > Now, the mgmt app wants to migrate from QEMU N-2 or N-1 to QEMU N.
> > It can query the value of "foo" on the source QEMU with qom-get.
> > It now knows whether it has to override this property "foo" when
> > spawning QEMU N on the target host.
> > 
> > Of course this doesn't help us if neither N-1 or N-2 QEMU had a
> > property that can be queried to identify the bug - ie if the
> > property in question was newly introduced in QEMU N to fix the
> > bug.
> > 
> > > In my opinion both would lead to explosion of 'possibly needed' properties for each
> > > change we introduce in hw/firmware(read ACPI) and very possibly a lot of conditional
> > > branches in QEMU code. And I'm afraid it will become hard to maintain QEMU =>
> > > more bugs in future.
> > > Also it will lead to explosion of test matrix for downstreams who care about testing.
> > > 
> > > If we proactively gate changes on properties, we can just update fixup lists in mgmt,
> > > without need to update QEMU (aka Insite rules) at a cost of complexity on QMEU side.
> > > 
> > > Alternatively we can be conservative in spawning new properties, that means creating
> > > them only when issue is fixed and require users to update QEMU, so that fixups could
> > > be applied to VM.
> > > 
> > > Feel free to shoot the messenger down or suggest ways how we can deal with the problem.  
> > 
> > The best solution is of course to not have introduced the ABI change in
> > the first place. We have lots of testing, but upstream at least, I don't
> > think we have anything that is explicitly recording the ABI associated
> > with each machine type and validating that it hasn't changed. We rely on
> > the developers to follow the coding practices wrt setting machine type
> > defaults for back compat, and while we're good, we inevitably screw up
> > every now & then.
> > 
> > Downstreams do have some of this ABI testing - several problems like the
> > one we have there, have been identified when RHEL downstream QE did
> > migration tests and found a change in RHEL machine types, which then
> > was traced back to upstream.
> > 
> > I feel like we need some standard tool which can be run inside a VM
> > that dumps all the possible ABI relevant information about the virtual
> > machine in a nice data format.
> > 
> > We would have to run this for each machine type, and save the
> > results to git immediately after release. Then for every change to
> > master, we would have to run the test again for every historic
> > machine type version and compare to the recorded ABI record.
> 
> Like Michael said we don't know that something is broken until it's
> too late and this particular case it's not even broken (strictly speaking
> change is correct) and is not even a part of ABI (it's ACPI code, i.e. firmware).
> 
> Problem is in the way virtio drivers enumerate devices, which makes the same
> device appear as a new one. We can work around issue on hypervisor side so user
> won't loose network connectivity or would be able to boot guest after QEMU upgrade.
> 
> We can suggest user re-installing their Windows (method that fixes almost all Win issues)
> or to try to make it pain-less for user in these rare cases, by upgrading to
> new QEMU (or fixed stable) which has workaround, so only the first few has to suffer.
> 
> (I think downstreams would even more benefit from this, there were similar problems
> there before).
> 
> Yes, It surely will expand test matrix, but it should be limited to specific cases
> we implemented fixups for.

My suggestion from a long while ago (which no one liked) was to
include the source qemu version and then have a quirks list of things to
fix up.

Dave

> > 
> > Regards,
> > Daniel
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Ways to deal with broken machine types
  2021-03-29 14:46                 ` Dr. David Alan Gilbert
@ 2021-03-29 20:32                   ` Igor Mammedov
  0 siblings, 0 replies; 20+ messages in thread
From: Igor Mammedov @ 2021-03-29 20:32 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Peter Maydell, Michael S. Tsirkin, libvir-list, qemu devel list,
	Vitaly Cheptsov, Paolo Bonzini, Thomas Lamprecht

On Mon, 29 Mar 2021 15:46:53 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Igor Mammedov (imammedo@redhat.com) wrote:
> > On Tue, 23 Mar 2021 17:40:36 +0000
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> >   
> > > On Tue, Mar 23, 2021 at 05:54:47PM +0100, Igor Mammedov wrote:  
> > > > Let me hijack this thread for beyond this case scope.
> > > > 
> > > > I agree that for this particular bug we've done all we could, but
> > > > there is broader issue to discuss here.
> > > > 
> > > > We have machine versions to deal with hw compatibility issues and that covers most of the cases,
> > > > but occasionally we notice problem well after release(s),
> > > > so users may be stuck with broken VM and need to manually fix configuration (and/or VM).
> > > > Figuring out what's wrong and how to fix it is far from trivial. So lets discuss if we
> > > > can help to ease this pain, yes it will be late for first victims but it's still
> > > > better than never.    
> > > 
> > > To summarize the problem situation
> > > 
> > >  - We rely on a machine type version to encode a precise guest ABI.
> > >  - Due a bug, we are in a situation where the same machine type
> > >    encodes two distinct guest ABIs due to a mistake introduced
> > >    betwen QEMU N-2 and N-1
> > >  - We want to fix the bug in QEMU N
> > >  - For incoming migration there is no way to distinguish between
> > >    the ABIs used in N-2 and N-1, to pick the right one
> > > 
> > > So we're left with an unwinnable problem:
> > > 
> > >   - Not fixing the bug =>
> > > 
> > >        a) user migrating N-2 to N-1 have ABI change
> > >        b) user migrating N-2 to N have ABI change
> > >        c) user migrating N-1 to N are fine
> > > 
> > >     No mitigation for (a) or (b)
> > > 
> > >   - Fixing the bug =>
> > > 
> > >        a) user migrating N-2 to N-1 have ABI change.
> > >        b) user migrating N-2 to N are fine
> > >        c) user migrating N-1 to N have ABI change
> > > 
> > >     Bad situations (a) and (c) are mitigated by
> > >     backporting fix to N-1-stable too.
> > > 
> > > Generally we have preferred to fix the bug, because we have
> > > usually identified them fairly quickly after release, and
> > > backporting the fix to stable has been sufficient mitigation
> > > against ill effects. Basically the people left broken are a
> > > relatively small set out of the total userbase.
> > > 
> > > The real challenge arises when we are slow to identify the
> > > problem, such that we have a large number of people impacted.
> > > 
> > >   
> > > > I'll try to sum up idea Michael suggested (here comes my unorganized brain-dump),
> > > > 
> > > > 1. We can keep in VM's config QEMU version it was created on
> > > >    and as minimum warn user with a pointer to known issues if version in
> > > >    config mismatches version of actually used QEMU, with a knob to silence
> > > >    it for particular mismatch.
> > > > 
> > > > When an issue becomes know and resolved we know for sure how and what
> > > > changed and embed instructions on what options to use for fixing up VM's
> > > > config to preserve old HW config depending on QEMU version VM was installed on.    
> > >   
> > > > some more ideas:
> > > >    2. let mgmt layer to keep fixup list and apply them to config if available
> > > >        (user would need to upgrade mgmt or update fixup list somehow)
> > > >    3. let mgmt layer to pass VM's QEMU version to currently used QEMU, so
> > > >       that QEMU could maintain and apply fixups based on QEMU version + machine type.
> > > >       The user will have to upgrade to newer QEMU to get/use new fixups.    
> > > 
> > > The nice thing about machine type versioning is that we are treating the
> > > versions as opaque strings which represent a specific ABI, regardless of
> > > the QEMU version. This means that even if distros backport fixes for bugs
> > > or even new features, the machine type compatibility check remains a
> > > simple equality comparsion.
> > > 
> > > As soon as you introduce the QEMU version though, we have created a
> > > large matrix for compatibility. This matrix is expanded if a distro
> > > chooses to backport fixes for any of the machine type bugs to their
> > > stable streams. This can get particularly expensive when there are
> > > multiple streams a distro is maintaining.
> > > 
> > > *IF* the original N-1 qemu has a property that could be queried by
> > > the mgmt app to identify a machine type bug, then we could potentially
> > > apply a fixup automatically.
> > > 
> > > eg query-machines command in QEMU version N could report against
> > > "pc-i440fx-5.0", that there was a regression fix that has to be
> > > applied if property "foo" had value "bar".
> > > 
> > > Now, the mgmt app wants to migrate from QEMU N-2 or N-1 to QEMU N.
> > > It can query the value of "foo" on the source QEMU with qom-get.
> > > It now knows whether it has to override this property "foo" when
> > > spawning QEMU N on the target host.
> > > 
> > > Of course this doesn't help us if neither N-1 or N-2 QEMU had a
> > > property that can be queried to identify the bug - ie if the
> > > property in question was newly introduced in QEMU N to fix the
> > > bug.
> > >   
> > > > In my opinion both would lead to explosion of 'possibly needed' properties for each
> > > > change we introduce in hw/firmware(read ACPI) and very possibly a lot of conditional
> > > > branches in QEMU code. And I'm afraid it will become hard to maintain QEMU =>
> > > > more bugs in future.
> > > > Also it will lead to explosion of test matrix for downstreams who care about testing.
> > > > 
> > > > If we proactively gate changes on properties, we can just update fixup lists in mgmt,
> > > > without need to update QEMU (aka Insite rules) at a cost of complexity on QMEU side.
> > > > 
> > > > Alternatively we can be conservative in spawning new properties, that means creating
> > > > them only when issue is fixed and require users to update QEMU, so that fixups could
> > > > be applied to VM.
> > > > 
> > > > Feel free to shoot the messenger down or suggest ways how we can deal with the problem.    
> > > 
> > > The best solution is of course to not have introduced the ABI change in
> > > the first place. We have lots of testing, but upstream at least, I don't
> > > think we have anything that is explicitly recording the ABI associated
> > > with each machine type and validating that it hasn't changed. We rely on
> > > the developers to follow the coding practices wrt setting machine type
> > > defaults for back compat, and while we're good, we inevitably screw up
> > > every now & then.
> > > 
> > > Downstreams do have some of this ABI testing - several problems like the
> > > one we have there, have been identified when RHEL downstream QE did
> > > migration tests and found a change in RHEL machine types, which then
> > > was traced back to upstream.
> > > 
> > > I feel like we need some standard tool which can be run inside a VM
> > > that dumps all the possible ABI relevant information about the virtual
> > > machine in a nice data format.
> > > 
> > > We would have to run this for each machine type, and save the
> > > results to git immediately after release. Then for every change to
> > > master, we would have to run the test again for every historic
> > > machine type version and compare to the recorded ABI record.  
> > 
> > Like Michael said we don't know that something is broken until it's
> > too late and this particular case it's not even broken (strictly speaking
> > change is correct) and is not even a part of ABI (it's ACPI code, i.e. firmware).
> > 
> > Problem is in the way virtio drivers enumerate devices, which makes the same
> > device appear as a new one. We can work around issue on hypervisor side so user
> > won't loose network connectivity or would be able to boot guest after QEMU upgrade.
> > 
> > We can suggest user re-installing their Windows (method that fixes almost all Win issues)
> > or to try to make it pain-less for user in these rare cases, by upgrading to
> > new QEMU (or fixed stable) which has workaround, so only the first few has to suffer.
> > 
> > (I think downstreams would even more benefit from this, there were similar problems
> > there before).
> > 
> > Yes, It surely will expand test matrix, but it should be limited to specific cases
> > we implemented fixups for.  
> 
> My suggestion from a long while ago (which no one liked) was to
> include the source qemu version and then have a quirks list of things to
> fix up.

That would help, in migration case but not much when starting VM.

Though idea here is similar to yours but applies to VM start time,
and then you don't need version in migration stream as target is told
about source version explicitly upon startup.

> Dave
> 
> > > 
> > > Regards,
> > > Daniel  
> > 
> >   



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Ways to deal with broken machine types
  2021-03-23 19:40               ` Michael S. Tsirkin
@ 2021-03-30 11:21                 ` David Edmondson
  2021-03-30 11:53                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: David Edmondson @ 2021-03-30 11:21 UTC (permalink / raw)
  To: Michael S. Tsirkin, Daniel P. Berrangé
  Cc: Peter Maydell, libvir-list, qemu devel list, Vitaly Cheptsov,
	Paolo Bonzini, Igor Mammedov, Thomas Lamprecht

On Tuesday, 2021-03-23 at 15:40:24 -04, Michael S. Tsirkin wrote:

> On Tue, Mar 23, 2021 at 05:40:36PM +0000, Daniel P. Berrangé wrote:
>> On Tue, Mar 23, 2021 at 05:54:47PM +0100, Igor Mammedov wrote:
>> > Let me hijack this thread for beyond this case scope.
>> > 
>> > I agree that for this particular bug we've done all we could, but
>> > there is broader issue to discuss here.
>> > 
>> > We have machine versions to deal with hw compatibility issues and that covers most of the cases,
>> > but occasionally we notice problem well after release(s),
>> > so users may be stuck with broken VM and need to manually fix configuration (and/or VM).
>> > Figuring out what's wrong and how to fix it is far from trivial. So lets discuss if we
>> > can help to ease this pain, yes it will be late for first victims but it's still
>> > better than never.
>> 
>> To summarize the problem situation
>> 
>>  - We rely on a machine type version to encode a precise guest ABI.
>>  - Due a bug, we are in a situation where the same machine type
>>    encodes two distinct guest ABIs due to a mistake introduced
>>    betwen QEMU N-2 and N-1
>>  - We want to fix the bug in QEMU N
>>  - For incoming migration there is no way to distinguish between
>>    the ABIs used in N-2 and N-1, to pick the right one
>
>
> Not just incoming migration. Same applies to a guest restart.
>
>
>> So we're left with an unwinnable problem:
>> 
>>   - Not fixing the bug =>
>> 
>>        a) user migrating N-2 to N-1 have ABI change
>>        b) user migrating N-2 to N have ABI change
>>        c) user migrating N-1 to N are fine
>> 
>>     No mitigation for (a) or (b)
>> 
>>   - Fixing the bug =>
>> 
>>        a) user migrating N-2 to N-1 have ABI change.
>>        b) user migrating N-2 to N are fine
>>        c) user migrating N-1 to N have ABI change
>> 
>>     Bad situations (a) and (c) are mitigated by
>>     backporting fix to N-1-stable too.
>> 
>> Generally we have preferred to fix the bug, because we have
>> usually identified them fairly quickly after release, and
>> backporting the fix to stable has been sufficient mitigation
>> against ill effects. Basically the people left broken are a
>> relatively small set out of the total userbase.
>> 
>> The real challenge arises when we are slow to identify the
>> problem, such that we have a large number of people impacted.
>> 
>> 
>> > I'll try to sum up idea Michael suggested (here comes my unorganized brain-dump),
>> > 
>> > 1. We can keep in VM's config QEMU version it was created on
>> >    and as minimum warn user with a pointer to known issues if version in
>> >    config mismatches version of actually used QEMU, with a knob to silence
>> >    it for particular mismatch.
>> > 
>> > When an issue becomes know and resolved we know for sure how and what
>> > changed and embed instructions on what options to use for fixing up VM's
>> > config to preserve old HW config depending on QEMU version VM was installed on.
>> 
>> > some more ideas:
>> >    2. let mgmt layer to keep fixup list and apply them to config if available
>> >        (user would need to upgrade mgmt or update fixup list somehow)
>> >    3. let mgmt layer to pass VM's QEMU version to currently used QEMU, so
>> >       that QEMU could maintain and apply fixups based on QEMU version + machine type.
>> >       The user will have to upgrade to newer QEMU to get/use new fixups.
>> 
>> The nice thing about machine type versioning is that we are treating the
>> versions as opaque strings which represent a specific ABI, regardless of
>> the QEMU version. This means that even if distros backport fixes for bugs
>> or even new features, the machine type compatibility check remains a
>> simple equality comparsion.
>> 
>> As soon as you introduce the QEMU version though, we have created a
>> large matrix for compatibility.
>
>
> Yes but. If we explicitly handle them all the same then
> mechanically testing them all is an overkill.
> We just need to test the ones that have bugs which we
> care about fixing.
>
>
>> This matrix is expanded if a distro
>> chooses to backport fixes for any of the machine type bugs to their
>> stable streams. This can get particularly expensive when there are
>> multiple streams a distro is maintaining.
>> 
>> *IF* the original N-1 qemu has a property that could be queried by
>> the mgmt app to identify a machine type bug, then we could potentially
>> apply a fixup automatically.
>> 
>> eg query-machines command in QEMU version N could report against
>> "pc-i440fx-5.0", that there was a regression fix that has to be
>> applied if property "foo" had value "bar".
>> 
>> Now, the mgmt app wants to migrate from QEMU N-2 or N-1 to QEMU N.
>> It can query the value of "foo" on the source QEMU with qom-get.
>> It now knows whether it has to override this property "foo" when
>> spawning QEMU N on the target host.
>> 
>> Of course this doesn't help us if neither N-1 or N-2 QEMU had a
>> property that can be queried to identify the bug - ie if the
>> property in question was newly introduced in QEMU N to fix the
>> bug.
>> 
>> > In my opinion both would lead to explosion of 'possibly needed' properties for each
>> > change we introduce in hw/firmware(read ACPI) and very possibly a lot of conditional
>> > branches in QEMU code. And I'm afraid it will become hard to maintain QEMU =>
>> > more bugs in future.
>> > Also it will lead to explosion of test matrix for downstreams who care about testing.
>> > 
>> > If we proactively gate changes on properties, we can just update fixup lists in mgmt,
>> > without need to update QEMU (aka Insite rules) at a cost of complexity on QMEU side.
>> > 
>> > Alternatively we can be conservative in spawning new properties, that means creating
>> > them only when issue is fixed and require users to update QEMU, so that fixups could
>> > be applied to VM.
>> > 
>> > Feel free to shoot the messenger down or suggest ways how we can deal with the problem.
>> 
>> The best solution is of course to not have introduced the ABI change in
>> the first place. We have lots of testing, but upstream at least, I don't
>> think we have anything that is explicitly recording the ABI associated
>> with each machine type and validating that it hasn't changed. We rely on
>> the developers to follow the coding practices wrt setting machine type
>> defaults for back compat, and while we're good, we inevitably screw up
>> every now & then.
>> 
>> Downstreams do have some of this ABI testing - several problems like the
>> one we have there, have been identified when RHEL downstream QE did
>> migration tests and found a change in RHEL machine types, which then
>> was traced back to upstream.
>> 
>> I feel like we need some standard tool which can be run inside a VM
>> that dumps all the possible ABI relevant information about the virtual
>> machine in a nice data format.
>> 
>> We would have to run this for each machine type, and save the
>> results to git immediately after release. Then for every change to
>> master, we would have to run the test again for every historic
>> machine type version and compare to the recorded ABI record.
>> 
>> Regards,
>> Daniel
>
>
> Unfortunately I do not think this is practical :(.
>
> All examples of breakage I am aware of, we did not
> realise some part of interface was part of guest ABI
> and unsafe to change. We simply would not know to write a
> test for it.

While agreeing that it would not be possible to cover all aspects of the
ABI immediately, does that mean that some level of coverage would not be
useful?

>> -- 
>> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
>> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
>> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

dme.
-- 
Leaves are falling all around, it's time I was on my way.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Ways to deal with broken machine types
  2021-03-30 11:21                 ` David Edmondson
@ 2021-03-30 11:53                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2021-03-30 11:53 UTC (permalink / raw)
  To: David Edmondson
  Cc: Peter Maydell, Daniel P. Berrangé,
	libvir-list, qemu devel list, Vitaly Cheptsov, Igor Mammedov,
	Paolo Bonzini, Thomas Lamprecht

On Tue, Mar 30, 2021 at 12:21:37PM +0100, David Edmondson wrote:
> > Unfortunately I do not think this is practical :(.
> >
> > All examples of breakage I am aware of, we did not
> > realise some part of interface was part of guest ABI
> > and unsafe to change. We simply would not know to write a
> > test for it.
> 
> While agreeing that it would not be possible to cover all aspects of the
> ABI immediately, does that mean that some level of coverage would not be
> useful?

Our testing already warns about ACPI table changes (which is what
happened here). We just verified them manually and thought they are
fine.

-- 
MST



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-03-30 11:55 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-01 19:59 [PATCH] i386/acpi: restore device paths for pre-5.1 vms Vitaly Cheptsov
2021-03-01 19:59 ` [PATCH] target/ppc: fix icount support on Book-e vms accessing SPRs Vitaly Cheptsov
2021-03-02  7:05 ` [PATCH] i386/acpi: restore device paths for pre-5.1 vms Thomas Lamprecht
2021-03-02  8:48 ` Igor Mammedov
2021-03-02 10:14 ` Michael S. Tsirkin
2021-03-22 15:43 ` Michael S. Tsirkin
2021-03-22 15:49   ` Vitaly Cheptsov
2021-03-23 14:48     ` Michael S. Tsirkin
2021-03-23 14:55       ` Vitaly Cheptsov
2021-03-23 15:04         ` Thomas Lamprecht
2021-03-23 16:54           ` Ways to deal with broken machine types Igor Mammedov
2021-03-23 17:40             ` Daniel P. Berrangé
2021-03-23 19:40               ` Michael S. Tsirkin
2021-03-30 11:21                 ` David Edmondson
2021-03-30 11:53                   ` Michael S. Tsirkin
2021-03-26  0:48               ` Igor Mammedov
2021-03-29 14:46                 ` Dr. David Alan Gilbert
2021-03-29 20:32                   ` Igor Mammedov
2021-03-23 19:32           ` [PATCH] i386/acpi: restore device paths for pre-5.1 vms Michael S. Tsirkin
2021-03-26  0:12             ` Igor Mammedov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).