All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/3] x86: fix cpu hotplug with secure boot
@ 2020-07-10 16:17 Igor Mammedov
  2020-07-10 16:17 ` [RFC 1/3] x86: lpc9: let firmware negotiate CPU hotplug SMI feature Igor Mammedov
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Igor Mammedov @ 2020-07-10 16:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: lersek, boris.ostrovsky, liran.alon

CPU hotplug with Secure Boot was not really supported and firmware wasn't aware
of hotplugged CPUs (which might lead to guest crashes). During 4.2 we introduced
locked SMI handler RAM arrea to make sure that guest OS wasn't able to inject
its own SMI handler and OVMF added initial CPU hotplug support.

This series is QEMU part of that support [1] which lets QMVF tell QEMU that
CPU hotplug with SMI broadcast enabled is supported so that QEMU would be able
to prevent hotplug in case it's not supported and trigger SMI on hotplug when
it's necessary. 

1) CPU hotplug negotiation part was introduced later so it might not be
in upstream OVMF yet or I might have missed the patch on edk2-devel
(Laszlo will point out to it/post formal patch)

Igor Mammedov (3):
  x86: lpc9: let firmware negotiate CPU hotplug SMI feature
  x86: cphp: prevent guest crash on CPU hotplug when broadcast SMI is in
    use
  x68: acpi: trigger SMI before scanning for hotplugged CPUs

 include/hw/acpi/cpu.h  |  1 +
 include/hw/i386/ich9.h |  1 +
 hw/acpi/cpu.c          |  6 ++++++
 hw/acpi/ich9.c         | 12 +++++++++++-
 hw/i386/acpi-build.c   | 33 ++++++++++++++++++++++++++++++++-
 hw/i386/pc.c           | 15 ++++++++++++++-
 hw/isa/lpc_ich9.c      | 10 ++++++++++
 7 files changed, 75 insertions(+), 3 deletions(-)

-- 
2.26.2



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC 1/3] x86: lpc9: let firmware negotiate CPU hotplug SMI feature
  2020-07-10 16:17 [RFC 0/3] x86: fix cpu hotplug with secure boot Igor Mammedov
@ 2020-07-10 16:17 ` Igor Mammedov
  2020-07-14 10:19   ` Laszlo Ersek
  2020-07-10 16:17 ` [RFC 2/3] x86: cphp: prevent guest crash on CPU hotplug when broadcast SMI is in use Igor Mammedov
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2020-07-10 16:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: lersek, boris.ostrovsky, liran.alon

It will allow firmware to notify QEMU that firmware requires SMI
being triggered on CPU hotplug, so that it would be able to account
for hotplugged CPU and relocate it to new SMM base.

Using the negotiated feature, follow up patches will insert SMI upcall
into AML code, to make sure that firmware processes hotplug before
guest OS would attempt to use new CPU.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 include/hw/i386/ich9.h | 1 +
 hw/i386/pc.c           | 4 +++-
 hw/isa/lpc_ich9.c      | 7 +++++++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/hw/i386/ich9.h b/include/hw/i386/ich9.h
index a98d10b252..422891a152 100644
--- a/include/hw/i386/ich9.h
+++ b/include/hw/i386/ich9.h
@@ -247,5 +247,6 @@ typedef struct ICH9LPCState {
 
 /* bit positions used in fw_cfg SMI feature negotiation */
 #define ICH9_LPC_SMI_F_BROADCAST_BIT            0
+#define ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT          1
 
 #endif /* HW_ICH9_H */
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index d7f27bc16b..6fe80c84d7 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -97,7 +97,9 @@
 #include "fw_cfg.h"
 #include "trace.h"
 
-GlobalProperty pc_compat_5_0[] = {};
+GlobalProperty pc_compat_5_0[] = {
+    { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
+};
 const size_t pc_compat_5_0_len = G_N_ELEMENTS(pc_compat_5_0);
 
 GlobalProperty pc_compat_4_2[] = {
diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index cd6e169d47..83da7346ab 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -373,6 +373,11 @@ static void smi_features_ok_callback(void *opaque)
         /* guest requests invalid features, leave @features_ok at zero */
         return;
     }
+    if (!(guest_features & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT)) &&
+        guest_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT)) {
+        /* cpu hotplug SMI requires SMI broadcast, leave @features_ok at zero */
+        return;
+    }
 
     /* valid feature subset requested, lock it down, report success */
     lpc->smi_negotiated_features = guest_features;
@@ -747,6 +752,8 @@ static Property ich9_lpc_properties[] = {
     DEFINE_PROP_BOOL("noreboot", ICH9LPCState, pin_strap.spkr_hi, true),
     DEFINE_PROP_BIT64("x-smi-broadcast", ICH9LPCState, smi_host_features,
                       ICH9_LPC_SMI_F_BROADCAST_BIT, true),
+    DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
+                      ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC 2/3] x86: cphp: prevent guest crash on CPU hotplug when broadcast SMI is in use
  2020-07-10 16:17 [RFC 0/3] x86: fix cpu hotplug with secure boot Igor Mammedov
  2020-07-10 16:17 ` [RFC 1/3] x86: lpc9: let firmware negotiate CPU hotplug SMI feature Igor Mammedov
@ 2020-07-10 16:17 ` Igor Mammedov
  2020-07-14 10:56   ` Laszlo Ersek
  2020-07-10 16:17 ` [RFC 3/3] x68: acpi: trigger SMI before scanning for hotplugged CPUs Igor Mammedov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2020-07-10 16:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: lersek, boris.ostrovsky, liran.alon

There were reports of guest crash on CPU hotplug, when using q35 machine
type and QVMF with Secure Boot, due to hotplugged CPU trying to process SMI
at default SMI handler location without it being relocated by firmware first.

Fix it by refusing hotplug if firmware hasn't negotiatiad CPU hotplug SMI
support while SMI broadcast is in use.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/acpi/ich9.c | 12 +++++++++++-
 hw/i386/pc.c   | 11 +++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 2d204babc6..a22b434e0b 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -408,10 +408,20 @@ void ich9_pm_device_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
     ICH9LPCState *lpc = ICH9_LPC_DEVICE(hotplug_dev);
 
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
-        !lpc->pm.acpi_memory_hotplug.is_enabled)
+        !lpc->pm.acpi_memory_hotplug.is_enabled) {
         error_setg(errp,
                    "memory hotplug is not enabled: %s.memory-hotplug-support "
                    "is not set", object_get_typename(OBJECT(lpc)));
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+        uint64_t negotiated = lpc->smi_negotiated_features;
+
+        if (negotiated & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT) &&
+            !(negotiated & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT))) {
+            error_setg(errp, "cpu hotplug SMI was not enabled by firmware");
+            error_append_hint(errp, "update machine type to newer than 5.0 "
+                "and firmware that suppors CPU hotplug in Secure Boot mode");
+        }
+    }
 }
 
 void ich9_pm_device_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 6fe80c84d7..dc1e9157d7 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1508,6 +1508,17 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
         return;
     }
 
+    if (pcms->acpi_dev) {
+        Error *local_err = NULL;
+
+        hotplug_handler_pre_plug(HOTPLUG_HANDLER(pcms->acpi_dev), dev,
+                                 &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+
     init_topo_info(&topo_info, x86ms);
 
     env->nr_dies = x86ms->smp_dies;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC 3/3] x68: acpi: trigger SMI before scanning for hotplugged CPUs
  2020-07-10 16:17 [RFC 0/3] x86: fix cpu hotplug with secure boot Igor Mammedov
  2020-07-10 16:17 ` [RFC 1/3] x86: lpc9: let firmware negotiate CPU hotplug SMI feature Igor Mammedov
  2020-07-10 16:17 ` [RFC 2/3] x86: cphp: prevent guest crash on CPU hotplug when broadcast SMI is in use Igor Mammedov
@ 2020-07-10 16:17 ` Igor Mammedov
  2020-07-14 12:28   ` Laszlo Ersek
  2020-07-14  9:58 ` [RFC 0/3] x86: fix cpu hotplug with secure boot Laszlo Ersek
  2020-07-14 18:26 ` Laszlo Ersek
  4 siblings, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2020-07-10 16:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: lersek, boris.ostrovsky, liran.alon

In case firmware has negotiated CPU hotplug SMI feature, generate
AML to describe SMI IO port region and send SMI to firmware
on each CPU hotplug SCI.

It might be not really usable, but should serve as a starting point to
discuss how better to deal with split hotplug sequence during hot-add
(
ex scenario where it will break is:
   hot-add
      -> (QEMU) add CPU in hotplug regs
      -> (QEMU) SCI
           -1-> (OS) scan
               -1-> (OS) SMI
               -1-> (FW) pull in CPU1 ***
               -1-> (OS) start iterating hotplug regs
   hot-add
      -> (QEMU) add CPU in hotplug regs
      -> (QEMU) SCI
            -2-> (OS) scan (blocked on mutex till previous scan is finished)
               -1-> (OS) 1st added CPU1 send device check event -> INIT/SIPI
               -1-> (OS) 1st added CPU2 send device check event -> INIT/SIPI
                       that's where it explodes, since FW didn't see CPU2
                       when SMI was called
)

hot remove will throw in yet another set of problems, so lets discuss
both here and see if we can  really share hotplug registers block between
FW and AML or we should do something else with it.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v0:
 - s/aml_string/aml_eisaid/
   /fixes Windows BSOD, on nonsense value/ (Laszlo Ersek <lersek@redhat.com>)
 - put SMI device under PCI0 like the rest of hotplug IO port
 - do not generate SMI AML if CPU hotplug SMI feature hasn't been negotiated
---
 include/hw/acpi/cpu.h |  1 +
 hw/acpi/cpu.c         |  6 ++++++
 hw/i386/acpi-build.c  | 33 ++++++++++++++++++++++++++++++++-
 hw/isa/lpc_ich9.c     |  3 +++
 4 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
index 62f0278ba2..0eeedaa491 100644
--- a/include/hw/acpi/cpu.h
+++ b/include/hw/acpi/cpu.h
@@ -50,6 +50,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
 typedef struct CPUHotplugFeatures {
     bool acpi_1_compatible;
     bool has_legacy_cphp;
+    const char *smi_path;
 } CPUHotplugFeatures;
 
 void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index 3d6a500fb7..6539bc23f6 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -14,6 +14,8 @@
 #define ACPI_CPU_CMD_DATA_OFFSET_RW 8
 #define ACPI_CPU_CMD_DATA2_OFFSET_R 0
 
+#define OVMF_CPUHP_SMI_CMD 4
+
 enum {
     CPHP_GET_NEXT_CPU_WITH_EVENT_CMD = 0,
     CPHP_OST_EVENT_CMD = 1,
@@ -473,6 +475,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
             Aml *next_cpu_cmd = aml_int(CPHP_GET_NEXT_CPU_WITH_EVENT_CMD);
 
             aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
+            if (opts.smi_path) {
+                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
+                    aml_name("%s", opts.smi_path)));
+            }
             aml_append(method, aml_store(one, has_event));
             while_ctx = aml_while(aml_equal(has_event, one));
             {
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index b7bcbbbb2a..1e21386f0c 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -95,6 +95,7 @@ typedef struct AcpiPmInfo {
     bool s3_disabled;
     bool s4_disabled;
     bool pcihp_bridge_en;
+    bool smi_on_cpuhp;
     uint8_t s4_val;
     AcpiFadtData fadt;
     uint16_t cpu_hp_io_base;
@@ -194,6 +195,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
     pm->cpu_hp_io_base = 0;
     pm->pcihp_io_base = 0;
     pm->pcihp_io_len = 0;
+    pm->smi_on_cpuhp = false;
 
     assert(obj);
     init_common_fadt_data(machine, obj, &pm->fadt);
@@ -213,6 +215,9 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
         pm->fadt.reset_val = 0xf;
         pm->fadt.flags |= 1 << ACPI_FADT_F_RESET_REG_SUP;
         pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
+        pm->smi_on_cpuhp =
+            !!(object_property_get_uint(lpc, "x-smi-negotiated-features", NULL)
+               & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
     }
 
     /* The above need not be conditional on machine type because the reset port
@@ -1515,6 +1520,31 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
         aml_append(dev, aml_name_decl("_UID", aml_int(1)));
         aml_append(dev, build_q35_osc_method());
         aml_append(sb_scope, dev);
+
+        if (pm->smi_on_cpuhp) {
+            /* reserve SMI block resources */
+            dev = aml_device("PCI0.SMI0");
+            aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A06")));
+            aml_append(dev, aml_name_decl("_UID", aml_string("SMI resources")));
+            crs = aml_resource_template();
+            aml_append(crs,
+                aml_io(
+                       AML_DECODE16,
+                       ACPI_PORT_SMI_CMD,
+                       ACPI_PORT_SMI_CMD,
+                       1,
+                       1)
+            );
+            aml_append(dev, aml_name_decl("_CRS", crs));
+            aml_append(dev, aml_operation_region("SMIR", AML_SYSTEM_IO,
+                aml_int(0xB2), 1));
+            field = aml_field("SMIR", AML_BYTE_ACC, AML_NOLOCK,
+                              AML_WRITE_AS_ZEROS);
+            aml_append(field, aml_named_field("SMIC", 8));
+            aml_append(dev, field);
+            aml_append(sb_scope, dev);
+        }
+
         aml_append(dsdt, sb_scope);
 
         build_hpet_aml(dsdt);
@@ -1530,7 +1560,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
         build_legacy_cpu_hotplug_aml(dsdt, machine, pm->cpu_hp_io_base);
     } else {
         CPUHotplugFeatures opts = {
-            .acpi_1_compatible = true, .has_legacy_cphp = true
+            .acpi_1_compatible = true, .has_legacy_cphp = true,
+            .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
         };
         build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
                        "\\_SB.PCI0", "\\_GPE._E02");
diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index 83da7346ab..5ebea70a8d 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -643,6 +643,9 @@ static void ich9_lpc_initfn(Object *obj)
                                   &acpi_enable_cmd, OBJ_PROP_FLAG_READ);
     object_property_add_uint8_ptr(OBJECT(lpc), ACPI_PM_PROP_ACPI_DISABLE_CMD,
                                   &acpi_disable_cmd, OBJ_PROP_FLAG_READ);
+    object_property_add_uint64_ptr(obj, "x-smi-negotiated-features",
+                                   &lpc->smi_negotiated_features,
+                                   OBJ_PROP_FLAG_READ);
 
     ich9_pm_add_properties(obj, &lpc->pm);
 }
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC 0/3] x86: fix cpu hotplug with secure boot
  2020-07-10 16:17 [RFC 0/3] x86: fix cpu hotplug with secure boot Igor Mammedov
                   ` (2 preceding siblings ...)
  2020-07-10 16:17 ` [RFC 3/3] x68: acpi: trigger SMI before scanning for hotplugged CPUs Igor Mammedov
@ 2020-07-14  9:58 ` Laszlo Ersek
  2020-07-14 10:10   ` Laszlo Ersek
  2020-07-14 18:26 ` Laszlo Ersek
  4 siblings, 1 reply; 19+ messages in thread
From: Laszlo Ersek @ 2020-07-14  9:58 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel; +Cc: boris.ostrovsky, liran.alon

On 07/10/20 18:17, Igor Mammedov wrote:
> CPU hotplug with Secure Boot was not really supported and firmware wasn't aware
> of hotplugged CPUs (which might lead to guest crashes). During 4.2 we introduced
> locked SMI handler RAM arrea to make sure that guest OS wasn't able to inject
> its own SMI handler and OVMF added initial CPU hotplug support.
> 
> This series is QEMU part of that support [1] which lets QMVF tell QEMU that
> CPU hotplug with SMI broadcast enabled is supported so that QEMU would be able
> to prevent hotplug in case it's not supported and trigger SMI on hotplug when
> it's necessary. 
> 
> 1) CPU hotplug negotiation part was introduced later so it might not be
> in upstream OVMF yet or I might have missed the patch on edk2-devel
> (Laszlo will point out to it/post formal patch)

I'll post it later, after testing it with this patch series.

Thanks!
Laszlo

> 
> Igor Mammedov (3):
>   x86: lpc9: let firmware negotiate CPU hotplug SMI feature
>   x86: cphp: prevent guest crash on CPU hotplug when broadcast SMI is in
>     use
>   x68: acpi: trigger SMI before scanning for hotplugged CPUs
> 
>  include/hw/acpi/cpu.h  |  1 +
>  include/hw/i386/ich9.h |  1 +
>  hw/acpi/cpu.c          |  6 ++++++
>  hw/acpi/ich9.c         | 12 +++++++++++-
>  hw/i386/acpi-build.c   | 33 ++++++++++++++++++++++++++++++++-
>  hw/i386/pc.c           | 15 ++++++++++++++-
>  hw/isa/lpc_ich9.c      | 10 ++++++++++
>  7 files changed, 75 insertions(+), 3 deletions(-)
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 0/3] x86: fix cpu hotplug with secure boot
  2020-07-14  9:58 ` [RFC 0/3] x86: fix cpu hotplug with secure boot Laszlo Ersek
@ 2020-07-14 10:10   ` Laszlo Ersek
  0 siblings, 0 replies; 19+ messages in thread
From: Laszlo Ersek @ 2020-07-14 10:10 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel; +Cc: boris.ostrovsky, liran.alon

On 07/14/20 11:58, Laszlo Ersek wrote:
> On 07/10/20 18:17, Igor Mammedov wrote:
>> CPU hotplug with Secure Boot was not really supported and firmware wasn't aware
>> of hotplugged CPUs (which might lead to guest crashes). During 4.2 we introduced
>> locked SMI handler RAM arrea to make sure that guest OS wasn't able to inject
>> its own SMI handler and OVMF added initial CPU hotplug support.
>>
>> This series is QEMU part of that support [1] which lets QMVF tell QEMU that
>> CPU hotplug with SMI broadcast enabled is supported so that QEMU would be able
>> to prevent hotplug in case it's not supported and trigger SMI on hotplug when
>> it's necessary. 
>>
>> 1) CPU hotplug negotiation part was introduced later so it might not be
>> in upstream OVMF yet or I might have missed the patch on edk2-devel
>> (Laszlo will point out to it/post formal patch)
> 
> I'll post it later, after testing it with this patch series.

(For the time being I had only attached it to
<https://bugzilla.redhat.com/show_bug.cgi?id=1849177#c1>.)

> 
> Thanks!
> Laszlo
> 
>>
>> Igor Mammedov (3):
>>   x86: lpc9: let firmware negotiate CPU hotplug SMI feature
>>   x86: cphp: prevent guest crash on CPU hotplug when broadcast SMI is in
>>     use
>>   x68: acpi: trigger SMI before scanning for hotplugged CPUs
>>
>>  include/hw/acpi/cpu.h  |  1 +
>>  include/hw/i386/ich9.h |  1 +
>>  hw/acpi/cpu.c          |  6 ++++++
>>  hw/acpi/ich9.c         | 12 +++++++++++-
>>  hw/i386/acpi-build.c   | 33 ++++++++++++++++++++++++++++++++-
>>  hw/i386/pc.c           | 15 ++++++++++++++-
>>  hw/isa/lpc_ich9.c      | 10 ++++++++++
>>  7 files changed, 75 insertions(+), 3 deletions(-)
>>
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 1/3] x86: lpc9: let firmware negotiate CPU hotplug SMI feature
  2020-07-10 16:17 ` [RFC 1/3] x86: lpc9: let firmware negotiate CPU hotplug SMI feature Igor Mammedov
@ 2020-07-14 10:19   ` Laszlo Ersek
  0 siblings, 0 replies; 19+ messages in thread
From: Laszlo Ersek @ 2020-07-14 10:19 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel; +Cc: boris.ostrovsky, liran.alon

On 07/10/20 18:17, Igor Mammedov wrote:
> It will allow firmware to notify QEMU that firmware requires SMI
> being triggered on CPU hotplug, so that it would be able to account
> for hotplugged CPU and relocate it to new SMM base.
>
> Using the negotiated feature, follow up patches will insert SMI upcall
> into AML code, to make sure that firmware processes hotplug before
> guest OS would attempt to use new CPU.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  include/hw/i386/ich9.h | 1 +
>  hw/i386/pc.c           | 4 +++-
>  hw/isa/lpc_ich9.c      | 7 +++++++
>  3 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/include/hw/i386/ich9.h b/include/hw/i386/ich9.h
> index a98d10b252..422891a152 100644
> --- a/include/hw/i386/ich9.h
> +++ b/include/hw/i386/ich9.h
> @@ -247,5 +247,6 @@ typedef struct ICH9LPCState {
>
>  /* bit positions used in fw_cfg SMI feature negotiation */
>  #define ICH9_LPC_SMI_F_BROADCAST_BIT            0
> +#define ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT          1
>
>  #endif /* HW_ICH9_H */
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index d7f27bc16b..6fe80c84d7 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -97,7 +97,9 @@
>  #include "fw_cfg.h"
>  #include "trace.h"
>
> -GlobalProperty pc_compat_5_0[] = {};
> +GlobalProperty pc_compat_5_0[] = {
> +    { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
> +};
>  const size_t pc_compat_5_0_len = G_N_ELEMENTS(pc_compat_5_0);
>
>  GlobalProperty pc_compat_4_2[] = {
> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> index cd6e169d47..83da7346ab 100644
> --- a/hw/isa/lpc_ich9.c
> +++ b/hw/isa/lpc_ich9.c
> @@ -373,6 +373,11 @@ static void smi_features_ok_callback(void *opaque)
>          /* guest requests invalid features, leave @features_ok at zero */
>          return;
>      }
> +    if (!(guest_features & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT)) &&
> +        guest_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT)) {
> +        /* cpu hotplug SMI requires SMI broadcast, leave @features_ok at zero */
> +        return;
> +    }
>
>      /* valid feature subset requested, lock it down, report success */
>      lpc->smi_negotiated_features = guest_features;
> @@ -747,6 +752,8 @@ static Property ich9_lpc_properties[] = {
>      DEFINE_PROP_BOOL("noreboot", ICH9LPCState, pin_strap.spkr_hi, true),
>      DEFINE_PROP_BIT64("x-smi-broadcast", ICH9LPCState, smi_host_features,
>                        ICH9_LPC_SMI_F_BROADCAST_BIT, true),
> +    DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
> +                      ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>
>

This patch looks good to me:

Reviewed-by: Laszlo Ersek <lersek@redhat.com>

However I'd suggest that we introduce ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT
at once (bit position 2). We don't have to handle it for real, but we
should catch it in patch#2, and reject it gracefully.

For that, the property would be called "x-smi-cpu-hot-unplug", and the
error check in smi_features_ok_callback() would go like:

    if (!(guest_features & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT)) &&
        guest_features & (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
                          BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT))) {
        /* either one of the "CPU hotplug with SMI" and "CPU hot-unplug
         * with SMI" features requires SMI broadcast; leave @features_ok
         * at zero
         */
        return;
    }

The reason I'm suggesting catching LPC_SMI_F_CPU_HOT_UNPLUG_BIT at once
is that people will inevitably try to un-plug, when they see plug
succeed. The firmware itself does catch the (as yet unimplemented)
un-plug attempt, but the way the firmware fails is not -- cannot be --
graceful. It hangs, intentionally.

Failing the device_del QMP command is more graceful.

(For this series, I'll provide test results once I've read the patches.)

Thanks!
Laszlo



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/3] x86: cphp: prevent guest crash on CPU hotplug when broadcast SMI is in use
  2020-07-10 16:17 ` [RFC 2/3] x86: cphp: prevent guest crash on CPU hotplug when broadcast SMI is in use Igor Mammedov
@ 2020-07-14 10:56   ` Laszlo Ersek
  2020-07-17 12:57     ` Igor Mammedov
  0 siblings, 1 reply; 19+ messages in thread
From: Laszlo Ersek @ 2020-07-14 10:56 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel; +Cc: boris.ostrovsky, liran.alon

On 07/10/20 18:17, Igor Mammedov wrote:
> There were reports of guest crash on CPU hotplug, when using q35 machine
> type and QVMF with Secure Boot, due to hotplugged CPU trying to process SMI

(1) typo: s/QVMF/OVMF/ please

(2) Please replace "Secure Boot" with "SMM". In everyday practice it's
OK to use them interchangeably, but in this commit message I'd like us
to be more precise.

> at default SMI handler location without it being relocated by firmware first.
>
> Fix it by refusing hotplug if firmware hasn't negotiatiad CPU hotplug SMI

(3) s/negotiatiad/negotiated/

> support while SMI broadcast is in use.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  hw/acpi/ich9.c | 12 +++++++++++-
>  hw/i386/pc.c   | 11 +++++++++++
>  2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
> index 2d204babc6..a22b434e0b 100644
> --- a/hw/acpi/ich9.c
> +++ b/hw/acpi/ich9.c
> @@ -408,10 +408,20 @@ void ich9_pm_device_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>      ICH9LPCState *lpc = ICH9_LPC_DEVICE(hotplug_dev);
>
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
> -        !lpc->pm.acpi_memory_hotplug.is_enabled)
> +        !lpc->pm.acpi_memory_hotplug.is_enabled) {
>          error_setg(errp,
>                     "memory hotplug is not enabled: %s.memory-hotplug-support "
>                     "is not set", object_get_typename(OBJECT(lpc)));
> +    } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> +        uint64_t negotiated = lpc->smi_negotiated_features;

Wow, this is a relief. I thought it would be a difficult problem to
access the ICH9-LPC object cleanly, on the call stack of the device_add
command. I didn't imagine it would be at our disposal immediately.

> +
> +        if (negotiated & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT) &&
> +            !(negotiated & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT))) {
> +            error_setg(errp, "cpu hotplug SMI was not enabled by firmware");

(4) Please let's call this

  cpu hotplug *with* SMI

not just

  cpu hotplug SMI

(Emphasis added on "with" just for the sake of this discussion; no need
to embed the asterisks in the message.)

Because:

In my thinking, the feature that the firmware negotiates is not:

  SMI or no SMI, on CPU hotplug

Instead, the firmware negotiates:

  CPU hotplug with SMI, or no CPU hotplug

IOW, "SMI-or-no-SMI" is not a sub-feature of CPU hotplug; the feature
being negotiated, when SMI broadcast is enabled, is CPU hotplug as a
whole. That's exactly what this patch implements.


> +            error_append_hint(errp, "update machine type to newer than 5.0 "
> +                "and firmware that suppors CPU hotplug in Secure Boot mode");

(5) Please replace

  "in Secure Boot mode"

with

  "with SMM"

(for "firmware that suppors CPU hotplug with SMM")

> +        }
> +    }
>  }
>
>  void ich9_pm_device_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 6fe80c84d7..dc1e9157d7 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1508,6 +1508,17 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>          return;
>      }
>
> +    if (pcms->acpi_dev) {
> +        Error *local_err = NULL;
> +
> +        hotplug_handler_pre_plug(HOTPLUG_HANDLER(pcms->acpi_dev), dev,
> +                                 &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +    }
> +
>      init_topo_info(&topo_info, x86ms);
>
>      env->nr_dies = x86ms->smp_dies;
>

(6) This looks sane to me, but I have a question for the *pre-patch*
code.

I notice that hotplug_handler_pre_plug() is already called from the
(completely unrelated) function pc_memory_pre_plug().

In pc_memory_pre_plug(), we have the following snippet:

    /*
     * When -no-acpi is used with Q35 machine type, no ACPI is built,
     * but pcms->acpi_dev is still created. Check !acpi_enabled in
     * addition to cover this case.
     */
    if (!pcms->acpi_dev || !x86_machine_is_acpi_enabled(X86_MACHINE(pcms))) {
        error_setg(errp,
                   "memory hotplug is not enabled: missing acpi device or acpi disabled");
        return;
    }

Whereas in pc_cpu_pre_plug(), the present patch only adds a
"pcms->acpi_dev" nullity check.

Should pc_cpu_pre_plug() check for ACPI enablement similarly to
pc_memory_pre_plug()?

I'm asking for two reasons:

(6a) for the feature at hand (CPU hotplug with SMI), maybe we should
write:

    if (pcms->acpi_dev && x86_machine_is_acpi_enabled(X86_MACHINE(pcms))) {

(6b) or maybe more strictly, copy the check from memory hotplug (just
update the error message):

    if (!pcms->acpi_dev || !x86_machine_is_acpi_enabled(X86_MACHINE(pcms))) {
        error_setg(errp,
                   "CPU hotplug is not enabled: missing acpi device or acpi disabled");
        return;
    }

Because CPU hotplug depends on ACPI too, just like memory hotplug,
regardless of firmware, and regardless of guest-SMM. Am I correct to
think that?

Basically, I'm asking if we should replicate original commit
8cd91acec8df ("pc: fail memory hot-plug/unplug with -no-acpi and Q35
machine type", 2018-01-12) for CPU hotplug first (in a separate patch!),
before dealing with "lpc->smi_negotiated_features" in this patch.

Hmm... I'm getting confused. I *do* see similar checks in pc_cpu_plug()
and pc_cpu_unplug_request_cb(). But:

- I don't understand what determines whether we put the ACPI check in
*PRE* plug functions, or the plug functions,

- and I don't understand why pc_cpu_plug() and
pc_cpu_unplug_request_cb() only check "pcms->acpi_dev", and not
x86_machine_is_acpi_enabled().


(7) According to my request under patch#1, I propose that we should
implement a similar rejection for CPU hot-unplug, in this series. (Can
be a separate patch, of course.)

Thanks!
Laszlo



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/3] x68: acpi: trigger SMI before scanning for hotplugged CPUs
  2020-07-10 16:17 ` [RFC 3/3] x68: acpi: trigger SMI before scanning for hotplugged CPUs Igor Mammedov
@ 2020-07-14 12:28   ` Laszlo Ersek
  2020-07-14 12:41     ` Laszlo Ersek
  2020-07-17 13:13     ` Igor Mammedov
  0 siblings, 2 replies; 19+ messages in thread
From: Laszlo Ersek @ 2020-07-14 12:28 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel; +Cc: boris.ostrovsky, Peter Krempa, liran.alon

(CC'ing Peter Krempa due to virsh setvcpu (singular) / setvcpus (plural)
references)

On 07/10/20 18:17, Igor Mammedov wrote:
> In case firmware has negotiated CPU hotplug SMI feature, generate
> AML to describe SMI IO port region and send SMI to firmware
> on each CPU hotplug SCI.
> 
> It might be not really usable, but should serve as a starting point to
> discuss how better to deal with split hotplug sequence during hot-add
> (
> ex scenario where it will break is:
>    hot-add
>       -> (QEMU) add CPU in hotplug regs
>       -> (QEMU) SCI
>            -1-> (OS) scan
>                -1-> (OS) SMI
>                -1-> (FW) pull in CPU1 ***
>                -1-> (OS) start iterating hotplug regs
>    hot-add
>       -> (QEMU) add CPU in hotplug regs
>       -> (QEMU) SCI
>             -2-> (OS) scan (blocked on mutex till previous scan is finished)
>                -1-> (OS) 1st added CPU1 send device check event -> INIT/SIPI
>                -1-> (OS) 1st added CPU2 send device check event -> INIT/SIPI
>                        that's where it explodes, since FW didn't see CPU2
>                        when SMI was called
> )
> 
> hot remove will throw in yet another set of problems, so lets discuss
> both here and see if we can  really share hotplug registers block between
> FW and AML or we should do something else with it.

This issue is generally triggered by management applications such as
libvirt that issue device_add commands in quick succession. For libvirt,
the command is "virsh setvcpus" (plural) with multiple CPUs specified
for plugging. The singular "virsh setvcpu" command, which is more
friendly towards guest NUMA, does not run into the symptom.

The scope of the scan method lock is not large enough, with SMI in the
picture.

I suggest that we not uproot the existing AML code or the hotplug
register block. Instead, I suggest that we add serialization at a higher
level, with sufficient scope.

QEMU:

- introduce a new flag standing for "CPU plug operation in progress"

- if ICH9_LPC_SMI_F_BROADCAST_BIT has been negotiated:

  - "device_add" and "device_del" should enforce
    ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT and
    ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, respectively

  - both device_add and device_del (for VCPUs) should set check the
    "in progress" flag.

    - If set, reject the request synchronously

    - Otherwise, set the flag, and commence the operation

  - in cpu_hotplug_wr(), where we emit the ACPI_DEVICE_OST event with
    qapi_event_send_acpi_device_ost(), clear the "in-progress" flag.

- If QEMU executes the QMP command processing and the cpu_hotplug_wr()
function on different (host OS) threads, then perhaps we should use an
atomic type for the flag. (Not sure about locking between QEMU threads,
sorry.) I don't really expect race conditions, but in case we ever get
stuck with the flag, we should make sure that the stuck state is "in
progress", and not "not in progress". (The former state can prevent
further plug operations, but cannot cause the guest to lose state.)

libvirtd:

- the above QEMU changes will gracefully prevent "virsh setvcpus"
(plural) from working, so the libvirtd changes should "only" aim at
coping with the new serialization added in QEMU

- introduce a new info channel between qemuDomainSetVcpusLive() and
qemuProcessHandleAcpiOstInfo(). In the former, delay sending the next
"device_add" command until ACPI_DEVICE_OST is queued by
qemuProcessHandleAcpiOstInfo().


My point is that we have 60+ SMI feature bits left, so if we have to
invent something entirely new for unplug, we can still do it later. I'm
unable to foresee the particulars of unplug at this stage, beyond what
I've said already in this thread (i.e. that we should immediately
introduce the unplug feature bit, just so we can reject unplug
gracefully). If the current ACPI stuff, the hotplug register block, etc.
prove *inextensible for unplug*, we can use further feature bits for
something entirely new. But I think we can extend what we have now for
completing plug only. And I think modifying QEMU and libvirtd for that
is fair -- I've just read the _OST method documentation in the ACPI
spec, and _OST was clearly designed for coordinating (un)plug between
agents. For example, my understanding is that memory hotplug can fail,
failure is reported via ACPI_DEVICE_OST to libvirtd, so ACPI_DEVICE_OST
is already used for syncronizing guest operations with management layer
operations.

> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v0:
>  - s/aml_string/aml_eisaid/
>    /fixes Windows BSOD, on nonsense value/ (Laszlo Ersek <lersek@redhat.com>)
>  - put SMI device under PCI0 like the rest of hotplug IO port
>  - do not generate SMI AML if CPU hotplug SMI feature hasn't been negotiated
> ---
>  include/hw/acpi/cpu.h |  1 +
>  hw/acpi/cpu.c         |  6 ++++++
>  hw/i386/acpi-build.c  | 33 ++++++++++++++++++++++++++++++++-
>  hw/isa/lpc_ich9.c     |  3 +++
>  4 files changed, 42 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> index 62f0278ba2..0eeedaa491 100644
> --- a/include/hw/acpi/cpu.h
> +++ b/include/hw/acpi/cpu.h
> @@ -50,6 +50,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>  typedef struct CPUHotplugFeatures {
>      bool acpi_1_compatible;
>      bool has_legacy_cphp;
> +    const char *smi_path;
>  } CPUHotplugFeatures;
>  
>  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index 3d6a500fb7..6539bc23f6 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -14,6 +14,8 @@
>  #define ACPI_CPU_CMD_DATA_OFFSET_RW 8
>  #define ACPI_CPU_CMD_DATA2_OFFSET_R 0
>  
> +#define OVMF_CPUHP_SMI_CMD 4
> +
>  enum {
>      CPHP_GET_NEXT_CPU_WITH_EVENT_CMD = 0,
>      CPHP_OST_EVENT_CMD = 1,
> @@ -473,6 +475,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>              Aml *next_cpu_cmd = aml_int(CPHP_GET_NEXT_CPU_WITH_EVENT_CMD);
>  
>              aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
> +            if (opts.smi_path) {
> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
> +                    aml_name("%s", opts.smi_path)));
> +            }
>              aml_append(method, aml_store(one, has_event));
>              while_ctx = aml_while(aml_equal(has_event, one));
>              {
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index b7bcbbbb2a..1e21386f0c 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -95,6 +95,7 @@ typedef struct AcpiPmInfo {
>      bool s3_disabled;
>      bool s4_disabled;
>      bool pcihp_bridge_en;
> +    bool smi_on_cpuhp;
>      uint8_t s4_val;
>      AcpiFadtData fadt;
>      uint16_t cpu_hp_io_base;
> @@ -194,6 +195,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>      pm->cpu_hp_io_base = 0;
>      pm->pcihp_io_base = 0;
>      pm->pcihp_io_len = 0;
> +    pm->smi_on_cpuhp = false;
>  
>      assert(obj);
>      init_common_fadt_data(machine, obj, &pm->fadt);
> @@ -213,6 +215,9 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>          pm->fadt.reset_val = 0xf;
>          pm->fadt.flags |= 1 << ACPI_FADT_F_RESET_REG_SUP;
>          pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
> +        pm->smi_on_cpuhp =
> +            !!(object_property_get_uint(lpc, "x-smi-negotiated-features", NULL)
> +               & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));

(1) style: I think the "&" operator should be placed at the end of the
line, not at the start

(2) style: can we introduce a macro for the new property name?

>      }
>  
>      /* The above need not be conditional on machine type because the reset port
> @@ -1515,6 +1520,31 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>          aml_append(dev, aml_name_decl("_UID", aml_int(1)));
>          aml_append(dev, build_q35_osc_method());
>          aml_append(sb_scope, dev);
> +
> +        if (pm->smi_on_cpuhp) {
> +            /* reserve SMI block resources */
> +            dev = aml_device("PCI0.SMI0");
> +            aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A06")));
> +            aml_append(dev, aml_name_decl("_UID", aml_string("SMI resources")));
> +            crs = aml_resource_template();
> +            aml_append(crs,
> +                aml_io(
> +                       AML_DECODE16,
> +                       ACPI_PORT_SMI_CMD,
> +                       ACPI_PORT_SMI_CMD,
> +                       1,
> +                       1)
> +            );

(3) Just a thought: I wonder if we should reserve both ports (0xB2 and
0xB3 too). For now we don't have any use for the "data" port, but
arguably it's part of the same register block.

> +            aml_append(dev, aml_name_decl("_CRS", crs));
> +            aml_append(dev, aml_operation_region("SMIR", AML_SYSTEM_IO,
> +                aml_int(0xB2), 1));

(4) Style: we should probably use ACPI_PORT_SMI_CMD rather than 0xB2

(5) Same as (3) -- make the "data" port part of the opregion?

(We'd still make the SMIC field below cover only 0xB2, of course.)

Anyway, feel free to ignore (3) and (5) if they are out of scope.

> +            field = aml_field("SMIR", AML_BYTE_ACC, AML_NOLOCK,
> +                              AML_WRITE_AS_ZEROS);
> +            aml_append(field, aml_named_field("SMIC", 8));
> +            aml_append(dev, field);
> +            aml_append(sb_scope, dev);
> +        }
> +
>          aml_append(dsdt, sb_scope);
>  
>          build_hpet_aml(dsdt);
> @@ -1530,7 +1560,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>          build_legacy_cpu_hotplug_aml(dsdt, machine, pm->cpu_hp_io_base);
>      } else {
>          CPUHotplugFeatures opts = {
> -            .acpi_1_compatible = true, .has_legacy_cphp = true
> +            .acpi_1_compatible = true, .has_legacy_cphp = true,
> +            .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
>          };
>          build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
>                         "\\_SB.PCI0", "\\_GPE._E02");
> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> index 83da7346ab..5ebea70a8d 100644
> --- a/hw/isa/lpc_ich9.c
> +++ b/hw/isa/lpc_ich9.c
> @@ -643,6 +643,9 @@ static void ich9_lpc_initfn(Object *obj)
>                                    &acpi_enable_cmd, OBJ_PROP_FLAG_READ);
>      object_property_add_uint8_ptr(OBJECT(lpc), ACPI_PM_PROP_ACPI_DISABLE_CMD,
>                                    &acpi_disable_cmd, OBJ_PROP_FLAG_READ);
> +    object_property_add_uint64_ptr(obj, "x-smi-negotiated-features",


(6) same as (2), we should use a macro for the property name.

> +                                   &lpc->smi_negotiated_features,
> +                                   OBJ_PROP_FLAG_READ);
>  
>      ich9_pm_add_properties(obj, &lpc->pm);
>  }
> 

I'll follow up with test results.

Thanks
Laszlo



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/3] x68: acpi: trigger SMI before scanning for hotplugged CPUs
  2020-07-14 12:28   ` Laszlo Ersek
@ 2020-07-14 12:41     ` Laszlo Ersek
  2020-07-14 15:19       ` Igor Mammedov
  2020-07-17 13:13     ` Igor Mammedov
  1 sibling, 1 reply; 19+ messages in thread
From: Laszlo Ersek @ 2020-07-14 12:41 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel; +Cc: boris.ostrovsky, Peter Krempa, liran.alon

On 07/14/20 14:28, Laszlo Ersek wrote:
> (CC'ing Peter Krempa due to virsh setvcpu (singular) / setvcpus (plural)
> references)
> 
> On 07/10/20 18:17, Igor Mammedov wrote:
>> In case firmware has negotiated CPU hotplug SMI feature, generate
>> AML to describe SMI IO port region and send SMI to firmware
>> on each CPU hotplug SCI.
>>
>> It might be not really usable, but should serve as a starting point to
>> discuss how better to deal with split hotplug sequence during hot-add
>> (
>> ex scenario where it will break is:
>>    hot-add
>>       -> (QEMU) add CPU in hotplug regs
>>       -> (QEMU) SCI
>>            -1-> (OS) scan
>>                -1-> (OS) SMI
>>                -1-> (FW) pull in CPU1 ***
>>                -1-> (OS) start iterating hotplug regs
>>    hot-add
>>       -> (QEMU) add CPU in hotplug regs
>>       -> (QEMU) SCI
>>             -2-> (OS) scan (blocked on mutex till previous scan is finished)
>>                -1-> (OS) 1st added CPU1 send device check event -> INIT/SIPI
>>                -1-> (OS) 1st added CPU2 send device check event -> INIT/SIPI
>>                        that's where it explodes, since FW didn't see CPU2
>>                        when SMI was called
>> )
>>
>> hot remove will throw in yet another set of problems, so lets discuss
>> both here and see if we can  really share hotplug registers block between
>> FW and AML or we should do something else with it.
> 
> This issue is generally triggered by management applications such as
> libvirt that issue device_add commands in quick succession. For libvirt,
> the command is "virsh setvcpus" (plural) with multiple CPUs specified
> for plugging. The singular "virsh setvcpu" command, which is more
> friendly towards guest NUMA, does not run into the symptom.
> 
> The scope of the scan method lock is not large enough, with SMI in the
> picture.
> 
> I suggest that we not uproot the existing AML code or the hotplug
> register block. Instead, I suggest that we add serialization at a higher
> level, with sufficient scope.
> 
> QEMU:
> 
> - introduce a new flag standing for "CPU plug operation in progress"
> 
> - if ICH9_LPC_SMI_F_BROADCAST_BIT has been negotiated:
> 
>   - "device_add" and "device_del" should enforce
>     ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT and
>     ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, respectively
> 
>   - both device_add and device_del (for VCPUs) should set check the
>     "in progress" flag.
> 
>     - If set, reject the request synchronously
> 
>     - Otherwise, set the flag, and commence the operation
> 
>   - in cpu_hotplug_wr(), where we emit the ACPI_DEVICE_OST event with
>     qapi_event_send_acpi_device_ost(), clear the "in-progress" flag.
> 
> - If QEMU executes the QMP command processing and the cpu_hotplug_wr()
> function on different (host OS) threads, then perhaps we should use an
> atomic type for the flag. (Not sure about locking between QEMU threads,
> sorry.) I don't really expect race conditions, but in case we ever get
> stuck with the flag, we should make sure that the stuck state is "in
> progress", and not "not in progress". (The former state can prevent
> further plug operations, but cannot cause the guest to lose state.)

Furthermore, the "CPU plug operation in progress" flag should be:
- either migrated,
- or a migration blocker.

Because on the destination host, device_add should be possible if and
only if the plug operation completed (either still on the source host,
or on the destination host).

I guess that the "migration blocker" option is easier.

Anyway I assume this is already handled with memory hotplug somehow
(i.e., migration attempt between device_add and ACPI_DEVICE_OST).

Thanks,
Laszlo



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/3] x68: acpi: trigger SMI before scanning for hotplugged CPUs
  2020-07-14 12:41     ` Laszlo Ersek
@ 2020-07-14 15:19       ` Igor Mammedov
  2020-07-15 12:38         ` Laszlo Ersek
  0 siblings, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2020-07-14 15:19 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: boris.ostrovsky, Peter Krempa, liran.alon, qemu-devel

On Tue, 14 Jul 2020 14:41:28 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> On 07/14/20 14:28, Laszlo Ersek wrote:
> > (CC'ing Peter Krempa due to virsh setvcpu (singular) / setvcpus (plural)
> > references)
> > 
> > On 07/10/20 18:17, Igor Mammedov wrote:  
> >> In case firmware has negotiated CPU hotplug SMI feature, generate
> >> AML to describe SMI IO port region and send SMI to firmware
> >> on each CPU hotplug SCI.
> >>
> >> It might be not really usable, but should serve as a starting point to
> >> discuss how better to deal with split hotplug sequence during hot-add
> >> (
> >> ex scenario where it will break is:
> >>    hot-add  
> >>       -> (QEMU) add CPU in hotplug regs
> >>       -> (QEMU) SCI  
> >>            -1-> (OS) scan
> >>                -1-> (OS) SMI
> >>                -1-> (FW) pull in CPU1 ***
> >>                -1-> (OS) start iterating hotplug regs
> >>    hot-add  
> >>       -> (QEMU) add CPU in hotplug regs
> >>       -> (QEMU) SCI  
> >>             -2-> (OS) scan (blocked on mutex till previous scan is finished)
> >>                -1-> (OS) 1st added CPU1 send device check event -> INIT/SIPI
> >>                -1-> (OS) 1st added CPU2 send device check event -> INIT/SIPI
> >>                        that's where it explodes, since FW didn't see CPU2
> >>                        when SMI was called
> >> )
> >>
> >> hot remove will throw in yet another set of problems, so lets discuss
> >> both here and see if we can  really share hotplug registers block between
> >> FW and AML or we should do something else with it.  
> > 
> > This issue is generally triggered by management applications such as
> > libvirt that issue device_add commands in quick succession. For libvirt,
> > the command is "virsh setvcpus" (plural) with multiple CPUs specified
> > for plugging. The singular "virsh setvcpu" command, which is more
> > friendly towards guest NUMA, does not run into the symptom.
> > 
> > The scope of the scan method lock is not large enough, with SMI in the
> > picture.
> > 
> > I suggest that we not uproot the existing AML code or the hotplug
> > register block. Instead, I suggest that we add serialization at a higher
> > level, with sufficient scope.
> > 
> > QEMU:
> > 
> > - introduce a new flag standing for "CPU plug operation in progress"
> > 
> > - if ICH9_LPC_SMI_F_BROADCAST_BIT has been negotiated:
> > 
> >   - "device_add" and "device_del" should enforce
> >     ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT and
> >     ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, respectively
> > 
> >   - both device_add and device_del (for VCPUs) should set check the
> >     "in progress" flag.
> > 
> >     - If set, reject the request synchronously
> > 
> >     - Otherwise, set the flag, and commence the operation
> > 
> >   - in cpu_hotplug_wr(), where we emit the ACPI_DEVICE_OST event with
> >     qapi_event_send_acpi_device_ost(), clear the "in-progress" flag.
> > 
> > - If QEMU executes the QMP command processing and the cpu_hotplug_wr()
> > function on different (host OS) threads, then perhaps we should use an
> > atomic type for the flag. (Not sure about locking between QEMU threads,
> > sorry.) I don't really expect race conditions, but in case we ever get
> > stuck with the flag, we should make sure that the stuck state is "in
> > progress", and not "not in progress". (The former state can prevent
> > further plug operations, but cannot cause the guest to lose state.)  
> 
> Furthermore, the "CPU plug operation in progress" flag should be:
> - either migrated,
> - or a migration blocker.
> 
> Because on the destination host, device_add should be possible if and
> only if the plug operation completed (either still on the source host,
> or on the destination host).

I have a way more simple alternative idea, which doesn't involve libvirt.

We can change AML to
  1. cache hotplugged CPUs from controller
  2. send SMI
  3. send Notify event to OS to online cached CPUs
this way we never INIT/SIPI cpus that FW hasn't seen yet
as for FW, it can relocate extra CPU that arrived after #1
it won't cause any harm as on the next SCI AML will pick up those
CPUs and SMI upcall will be just NOP.

I'll post a patch here on top of this series for you to try
(without any of your comments addressed yet, as it's already written
and I was testing it for a while to make sure it won't explode
with various windows versions)

> I guess that the "migration blocker" option is easier.
> 
> Anyway I assume this is already handled with memory hotplug somehow
> (i.e., migration attempt between device_add and ACPI_DEVICE_OST).

Thanks for comments,
I'll need some time to ponder on other comments and do some
palaeontology research to answer questions
(aka. I need to make up excuses for the code I wrote :) )
 
> Thanks,
> Laszlo



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 0/3] x86: fix cpu hotplug with secure boot
  2020-07-10 16:17 [RFC 0/3] x86: fix cpu hotplug with secure boot Igor Mammedov
                   ` (3 preceding siblings ...)
  2020-07-14  9:58 ` [RFC 0/3] x86: fix cpu hotplug with secure boot Laszlo Ersek
@ 2020-07-14 18:26 ` Laszlo Ersek
  4 siblings, 0 replies; 19+ messages in thread
From: Laszlo Ersek @ 2020-07-14 18:26 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel; +Cc: boris.ostrovsky, liran.alon

On 07/10/20 18:17, Igor Mammedov wrote:
> CPU hotplug with Secure Boot was not really supported and firmware wasn't aware
> of hotplugged CPUs (which might lead to guest crashes). During 4.2 we introduced
> locked SMI handler RAM arrea to make sure that guest OS wasn't able to inject
> its own SMI handler and OVMF added initial CPU hotplug support.
>
> This series is QEMU part of that support [1] which lets QMVF tell QEMU that
> CPU hotplug with SMI broadcast enabled is supported so that QEMU would be able
> to prevent hotplug in case it's not supported and trigger SMI on hotplug when
> it's necessary.
>
> 1) CPU hotplug negotiation part was introduced later so it might not be
> in upstream OVMF yet or I might have missed the patch on edk2-devel
> (Laszlo will point out to it/post formal patch)
>
> Igor Mammedov (3):
>   x86: lpc9: let firmware negotiate CPU hotplug SMI feature
>   x86: cphp: prevent guest crash on CPU hotplug when broadcast SMI is in
>     use
>   x68: acpi: trigger SMI before scanning for hotplugged CPUs
>
>  include/hw/acpi/cpu.h  |  1 +
>  include/hw/i386/ich9.h |  1 +
>  hw/acpi/cpu.c          |  6 ++++++
>  hw/acpi/ich9.c         | 12 +++++++++++-
>  hw/i386/acpi-build.c   | 33 ++++++++++++++++++++++++++++++++-
>  hw/i386/pc.c           | 15 ++++++++++++++-
>  hw/isa/lpc_ich9.c      | 10 ++++++++++
>  7 files changed, 75 insertions(+), 3 deletions(-)
>

I applied the series on top of QEMU commit 20c1df5476e1,
plus the unrelated build fix that I proposed in:

  Re: [PULL 14/31] target/i386: reimplement f2xm1 using floatx80
  operation

at:

  https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg04714.html
  (alternative link:
  <http://mid.mail-archive.com/a3302e58-c470-9305-b106-a2b6b2c52d39@redhat.com>)

I used the following command for hotplug:

$ virsh setvcpu ovmf.fedora.q35 3 --enable --live

Results:

  OVMF SMM_REQUIRE  edk2                machine type   result
  ----------------  ------------------  -------------  ----------------------------
  FALSE             9c6f3545aee0        pc-i440fx-5.0  hotplug accepted [1]
  FALSE             9c6f3545aee0        pc-i440fx-5.1  hotplug accepted [1]
  FALSE             9c6f3545aee0        pc-q35-5.0     hotplug accepted [1]
  FALSE             9c6f3545aee0        pc-q35-5.1     hotplug accepted [1]
  FALSE             9c6f3545aee0+patch  pc-i440fx-5.0  hotplug accepted [1]
  FALSE             9c6f3545aee0+patch  pc-i440fx-5.1  hotplug accepted [1]
  FALSE             9c6f3545aee0+patch  pc-q35-5.0     hotplug accepted [1]
  FALSE             9c6f3545aee0+patch  pc-q35-5.1     hotplug accepted [1]
  TRUE              9c6f3545aee0        pc-q35-5.0     hotplug rejected [2]
  TRUE              9c6f3545aee0        pc-q35-5.1     hotplug rejected [2]
  TRUE              9c6f3545aee0+BUG    pc-q35-5.1     negotiation rejected [3]
  TRUE              9c6f3545aee0+patch  pc-q35-5.0     hotplug rejected [2]
  TRUE              9c6f3545aee0+patch  pc-q35-5.1     hotplug accepted [4] [5] [6]

[1] In this case, i.e., when OVMF is built without -D SMM_REQUIRE, the
    firmware is not supposed to learn about CPU hotplug at OS runtime;
    only the OS should care. No SMI is raised in ACPI. So this is a
    regression-test for when SMM is not used at all.

[2] "error: internal error: unable to execute QEMU command 'device_add':
     cpu hotplug SMI was not enabled by firmware"

[3] In this test, I intentionally broke the firmware so that it would
    negotiate "CPU hotplug with SMI", but not "SMI broadcast". In this
    case, QEMU rejects the feature request, the firmware detects that,
    and hangs (intentionally, as this is a programming error in the
    firmware -- a failed assertion).

[4] Tested hotplug in Linux guest with "rdmsr -a 0x3a", "taskset -c X
    efibootmgr", and S3 suspend/resume, as described at
    https://github.com/tianocore/tianocore.github.io/wiki/Testing-SMM-with-QEMU,-KVM-and-libvirt#tests-to-perform-in-the-installed-guest-fedora-26-guest

[5] Also tested hotplug with Windows Server 2012 R2; checking the CPU
    count in the Task Manager | Logical Processors view, and in Device
    Manager | Processors. Tested S3 suspend/resume too.

[6] The S3 suspend/resume test is relevant in two forms -- first, after
    hot-adding CPUs, S3 suspend/resume continues to work.

    Second, during S3 resume, the "S3 boot script" re-selects the same
    SMI features originally negotiated during normal boot, so plugging a
    new CPU works after S3 resume too.

    Consider the following S3 boot script difference (executed during S3
    resume), taken between pc-q35-5.0 and pc-q35-5.1:

     ExecuteBootScript - 7BB67037
     EFI_BOOT_SCRIPT_MEM_WRITE_OPCODE
     BootScriptExecuteMemoryWrite - 0x7BB66018, 0x00000018, 0x00000000
     S3BootScriptWidthUint8 - 0x7BB66018 (0x00)
     S3BootScriptWidthUint8 - 0x7BB66019 (0x2B)
     S3BootScriptWidthUint8 - 0x7BB6601A (0x00)
     S3BootScriptWidthUint8 - 0x7BB6601B (0x18)
     S3BootScriptWidthUint8 - 0x7BB6601C (0x00)
     S3BootScriptWidthUint8 - 0x7BB6601D (0x00)
     S3BootScriptWidthUint8 - 0x7BB6601E (0x00)
     S3BootScriptWidthUint8 - 0x7BB6601F (0x08)
     S3BootScriptWidthUint8 - 0x7BB66020 (0x00)
     S3BootScriptWidthUint8 - 0x7BB66021 (0x00)
     S3BootScriptWidthUint8 - 0x7BB66022 (0x00)
     S3BootScriptWidthUint8 - 0x7BB66023 (0x00)
     S3BootScriptWidthUint8 - 0x7BB66024 (0x7B)
     S3BootScriptWidthUint8 - 0x7BB66025 (0xB6)
     S3BootScriptWidthUint8 - 0x7BB66026 (0x60)
     S3BootScriptWidthUint8 - 0x7BB66027 (0x28)
    -S3BootScriptWidthUint8 - 0x7BB66028 (0x01)
    +S3BootScriptWidthUint8 - 0x7BB66028 (0x03)
     S3BootScriptWidthUint8 - 0x7BB66029 (0x00)
     S3BootScriptWidthUint8 - 0x7BB6602A (0x00)
     S3BootScriptWidthUint8 - 0x7BB6602B (0x00)
     S3BootScriptWidthUint8 - 0x7BB6602C (0x00)
     S3BootScriptWidthUint8 - 0x7BB6602D (0x00)
     S3BootScriptWidthUint8 - 0x7BB6602E (0x00)
     S3BootScriptWidthUint8 - 0x7BB6602F (0x00)

    This is an fw_cfg DMA write preparation.

    - The first four bytes are "FWCfgDmaAccess.control" (which is big
      endian), performing a "select" (via bit#3 in value 0x18) for item
      0x2b, and initiating a "write" (bit#4 in value 0x18).

    - The next four bytes (BE) specify "FWCfgDmaAccess.length" = 8 --
      which is the size of the SMI guest features bitmask.

    - The next eight bytes (BE) are "FWCfgDmaAccess.address" =
      0x7BB66028.

    - The blob to transfer follows the FWCfgDmaAccess structure
      immediately, at 0x7BB66018 + 4 + 4 + 8 = 0x7BB66028. It consists
      of a little-endian uint64_t: the SMI guest  features bitmask. The
      diff shows that ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT (value 2) is
      re-selected during S3 resume, on "pc-q35-5.1".

    The firmware checks the result of the feature (re)selection during
    S3 resume too.

All of the test cases behaved as expected.

I understand this series is an RFC, and my own self requested updates,
but to capture the results thus far (for plug only):

Tested-by: Laszlo Ersek <lersek@redhat.com>

Thanks
Laszlo



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/3] x68: acpi: trigger SMI before scanning for hotplugged CPUs
  2020-07-14 15:19       ` Igor Mammedov
@ 2020-07-15 12:38         ` Laszlo Ersek
  2020-07-15 13:43           ` Igor Mammedov
  0 siblings, 1 reply; 19+ messages in thread
From: Laszlo Ersek @ 2020-07-15 12:38 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: boris.ostrovsky, Peter Krempa, liran.alon, qemu-devel

On 07/14/20 17:19, Igor Mammedov wrote:
> On Tue, 14 Jul 2020 14:41:28 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> On 07/14/20 14:28, Laszlo Ersek wrote:
>>> (CC'ing Peter Krempa due to virsh setvcpu (singular) / setvcpus (plural)
>>> references)
>>>
>>> On 07/10/20 18:17, Igor Mammedov wrote:  
>>>> In case firmware has negotiated CPU hotplug SMI feature, generate
>>>> AML to describe SMI IO port region and send SMI to firmware
>>>> on each CPU hotplug SCI.
>>>>
>>>> It might be not really usable, but should serve as a starting point to
>>>> discuss how better to deal with split hotplug sequence during hot-add
>>>> (
>>>> ex scenario where it will break is:
>>>>    hot-add  
>>>>       -> (QEMU) add CPU in hotplug regs
>>>>       -> (QEMU) SCI  
>>>>            -1-> (OS) scan
>>>>                -1-> (OS) SMI
>>>>                -1-> (FW) pull in CPU1 ***
>>>>                -1-> (OS) start iterating hotplug regs
>>>>    hot-add  
>>>>       -> (QEMU) add CPU in hotplug regs
>>>>       -> (QEMU) SCI  
>>>>             -2-> (OS) scan (blocked on mutex till previous scan is finished)
>>>>                -1-> (OS) 1st added CPU1 send device check event -> INIT/SIPI
>>>>                -1-> (OS) 1st added CPU2 send device check event -> INIT/SIPI
>>>>                        that's where it explodes, since FW didn't see CPU2
>>>>                        when SMI was called
>>>> )
>>>>
>>>> hot remove will throw in yet another set of problems, so lets discuss
>>>> both here and see if we can  really share hotplug registers block between
>>>> FW and AML or we should do something else with it.  
>>>
>>> This issue is generally triggered by management applications such as
>>> libvirt that issue device_add commands in quick succession. For libvirt,
>>> the command is "virsh setvcpus" (plural) with multiple CPUs specified
>>> for plugging. The singular "virsh setvcpu" command, which is more
>>> friendly towards guest NUMA, does not run into the symptom.
>>>
>>> The scope of the scan method lock is not large enough, with SMI in the
>>> picture.
>>>
>>> I suggest that we not uproot the existing AML code or the hotplug
>>> register block. Instead, I suggest that we add serialization at a higher
>>> level, with sufficient scope.
>>>
>>> QEMU:
>>>
>>> - introduce a new flag standing for "CPU plug operation in progress"
>>>
>>> - if ICH9_LPC_SMI_F_BROADCAST_BIT has been negotiated:
>>>
>>>   - "device_add" and "device_del" should enforce
>>>     ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT and
>>>     ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, respectively
>>>
>>>   - both device_add and device_del (for VCPUs) should set check the
>>>     "in progress" flag.
>>>
>>>     - If set, reject the request synchronously
>>>
>>>     - Otherwise, set the flag, and commence the operation
>>>
>>>   - in cpu_hotplug_wr(), where we emit the ACPI_DEVICE_OST event with
>>>     qapi_event_send_acpi_device_ost(), clear the "in-progress" flag.
>>>
>>> - If QEMU executes the QMP command processing and the cpu_hotplug_wr()
>>> function on different (host OS) threads, then perhaps we should use an
>>> atomic type for the flag. (Not sure about locking between QEMU threads,
>>> sorry.) I don't really expect race conditions, but in case we ever get
>>> stuck with the flag, we should make sure that the stuck state is "in
>>> progress", and not "not in progress". (The former state can prevent
>>> further plug operations, but cannot cause the guest to lose state.)  
>>
>> Furthermore, the "CPU plug operation in progress" flag should be:
>> - either migrated,
>> - or a migration blocker.
>>
>> Because on the destination host, device_add should be possible if and
>> only if the plug operation completed (either still on the source host,
>> or on the destination host).
> 
> I have a way more simple alternative idea, which doesn't involve libvirt.
> 
> We can change AML to
>   1. cache hotplugged CPUs from controller
>   2. send SMI
>   3. send Notify event to OS to online cached CPUs
> this way we never INIT/SIPI cpus that FW hasn't seen yet
> as for FW, it can relocate extra CPU that arrived after #1
> it won't cause any harm as on the next SCI AML will pick up those
> CPUs and SMI upcall will be just NOP.
> 
> I'll post a patch here on top of this series for you to try
> (without any of your comments addressed yet, as it's already written
> and I was testing it for a while to make sure it won't explode
> with various windows versions)

Sounds good, I'll be happy to test it.

Indeed "no event" is something that the fw deals with gracefully. (IIRC
I wanted to cover a "spurious SMI" explicitly.)

It didn't occur to me that you could dynamically size e.g. a package
object in AML. Based on my reading of the ACPI spec, "VarPackageOp" can
take a *runtime* "NumElements", so if you did two loops, the first loop
could count the pending stuff, and then a VarPackageOp could be used
with just the right NumElements... Anyway, I digress :)

> 
>> I guess that the "migration blocker" option is easier.
>>
>> Anyway I assume this is already handled with memory hotplug somehow
>> (i.e., migration attempt between device_add and ACPI_DEVICE_OST).
> 
> Thanks for comments,
> I'll need some time to ponder on other comments and do some
> palaeontology research to answer questions
> (aka. I need to make up excuses for the code I wrote :) )

haha, thanks :)
Laszlo



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/3] x68: acpi: trigger SMI before scanning for hotplugged CPUs
  2020-07-15 12:38         ` Laszlo Ersek
@ 2020-07-15 13:43           ` Igor Mammedov
  2020-07-16 12:36             ` Laszlo Ersek
  0 siblings, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2020-07-15 13:43 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: boris.ostrovsky, Peter Krempa, liran.alon, qemu-devel

On Wed, 15 Jul 2020 14:38:00 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> On 07/14/20 17:19, Igor Mammedov wrote:
> > On Tue, 14 Jul 2020 14:41:28 +0200
> > Laszlo Ersek <lersek@redhat.com> wrote:
> >   
> >> On 07/14/20 14:28, Laszlo Ersek wrote:  
> >>> (CC'ing Peter Krempa due to virsh setvcpu (singular) / setvcpus (plural)
> >>> references)
> >>>
> >>> On 07/10/20 18:17, Igor Mammedov wrote:    
> >>>> In case firmware has negotiated CPU hotplug SMI feature, generate
> >>>> AML to describe SMI IO port region and send SMI to firmware
> >>>> on each CPU hotplug SCI.
> >>>>
> >>>> It might be not really usable, but should serve as a starting point to
> >>>> discuss how better to deal with split hotplug sequence during hot-add
> >>>> (
> >>>> ex scenario where it will break is:
> >>>>    hot-add    
> >>>>       -> (QEMU) add CPU in hotplug regs
> >>>>       -> (QEMU) SCI    
> >>>>            -1-> (OS) scan
> >>>>                -1-> (OS) SMI
> >>>>                -1-> (FW) pull in CPU1 ***
> >>>>                -1-> (OS) start iterating hotplug regs
> >>>>    hot-add    
> >>>>       -> (QEMU) add CPU in hotplug regs
> >>>>       -> (QEMU) SCI    
> >>>>             -2-> (OS) scan (blocked on mutex till previous scan is finished)
> >>>>                -1-> (OS) 1st added CPU1 send device check event -> INIT/SIPI
> >>>>                -1-> (OS) 1st added CPU2 send device check event -> INIT/SIPI
> >>>>                        that's where it explodes, since FW didn't see CPU2
> >>>>                        when SMI was called
> >>>> )
> >>>>
> >>>> hot remove will throw in yet another set of problems, so lets discuss
> >>>> both here and see if we can  really share hotplug registers block between
> >>>> FW and AML or we should do something else with it.    
> >>>
> >>> This issue is generally triggered by management applications such as
> >>> libvirt that issue device_add commands in quick succession. For libvirt,
> >>> the command is "virsh setvcpus" (plural) with multiple CPUs specified
> >>> for plugging. The singular "virsh setvcpu" command, which is more
> >>> friendly towards guest NUMA, does not run into the symptom.
> >>>
> >>> The scope of the scan method lock is not large enough, with SMI in the
> >>> picture.
> >>>
> >>> I suggest that we not uproot the existing AML code or the hotplug
> >>> register block. Instead, I suggest that we add serialization at a higher
> >>> level, with sufficient scope.
> >>>
> >>> QEMU:
> >>>
> >>> - introduce a new flag standing for "CPU plug operation in progress"
> >>>
> >>> - if ICH9_LPC_SMI_F_BROADCAST_BIT has been negotiated:
> >>>
> >>>   - "device_add" and "device_del" should enforce
> >>>     ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT and
> >>>     ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, respectively
> >>>
> >>>   - both device_add and device_del (for VCPUs) should set check the
> >>>     "in progress" flag.
> >>>
> >>>     - If set, reject the request synchronously
> >>>
> >>>     - Otherwise, set the flag, and commence the operation
> >>>
> >>>   - in cpu_hotplug_wr(), where we emit the ACPI_DEVICE_OST event with
> >>>     qapi_event_send_acpi_device_ost(), clear the "in-progress" flag.
> >>>
> >>> - If QEMU executes the QMP command processing and the cpu_hotplug_wr()
> >>> function on different (host OS) threads, then perhaps we should use an
> >>> atomic type for the flag. (Not sure about locking between QEMU threads,
> >>> sorry.) I don't really expect race conditions, but in case we ever get
> >>> stuck with the flag, we should make sure that the stuck state is "in
> >>> progress", and not "not in progress". (The former state can prevent
> >>> further plug operations, but cannot cause the guest to lose state.)    
> >>
> >> Furthermore, the "CPU plug operation in progress" flag should be:
> >> - either migrated,
> >> - or a migration blocker.
> >>
> >> Because on the destination host, device_add should be possible if and
> >> only if the plug operation completed (either still on the source host,
> >> or on the destination host).  
> > 
> > I have a way more simple alternative idea, which doesn't involve libvirt.
> > 
> > We can change AML to
> >   1. cache hotplugged CPUs from controller
> >   2. send SMI
> >   3. send Notify event to OS to online cached CPUs
> > this way we never INIT/SIPI cpus that FW hasn't seen yet
> > as for FW, it can relocate extra CPU that arrived after #1
> > it won't cause any harm as on the next SCI AML will pick up those
> > CPUs and SMI upcall will be just NOP.
> > 
> > I'll post a patch here on top of this series for you to try
> > (without any of your comments addressed yet, as it's already written
> > and I was testing it for a while to make sure it won't explode
> > with various windows versions)  
> 
> Sounds good, I'll be happy to test it.
> 
> Indeed "no event" is something that the fw deals with gracefully. (IIRC
> I wanted to cover a "spurious SMI" explicitly.)
is it possible to distinguish "spurious SMI" vs hotplug SMI,
if yes then we probably do not care about any other SMIs except hotplug one.

> It didn't occur to me that you could dynamically size e.g. a package
> object in AML. Based on my reading of the ACPI spec, "VarPackageOp" can
> take a *runtime* "NumElements", so if you did two loops, the first loop
> could count the pending stuff, and then a VarPackageOp could be used
> with just the right NumElements... Anyway, I digress :)

well, it's mine field since Windows implement only a subset of spec
and VarPackageOp is not avalable on all version that support hotplug.
I think, I've narrowed language down to supported subset, so I need
to complete another round of testing to see if I didn't break anything
by accident.
 
> >   
> >> I guess that the "migration blocker" option is easier.
> >>
> >> Anyway I assume this is already handled with memory hotplug somehow
> >> (i.e., migration attempt between device_add and ACPI_DEVICE_OST).  
> > 
> > Thanks for comments,
> > I'll need some time to ponder on other comments and do some
> > palaeontology research to answer questions
> > (aka. I need to make up excuses for the code I wrote :) )  
> 
> haha, thanks :)
> Laszlo



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/3] x68: acpi: trigger SMI before scanning for hotplugged CPUs
  2020-07-15 13:43           ` Igor Mammedov
@ 2020-07-16 12:36             ` Laszlo Ersek
  0 siblings, 0 replies; 19+ messages in thread
From: Laszlo Ersek @ 2020-07-16 12:36 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: boris.ostrovsky, Peter Krempa, liran.alon, qemu-devel

On 07/15/20 15:43, Igor Mammedov wrote:
> On Wed, 15 Jul 2020 14:38:00 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> On 07/14/20 17:19, Igor Mammedov wrote:
>>> On Tue, 14 Jul 2020 14:41:28 +0200
>>> Laszlo Ersek <lersek@redhat.com> wrote:
>>>   
>>>> On 07/14/20 14:28, Laszlo Ersek wrote:  
>>>>> (CC'ing Peter Krempa due to virsh setvcpu (singular) / setvcpus (plural)
>>>>> references)
>>>>>
>>>>> On 07/10/20 18:17, Igor Mammedov wrote:    
>>>>>> In case firmware has negotiated CPU hotplug SMI feature, generate
>>>>>> AML to describe SMI IO port region and send SMI to firmware
>>>>>> on each CPU hotplug SCI.
>>>>>>
>>>>>> It might be not really usable, but should serve as a starting point to
>>>>>> discuss how better to deal with split hotplug sequence during hot-add
>>>>>> (
>>>>>> ex scenario where it will break is:
>>>>>>    hot-add    
>>>>>>       -> (QEMU) add CPU in hotplug regs
>>>>>>       -> (QEMU) SCI    
>>>>>>            -1-> (OS) scan
>>>>>>                -1-> (OS) SMI
>>>>>>                -1-> (FW) pull in CPU1 ***
>>>>>>                -1-> (OS) start iterating hotplug regs
>>>>>>    hot-add    
>>>>>>       -> (QEMU) add CPU in hotplug regs
>>>>>>       -> (QEMU) SCI    
>>>>>>             -2-> (OS) scan (blocked on mutex till previous scan is finished)
>>>>>>                -1-> (OS) 1st added CPU1 send device check event -> INIT/SIPI
>>>>>>                -1-> (OS) 1st added CPU2 send device check event -> INIT/SIPI
>>>>>>                        that's where it explodes, since FW didn't see CPU2
>>>>>>                        when SMI was called
>>>>>> )
>>>>>>
>>>>>> hot remove will throw in yet another set of problems, so lets discuss
>>>>>> both here and see if we can  really share hotplug registers block between
>>>>>> FW and AML or we should do something else with it.    
>>>>>
>>>>> This issue is generally triggered by management applications such as
>>>>> libvirt that issue device_add commands in quick succession. For libvirt,
>>>>> the command is "virsh setvcpus" (plural) with multiple CPUs specified
>>>>> for plugging. The singular "virsh setvcpu" command, which is more
>>>>> friendly towards guest NUMA, does not run into the symptom.
>>>>>
>>>>> The scope of the scan method lock is not large enough, with SMI in the
>>>>> picture.
>>>>>
>>>>> I suggest that we not uproot the existing AML code or the hotplug
>>>>> register block. Instead, I suggest that we add serialization at a higher
>>>>> level, with sufficient scope.
>>>>>
>>>>> QEMU:
>>>>>
>>>>> - introduce a new flag standing for "CPU plug operation in progress"
>>>>>
>>>>> - if ICH9_LPC_SMI_F_BROADCAST_BIT has been negotiated:
>>>>>
>>>>>   - "device_add" and "device_del" should enforce
>>>>>     ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT and
>>>>>     ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, respectively
>>>>>
>>>>>   - both device_add and device_del (for VCPUs) should set check the
>>>>>     "in progress" flag.
>>>>>
>>>>>     - If set, reject the request synchronously
>>>>>
>>>>>     - Otherwise, set the flag, and commence the operation
>>>>>
>>>>>   - in cpu_hotplug_wr(), where we emit the ACPI_DEVICE_OST event with
>>>>>     qapi_event_send_acpi_device_ost(), clear the "in-progress" flag.
>>>>>
>>>>> - If QEMU executes the QMP command processing and the cpu_hotplug_wr()
>>>>> function on different (host OS) threads, then perhaps we should use an
>>>>> atomic type for the flag. (Not sure about locking between QEMU threads,
>>>>> sorry.) I don't really expect race conditions, but in case we ever get
>>>>> stuck with the flag, we should make sure that the stuck state is "in
>>>>> progress", and not "not in progress". (The former state can prevent
>>>>> further plug operations, but cannot cause the guest to lose state.)    
>>>>
>>>> Furthermore, the "CPU plug operation in progress" flag should be:
>>>> - either migrated,
>>>> - or a migration blocker.
>>>>
>>>> Because on the destination host, device_add should be possible if and
>>>> only if the plug operation completed (either still on the source host,
>>>> or on the destination host).  
>>>
>>> I have a way more simple alternative idea, which doesn't involve libvirt.
>>>
>>> We can change AML to
>>>   1. cache hotplugged CPUs from controller
>>>   2. send SMI
>>>   3. send Notify event to OS to online cached CPUs
>>> this way we never INIT/SIPI cpus that FW hasn't seen yet
>>> as for FW, it can relocate extra CPU that arrived after #1
>>> it won't cause any harm as on the next SCI AML will pick up those
>>> CPUs and SMI upcall will be just NOP.
>>>
>>> I'll post a patch here on top of this series for you to try
>>> (without any of your comments addressed yet, as it's already written
>>> and I was testing it for a while to make sure it won't explode
>>> with various windows versions)  
>>
>> Sounds good, I'll be happy to test it.
>>
>> Indeed "no event" is something that the fw deals with gracefully. (IIRC
>> I wanted to cover a "spurious SMI" explicitly.)
> is it possible to distinguish "spurious SMI" vs hotplug SMI,
> if yes then we probably do not care about any other SMIs except hotplug one.

Sorry, I was unclear. By "spurious SMI", I meant that the hotplug SMI
(command value 4) is raised, but then the hotplug register block reports
no CPUs with pending events. The fw code handles that.

> 
>> It didn't occur to me that you could dynamically size e.g. a package
>> object in AML. Based on my reading of the ACPI spec, "VarPackageOp" can
>> take a *runtime* "NumElements", so if you did two loops, the first loop
>> could count the pending stuff, and then a VarPackageOp could be used
>> with just the right NumElements... Anyway, I digress :)
> 
> well, it's mine field since Windows implement only a subset of spec
> and VarPackageOp is not avalable on all version that support hotplug.

Ugh. :/

> I think, I've narrowed language down to supported subset, so I need
> to complete another round of testing to see if I didn't break anything
> by accident.

Thanks.
Laszlo

>  
>>>   
>>>> I guess that the "migration blocker" option is easier.
>>>>
>>>> Anyway I assume this is already handled with memory hotplug somehow
>>>> (i.e., migration attempt between device_add and ACPI_DEVICE_OST).  
>>>
>>> Thanks for comments,
>>> I'll need some time to ponder on other comments and do some
>>> palaeontology research to answer questions
>>> (aka. I need to make up excuses for the code I wrote :) )  
>>
>> haha, thanks :)
>> Laszlo
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/3] x86: cphp: prevent guest crash on CPU hotplug when broadcast SMI is in use
  2020-07-14 10:56   ` Laszlo Ersek
@ 2020-07-17 12:57     ` Igor Mammedov
  2020-07-20 17:29       ` Laszlo Ersek
  0 siblings, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2020-07-17 12:57 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: boris.ostrovsky, liran.alon, qemu-devel

On Tue, 14 Jul 2020 12:56:50 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> On 07/10/20 18:17, Igor Mammedov wrote:
[...]

> > @@ -1508,6 +1508,17 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
> >          return;
> >      }
> >
> > +    if (pcms->acpi_dev) {
> > +        Error *local_err = NULL;
> > +
> > +        hotplug_handler_pre_plug(HOTPLUG_HANDLER(pcms->acpi_dev), dev,
> > +                                 &local_err);
> > +        if (local_err) {
> > +            error_propagate(errp, local_err);
> > +            return;
> > +        }
> > +    }
> > +
> >      init_topo_info(&topo_info, x86ms);
> >
> >      env->nr_dies = x86ms->smp_dies;
> >  
> 
> (6) This looks sane to me, but I have a question for the *pre-patch*
> code.

(not so short introduction)

plug callbacks were designed for wiring up hotplugged device, hence
it has check for acpi_dev which is one of HW connections needed to make
it work. Later on coldplug was consolidated with it, so plug callbacks
are also handling coldplugged devices.

then later plug callback was split on pre_ and plug_, where pre_
part is called right before device_foo.realize() and plug_ right after.
Split was intended to make graceful failure easier, where pre_ is taking
care of checking already set properties and optionally setting additional
properties and can/should fail without side-effects, and plug_ part
should not fail (maybe there is still cleanup to do there) and used to
finish wiring after the device had been realized.


> 
> I notice that hotplug_handler_pre_plug() is already called from the
> (completely unrelated) function pc_memory_pre_plug().
> 
> In pc_memory_pre_plug(), we have the following snippet:
> 
>     /*
>      * When -no-acpi is used with Q35 machine type, no ACPI is built,
>      * but pcms->acpi_dev is still created. Check !acpi_enabled in
>      * addition to cover this case.
>      */
>     if (!pcms->acpi_dev || !x86_machine_is_acpi_enabled(X86_MACHINE(pcms))) {
>         error_setg(errp,
>                    "memory hotplug is not enabled: missing acpi device or acpi disabled");
>         return;
>     }
> 
> Whereas in pc_cpu_pre_plug(), the present patch only adds a
> "pcms->acpi_dev" nullity check.
> 
> Should pc_cpu_pre_plug() check for ACPI enablement similarly to
> pc_memory_pre_plug()?
extra check is not must have in pc_memory_pre_plug() as it should not break anything
(modulo MMIO memhp interface, which went unnoticed since probably nobody
uses MMIO memhp registers directly (i.e. QEMU's ACPI tables is sole user))
 
> I'm asking for two reasons:
> 
> (6a) for the feature at hand (CPU hotplug with SMI), maybe we should
> write:
> 
>     if (pcms->acpi_dev && x86_machine_is_acpi_enabled(X86_MACHINE(pcms))) {
pretty harmless in the same sense as in pc_memory_pre_plug(),
but I'd rather avoid checks that are not must have.


> (6b) or maybe more strictly, copy the check from memory hotplug (just
> update the error message):
> 
>     if (!pcms->acpi_dev || !x86_machine_is_acpi_enabled(X86_MACHINE(pcms))) {
>         error_setg(errp,
>                    "CPU hotplug is not enabled: missing acpi device or acpi disabled");
can't do it as it will break CPU clodplug when -no-cpi is used

>         return;
>     }
> 
> Because CPU hotplug depends on ACPI too, just like memory hotplug,
> regardless of firmware, and regardless of guest-SMM. Am I correct to
> think that?

We have pcms->acpi_dev check in x86 code because isapc/piix4 machines
will/might not create the pm device (which implements acpi hw). 

(1) if (pcmc->pci_enabled && x86_machine_is_acpi_enabled(X86_MACHINE(pcms)))

if that weren't the case, calls to acpi_dev would be unconditional

I'm adding check into pre_plug so we can gracefully reject device_add
in case firmware is not prepared for handling hotplug SMI. Since
the later might crash the guest. It's for the sake of better user
experience since QEMU might easily run older firmware.

If we don't care about that, we can drop negotiation and the check,
send SMI on hotplug when SMI broadcast is enabled, that might
crash guest since it might be not supported by used fw, but that
was already the case for quite a while and I've heard only few
complaints. (I guess most users have sense not to use something
not impl./supported)


> Basically, I'm asking if we should replicate original commit
> 8cd91acec8df ("pc: fail memory hot-plug/unplug with -no-acpi and Q35
> machine type", 2018-01-12) for CPU hotplug first (in a separate patch!),
> before dealing with "lpc->smi_negotiated_features" in this patch.

I'd rather leave hw part decoupled from acpi tables part,
nothing prevents users implementing their own fw/os which uses
our cpuhp interface directly without ACPI.

> Hmm... I'm getting confused. I *do* see similar checks in pc_cpu_plug()
> and pc_cpu_unplug_request_cb(). But:
> 
> - I don't understand what determines whether we put the ACPI check in
> *PRE* plug functions, or the plug functions,
I beleive [1] answers that

> - and I don't understand why pc_cpu_plug() and
> pc_cpu_unplug_request_cb() only check "pcms->acpi_dev", and not
> x86_machine_is_acpi_enabled().

x86_machine_is_acpi_enabled() is not must have in this case as
callbacks implement only hw part of cpuhp protocol (QEMU part),
what guest uses to handle it (fw tables(qemu generated),
or some custom code) is another story.


> (7) According to my request under patch#1, I propose that we should
> implement a similar rejection for CPU hot-unplug, in this series. (Can
> be a separate patch, of course.)
> 
> Thanks!
> Laszlo
> 
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/3] x68: acpi: trigger SMI before scanning for hotplugged CPUs
  2020-07-14 12:28   ` Laszlo Ersek
  2020-07-14 12:41     ` Laszlo Ersek
@ 2020-07-17 13:13     ` Igor Mammedov
  2020-07-20 19:12       ` Laszlo Ersek
  1 sibling, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2020-07-17 13:13 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: boris.ostrovsky, liran.alon, qemu-devel

On Tue, 14 Jul 2020 14:28:29 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> (CC'ing Peter Krempa due to virsh setvcpu (singular) / setvcpus (plural)
> references)
> 
> On 07/10/20 18:17, Igor Mammedov wrote:
[...]

> (3) Just a thought: I wonder if we should reserve both ports (0xB2 and
> 0xB3 too). For now we don't have any use for the "data" port, but
> arguably it's part of the same register block.

we probably should, might be used for unplug part.

BTW any ideas how we'd like to procceed with unplug?

current flow looks like:

QEMU                       OSPM
unplug_req()
1)   =>SCI     --->
  -------------------------
2)                   handle_sci()
                         scan for events and send
                             Notify per removed CPU
                             clear rm_evt
  -------------------------
3)                   offline cpu
  -------------------------
4)                    call _EJ0 to unplug CPU
                         write into ej_evt
              <-------------
  -------------------------
5)  unplug cb
 

We probably should modify _EJ0 to send SMI instead of direct access
to cpuhp block, the question is how OSPM would tell FW which CPU it
ejects.
                      

[...]



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/3] x86: cphp: prevent guest crash on CPU hotplug when broadcast SMI is in use
  2020-07-17 12:57     ` Igor Mammedov
@ 2020-07-20 17:29       ` Laszlo Ersek
  0 siblings, 0 replies; 19+ messages in thread
From: Laszlo Ersek @ 2020-07-20 17:29 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: boris.ostrovsky, liran.alon, qemu-devel

On 07/17/20 14:57, Igor Mammedov wrote:
> On Tue, 14 Jul 2020 12:56:50 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> On 07/10/20 18:17, Igor Mammedov wrote:
> [...]
> 
>>> @@ -1508,6 +1508,17 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>>>          return;
>>>      }
>>>
>>> +    if (pcms->acpi_dev) {
>>> +        Error *local_err = NULL;
>>> +
>>> +        hotplug_handler_pre_plug(HOTPLUG_HANDLER(pcms->acpi_dev), dev,
>>> +                                 &local_err);
>>> +        if (local_err) {
>>> +            error_propagate(errp, local_err);
>>> +            return;
>>> +        }
>>> +    }
>>> +
>>>      init_topo_info(&topo_info, x86ms);
>>>
>>>      env->nr_dies = x86ms->smp_dies;
>>>  
>>
>> (6) This looks sane to me, but I have a question for the *pre-patch*
>> code.
> 
> (not so short introduction)
> 
> plug callbacks were designed for wiring up hotplugged device, hence
> it has check for acpi_dev which is one of HW connections needed to make
> it work. Later on coldplug was consolidated with it, so plug callbacks
> are also handling coldplugged devices.
> 
> then later plug callback was split on pre_ and plug_, where pre_
> part is called right before device_foo.realize() and plug_ right after.
> Split was intended to make graceful failure easier, where pre_ is taking
> care of checking already set properties and optionally setting additional
> properties and can/should fail without side-effects, and plug_ part
> should not fail (maybe there is still cleanup to do there) and used to
> finish wiring after the device had been realized.
> 
> 
>>
>> I notice that hotplug_handler_pre_plug() is already called from the
>> (completely unrelated) function pc_memory_pre_plug().
>>
>> In pc_memory_pre_plug(), we have the following snippet:
>>
>>     /*
>>      * When -no-acpi is used with Q35 machine type, no ACPI is built,
>>      * but pcms->acpi_dev is still created. Check !acpi_enabled in
>>      * addition to cover this case.
>>      */
>>     if (!pcms->acpi_dev || !x86_machine_is_acpi_enabled(X86_MACHINE(pcms))) {
>>         error_setg(errp,
>>                    "memory hotplug is not enabled: missing acpi device or acpi disabled");
>>         return;
>>     }
>>
>> Whereas in pc_cpu_pre_plug(), the present patch only adds a
>> "pcms->acpi_dev" nullity check.
>>
>> Should pc_cpu_pre_plug() check for ACPI enablement similarly to
>> pc_memory_pre_plug()?
> extra check is not must have in pc_memory_pre_plug() as it should not break anything
> (modulo MMIO memhp interface, which went unnoticed since probably nobody
> uses MMIO memhp registers directly (i.e. QEMU's ACPI tables is sole user))
>  
>> I'm asking for two reasons:
>>
>> (6a) for the feature at hand (CPU hotplug with SMI), maybe we should
>> write:
>>
>>     if (pcms->acpi_dev && x86_machine_is_acpi_enabled(X86_MACHINE(pcms))) {
> pretty harmless in the same sense as in pc_memory_pre_plug(),
> but I'd rather avoid checks that are not must have.
> 
> 
>> (6b) or maybe more strictly, copy the check from memory hotplug (just
>> update the error message):
>>
>>     if (!pcms->acpi_dev || !x86_machine_is_acpi_enabled(X86_MACHINE(pcms))) {
>>         error_setg(errp,
>>                    "CPU hotplug is not enabled: missing acpi device or acpi disabled");
> can't do it as it will break CPU clodplug when -no-cpi is used
> 
>>         return;
>>     }
>>
>> Because CPU hotplug depends on ACPI too, just like memory hotplug,
>> regardless of firmware, and regardless of guest-SMM. Am I correct to
>> think that?
> 
> We have pcms->acpi_dev check in x86 code because isapc/piix4 machines
> will/might not create the pm device (which implements acpi hw). 
> 
> (1) if (pcmc->pci_enabled && x86_machine_is_acpi_enabled(X86_MACHINE(pcms)))
> 
> if that weren't the case, calls to acpi_dev would be unconditional
> 
> I'm adding check into pre_plug so we can gracefully reject device_add
> in case firmware is not prepared for handling hotplug SMI. Since
> the later might crash the guest. It's for the sake of better user
> experience since QEMU might easily run older firmware.
> 
> If we don't care about that, we can drop negotiation and the check,
> send SMI on hotplug when SMI broadcast is enabled, that might
> crash guest since it might be not supported by used fw, but that
> was already the case for quite a while and I've heard only few
> complaints. (I guess most users have sense not to use something
> not impl./supported)
> 
> 
>> Basically, I'm asking if we should replicate original commit
>> 8cd91acec8df ("pc: fail memory hot-plug/unplug with -no-acpi and Q35
>> machine type", 2018-01-12) for CPU hotplug first (in a separate patch!),
>> before dealing with "lpc->smi_negotiated_features" in this patch.
> 
> I'd rather leave hw part decoupled from acpi tables part,
> nothing prevents users implementing their own fw/os which uses
> our cpuhp interface directly without ACPI.
> 
>> Hmm... I'm getting confused. I *do* see similar checks in pc_cpu_plug()
>> and pc_cpu_unplug_request_cb(). But:
>>
>> - I don't understand what determines whether we put the ACPI check in
>> *PRE* plug functions, or the plug functions,
> I beleive [1] answers that
> 
>> - and I don't understand why pc_cpu_plug() and
>> pc_cpu_unplug_request_cb() only check "pcms->acpi_dev", and not
>> x86_machine_is_acpi_enabled().
> 
> x86_machine_is_acpi_enabled() is not must have in this case as
> callbacks implement only hw part of cpuhp protocol (QEMU part),
> what guest uses to handle it (fw tables(qemu generated),
> or some custom code) is another story.

Thank you for the explanation!
Laszlo



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/3] x68: acpi: trigger SMI before scanning for hotplugged CPUs
  2020-07-17 13:13     ` Igor Mammedov
@ 2020-07-20 19:12       ` Laszlo Ersek
  0 siblings, 0 replies; 19+ messages in thread
From: Laszlo Ersek @ 2020-07-20 19:12 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: boris.ostrovsky, liran.alon, qemu-devel

On 07/17/20 15:13, Igor Mammedov wrote:
> On Tue, 14 Jul 2020 14:28:29 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> (CC'ing Peter Krempa due to virsh setvcpu (singular) / setvcpus (plural)
>> references)
>>
>> On 07/10/20 18:17, Igor Mammedov wrote:
> [...]
> 
>> (3) Just a thought: I wonder if we should reserve both ports (0xB2 and
>> 0xB3 too). For now we don't have any use for the "data" port, but
>> arguably it's part of the same register block.
> 
> we probably should, might be used for unplug part.
> 
> BTW any ideas how we'd like to procceed with unplug?
> 
> current flow looks like:
> 
> QEMU                       OSPM
> unplug_req()
> 1)   =>SCI     --->
>   -------------------------
> 2)                   handle_sci()
>                          scan for events and send
>                              Notify per removed CPU
>                              clear rm_evt
>   -------------------------
> 3)                   offline cpu
>   -------------------------
> 4)                    call _EJ0 to unplug CPU
>                          write into ej_evt
>               <-------------
>   -------------------------
> 5)  unplug cb
>  
> 
> We probably should modify _EJ0 to send SMI instead of direct access
> to cpuhp block, the question is how OSPM would tell FW which CPU it
> ejects.

The optimal solution would be DataTableRegion, but of course Windows
doesn't support that. So the usual replacement is something like below
(we used a variant of it in "docs/specs/vmgenid.txt"):

- Create a new ACPI data table. See "Appendix O - UEFI ACPI Data Table",
table 94, in the UEFI 2.8 spec <https://uefi.org/specifications>. In
this table, set

  Identifier = B055E4C3-B5B5-4948-9D02-A46FCA3B0A3B

(I just generated this with "uuidgen"). Set DataOffset to 54 decimal.
Place 4 zeroes at offset 54, for Data.

- Create a SystemMemory Operation Region, 58 bytes in size (for covering
the entire table), with a single 4-byte Field at offset 54. Use the
add_pointer command for pointing the OperationRegion opcode to offset 0
(that is, the very beginning) of the above-created ACPI table.

(Normally I would suggest creating an Integer object, using add_pointer
on the Integer object, and then dynamically creating an OperationRegion
at that Integer address -- but I don't know if Windows can deal with
dynamically determined OperationRegion addresses.)

- In the CPU hot-unplug ACPI code, write the APIC ID (or the selector I
guess?) of the CPU being un-plugged to the field at offset 54, before
raising the SMI

- Raise the SMI with a new command (value 5, I guess)

- in OVMF, the CpuHotplugSmm driver can register an EndOfDxe callback,
and use EFI_ACPI_SDT_PROTOCOL.GetAcpiTable(), in a loop, for locating
the data table. Save a pointer (in SMRAM) to offset 54 of that table. At
runtime, when handling the SMI (command 5) read the APIC ID (or
selector) from the data field at offset 54.

(The AddTableToList() function in edk2's
"MdeModulePkg/Universal/Acpi/AcpiTableDxe/AcpiTableProtocol.c" file
makes sure that "UEFI ACPI Data Tables" are allocated in AcpiNVS memory,
which is permitted for use as an "SMM communication buffer".)

(The difference with VMGENID is that QEMU need not learn of the
guest-side address of the data table, hence no "write_pointer" command.
Instead, CpuHotplugSmm needs to learn of the address. That can be done
with in an EndOfDxe callback, using EFI_ACPI_SDT_PROTOCOL.GetAcpiTable()
in a loop, but that does require QEMU to produce a real ACPI data table,
not just a header-less blob placed in AcpiNVS memory.)

Thanks,
Laszlo



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-07-20 19:13 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-10 16:17 [RFC 0/3] x86: fix cpu hotplug with secure boot Igor Mammedov
2020-07-10 16:17 ` [RFC 1/3] x86: lpc9: let firmware negotiate CPU hotplug SMI feature Igor Mammedov
2020-07-14 10:19   ` Laszlo Ersek
2020-07-10 16:17 ` [RFC 2/3] x86: cphp: prevent guest crash on CPU hotplug when broadcast SMI is in use Igor Mammedov
2020-07-14 10:56   ` Laszlo Ersek
2020-07-17 12:57     ` Igor Mammedov
2020-07-20 17:29       ` Laszlo Ersek
2020-07-10 16:17 ` [RFC 3/3] x68: acpi: trigger SMI before scanning for hotplugged CPUs Igor Mammedov
2020-07-14 12:28   ` Laszlo Ersek
2020-07-14 12:41     ` Laszlo Ersek
2020-07-14 15:19       ` Igor Mammedov
2020-07-15 12:38         ` Laszlo Ersek
2020-07-15 13:43           ` Igor Mammedov
2020-07-16 12:36             ` Laszlo Ersek
2020-07-17 13:13     ` Igor Mammedov
2020-07-20 19:12       ` Laszlo Ersek
2020-07-14  9:58 ` [RFC 0/3] x86: fix cpu hotplug with secure boot Laszlo Ersek
2020-07-14 10:10   ` Laszlo Ersek
2020-07-14 18:26 ` Laszlo Ersek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.