All of lore.kernel.org
 help / color / mirror / Atom feed
From: Igor Mammedov <imammedo@redhat.com>
To: Laszlo Ersek <lersek@redhat.com>
Cc: qemu-devel@nongnu.org, ankur.a.arora@oracle.com
Subject: Re: [RFC] ich9:cpuhp: add support for cpu hot-unplug with SMI broadcast enabled
Date: Fri, 27 Nov 2020 16:07:42 +0100	[thread overview]
Message-ID: <20201127160742.4c81cb3d@redhat.com> (raw)
In-Reply-To: <cae2bede-1177-df14-07a6-dc3be75b7edd@redhat.com>

On Fri, 27 Nov 2020 15:48:34 +0100
Laszlo Ersek <lersek@redhat.com> wrote:

> On 11/26/20 21:38, Igor Mammedov wrote:
> > On Thu, 26 Nov 2020 12:17:27 +0100
> > Laszlo Ersek <lersek@redhat.com> wrote:
> >   
> >> On 11/24/20 13:25, Igor Mammedov wrote:  
> 
> >>> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
> >>> index 9bb22d1270..f68ef6e06c 100644
> >>> --- a/docs/specs/acpi_cpu_hotplug.txt
> >>> +++ b/docs/specs/acpi_cpu_hotplug.txt
> >>> @@ -57,7 +57,11 @@ read access:
> >>>                It's valid only when bit 0 is set.
> >>>             2: Device remove event, used to distinguish device for which
> >>>                no device eject request to OSPM was issued.
> >>> -           3-7: reserved and should be ignored by OSPM
> >>> +           3: reserved and should be ignored by OSPM
> >>> +           4: if set to 1, OSPM requests firmware to perform device eject,
> >>> +              firmware shall clear this event by writing 1 into it before    
> >>
> >> (1) s/clear this event/clear this event bit/
> >>  
> >>> +              performing device eject.    
> >>
> >> (2) move the second and third lines ("firmware shall clear....") over to
> >> the write documentation, below? In particular:
> >>  
> >>> +           5-7: reserved and should be ignored by OSPM
> >>>      [0x5-0x7] reserved
> >>>      [0x8] Command data: (DWORD access)
> >>>            contains 0 unless value last stored in 'Command field' is one of:
> >>> @@ -82,7 +86,10 @@ write access:
> >>>                 selected CPU device
> >>>              3: if set to 1 initiates device eject, set by OSPM when it
> >>>                 triggers CPU device removal and calls _EJ0 method
> >>> -            4-7: reserved, OSPM must clear them before writing to register
> >>> +            4: if set to 1 OSPM hands over device eject to firmware,
> >>> +               Firmware shall issue device eject request as described above
> >>> +               (bit #3) and OSPM should not touch device eject bit (#3),    
> >>
> >> (3) it would be clearer if we documented the exact bit writing order
> >> here:
> >> - clear bit#4, *then* set bit#3 (two write accesses)
> >> - versus clear bit#4 *and* set bit#3 (single access)  
> > 
> > I was thinking that FW should not bother with clearing bit #4,
> > and QEMU should clear it when handling write to bit #3.
> > (it looks like I forgot to actually do that)  
> 
> That should work fine too, as long as it's clearly documented.
> 
> 
> >>> @@ -332,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
> >>>  #define CPU_INSERT_EVENT  "CINS"
> >>>  #define CPU_REMOVE_EVENT  "CRMV"
> >>>  #define CPU_EJECT_EVENT   "CEJ0"
> >>> +#define CPU_FW_EJECT_EVENT "CEJF"
> >>>
> >>>  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>                      hwaddr io_base,
> >>> @@ -384,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>          aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
> >>>          /* initiates device eject, write only */
> >>>          aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
> >>> -        aml_append(field, aml_reserved_field(4));
> >>> +        aml_append(field, aml_reserved_field(1));
> >>> +        /* tell firmware to do device eject, write only */
> >>> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
> >>> +        aml_append(field, aml_reserved_field(2));
> >>>          aml_append(field, aml_named_field(CPU_COMMAND, 8));
> >>>          aml_append(cpu_ctrl_dev, field);
> >>>
> >>> @@ -419,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>          Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
> >>>          Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
> >>>          Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
> >>> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
> >>>
> >>>          aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
> >>>          aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
> >>> @@ -461,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>
> >>>              aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
> >>>              aml_append(method, aml_store(idx, cpu_selector));
> >>> -            aml_append(method, aml_store(one, ej_evt));
> >>> +            if (opts.fw_unplugs_cpu) {
> >>> +                aml_append(method, aml_store(one, fw_ej_evt));
> >>> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
> >>> +                           aml_name("%s", opts.smi_path)));
> >>> +            } else {
> >>> +                aml_append(method, aml_store(one, ej_evt));
> >>> +            }
> >>>              aml_append(method, aml_release(ctrl_lock));
> >>>          }
> >>>          aml_append(cpus_dev, method);    
> >>
> >> Hmmm, OK, let me parse this.
> >>
> >> Assume there is a big bunch of device_del QMP commands, QEMU marks the
> >> "remove" event pending on the corresponding set of CPUs, plus also makes
> >> the ACPI interrupt pending. The ACPI interrupt handler in the OS runs,
> >> and calls CSCN. CSCN runs a loop, and for each CPU where the remove
> >> event is pending, notifies the OS one by one. The OS in turn forgets
> >> about the subject CPU, and calls the _EJ0 method on the affected CPU
> >> ACPI object. The _EJ0 method on the CPU ACPI object calls CEJ0, passing
> >> in the affected CPU's identifier.
> >>
> >> The above hunk modifies the CEJ0 method.
> >>
> >> (5) Question: pre-patch, both the CSCN method and the CEJ0 method
> >> acquire the CPLK lock, but CEJ0 is actually called within CSCN
> >> (indirectly, with the OS's cooperation). Is CPLK a recursive lock?  
> > Theoretically scep supports recursive mutexes but I don't think it's the case here.
> > 
> > Considering it works currently, I think OS implements Notify event as async.
> > hence no clash wrt mutex. If EJ0 were handled within CSCN context,
> > EJ0 would mess cpu_selector value that CSCN is also using.  
> 
> Ah indeed. Yes, making Notify pending at first, and then delivering it
> inside the kernel only after the current AML call stack returns -- that
> seems to make sense. Otherwise we could get unbounded recursion (the
> notify handler calls another AML method, which could contain another
> notify ...)
> 
> 
> >> Anyway, let's see the CEJ0 modification. After the OS is done forgetting
> >> about the CPU, the CEJ0 method no longer unplugs the CPU, instead it
> >> sets the new bit#4 in the register block, and raises an SMI.
> >>
> >> (6) So that's one SMI per CPU being removed. Is that OK?  
> > 
> > I guess it has performance penalty but there is nothing we can do about it,
> > OSPM does EJ0 calls asynchronously.  
> 
> OK. Hot-unplug is not a frequent operation.
> 
> 
> >    
> >> (7) What if there are asynchronous plugs going on, and the firmware
> >> notices them in the register block? ... Hm, I hope that should be OK,
> >> because ultimately the CSCN method will learn about those too, and
> >> inform the OS. On plug, the firmware doesn't modify the register block.  
> > shouldn't be issue (modulo bugs, I haven't tried to hot add and hot remove
> > the same CPU at the same time)
> > 
> > i.e. 
> > (QEMU) pause
> > (QEMU) device_add
> > (QEMU) device_del
> > (QEMU) cont
> >   
> >> Ah! OK. I think I understand why bit#4 is important. The firmware may
> >> notice pending remove events, but it must not act upon them -- it must
> >> simply ignore them -- unless bit#4 is also set. Bit#2 set with bit#4
> >> clear means the event is pending (QEMU got a device_del), but the OS has
> >> not forgotten about the CPU yet -- so the firmware must not unplug it
> >> yet. When the modified CEJ0 method runs, it sets bit#4 in addition to
> >> the already set bit#2, advertising that the OS has *already* abandoned
> >> the CPU.  
> > firmware should ignore bit #2, it doesn't mean anything to it, OSPM might
> > ignore or nonsupport CPU removal. What firmware must care about is bit #4,
> > which tells it that OSPM is done with CPU and asks for to be removed by firmware.  
> 
> Makes sense, especially in combination with the idea that clearing the
> fw_remove bit should clear is_removing too.
> 
> The firmware logic needs to be aware of is_removing though, at least
> understand the existence of this bit, as the "get pending" command will
> report such CPUs too that only have is_removing set. Shouldn't be a
> problem, we just have to recognize it.

firmware shouldn't see bit #2 normally, it's cleared in AML during CSCN,
right after remove Notify is sent to OSPM. I don't see a reason for
firmware to use it, I'd just mask it out on firmware side if it messes logic.

potentially if we have concurrent plug/unplug for several CPUs, firmware
might see bit #2 which it should ignore, it's for OSPM consumption only.


> 
> [...]
> 
> 
> Thanks!
> Laszlo



  reply	other threads:[~2020-11-27 15:18 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-24 12:25 [RFC] ich9:cpuhp: add support for cpu hot-unplug with SMI broadcast enabled Igor Mammedov
2020-11-24 22:58 ` Laszlo Ersek
2020-11-26 10:24 ` Ankur Arora
2020-11-26 12:46   ` Laszlo Ersek
2020-11-26 19:50     ` Igor Mammedov
2020-11-27  3:39       ` Ankur Arora
2020-11-27  3:35     ` Ankur Arora
2020-11-27 11:33       ` Igor Mammedov
2020-11-27 15:02         ` Laszlo Ersek
2020-11-27 23:48           ` Ankur Arora
2020-11-30 16:58             ` Laszlo Ersek
2020-11-30 19:45               ` Ankur Arora
2020-11-26 20:45   ` Igor Mammedov
2020-11-26 11:17 ` Laszlo Ersek
2020-11-26 20:38   ` Igor Mammedov
2020-11-27  4:10     ` Ankur Arora
2020-11-27 11:47       ` Igor Mammedov
2020-11-27 23:49         ` Ankur Arora
2020-11-27 15:19       ` Laszlo Ersek
2020-11-28  0:43         ` Ankur Arora
2020-11-30 17:00           ` Laszlo Ersek
2020-11-27 14:48     ` Laszlo Ersek
2020-11-27 15:07       ` Igor Mammedov [this message]
2020-11-27 16:52         ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201127160742.4c81cb3d@redhat.com \
    --to=imammedo@redhat.com \
    --cc=ankur.a.arora@oracle.com \
    --cc=lersek@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.