On Fri, Mar 12, 2021 at 08:38:47AM -0500, Laine Stump wrote:
> On 3/11/21 8:19 PM, David Gibson wrote:
> > On Thu, Mar 11, 2021 at 05:50:42PM -0300, Daniel Henrique Barboza wrote:
> > >
> > > On 3/9/21 3:22 AM, Markus Armbruster wrote:
> > > > Cc: Paolo and Julia in addition to Igor, because the thread is wandering
> > > > towards DeviceState member pending_deleted_event.
> > > >
> > > > Cc: Laine for libvirt expertise. Laine, if you're not the right person,
> > > > please loop in the right person.
> > > >
> > > > David Gibson writes:
> > > >
> > > > > On Mon, Mar 08, 2021 at 03:01:53PM -0300, Daniel Henrique Barboza wrote:
> > > > > >
> > > > > > On 3/8/21 2:04 PM, Markus Armbruster wrote:
> > > > > > > Daniel Henrique Barboza writes:
> > > > > > >
> > > > > > > > On 3/6/21 3:57 AM, Markus Armbruster wrote:
> > > > [...]
> > > > > > > > > We should document the event's reliability. Are there unplug operations
> > > > > > > > > where we can't detect failure? Are there unplug operations where we
> > > > > > > > > could, but haven't implemented the event?
> > > > > > > >
> > > > > > > > The current version of the PowerPC spec that the pseries machine implements
> > > > > > > > (LOPAR) does not predict a way for the virtual machine to report a hotunplug
> > > > > > > > error back to the hypervisor. If something fails, the hypervisor is left
> > > > > > > > in the dark.
> > > > > > > >
> > > > > > > > What happened in the 6.0.0 dev cycle is that we faced a reliable way of
> > > > > > >
> > > > > > > Do you mean "unreliable way"?
> > > > > >
> > > > > > I guess a better word would be 'reproducible', as in we discovered a reproducible
> > > > > > way of getting the guest kernel to refuse the CPU hotunplug.
> > > > >
> > > > > Right. It's worth noting here that in the PAPR model, there are no
> > > > > "forced" unplugs. Unplugs always consist of a request to the guest,
> > > > > which is then resposible for offlining the device and signalling back
> > > > > to the hypervisor that it's done with it.
> > > > >
> > > > > > > > making CPU hotunplug fail in the guest (trying to hotunplug the last online
> > > > > > > > CPU) and the pseries machine was unprepared for dealing with it. We ended up
> > > > > > > > implementing a hotunplug timeout and, if the timeout kicks in, we're assuming
> > > > > > > > that the CPU hotunplug failed in the guest. This is the first scenario we have
> > > > > > > > today where we want to send a QAPI event informing the CPU hotunplug error,
> > > > > > > > but this event does not exist in QEMU ATM.
> > > > > > >
> > > > > > > When the timeout kicks in, how can you know the operation failed? You
> > > > > > > better be sure when you send an error event. In other words: what
> > > > > > > prevents the scenario where the operation is much slower than you
> > > > > > > expect, the timeout expires, the error event is sent, and then the
> > > > > > > operation completes successfully?
> > > > > >
> > > > > > A CPU hotunplug in a pseries guest takes no more than a couple of seconds, even
> > > > > > if the guest is under heavy load. The timeout is set to 15 seconds.
> > > > >
> > > > > Right. We're well aware that a timeout is an ugly hack, since it's
> > > > > not really possible to distinguish it from a guest that's just being
> > > > > really slow.
> > > >
> > > > As long as unplug failure cannot be detected reliably, we need a timeout
> > > > *somewhere*. I vaguely recall libvirt has one. Laine?
> > >
> > > Yeah, Libvirt has a timeout for hotunplug operations. I agree that QEMU doing
> > > the timeout makes more sense since it has more information about the
> > > conditions/mechanics involved.
> >
> > Right. In particular, I can't really see how libvirt can fully
> > implement that timeout. AFAIK qemu has no way of listing or
> > cancelling "in flight" unplug requests, so it's entirely possible that
> > the unplug could still complete after libvirt's has "timed out".
>
> (I'm purposefully not trying to understand the full context of all of this,
> as I'm mostly unconnected right now, and only popped in at the mention of my
> name and CC. So forgive me if I'm missing some of the details of the
> conversation - I'm only here to give context about libvirt's timeout during
> unplug)
>
> I didn't remember there being any sort of timeout for unplugs in libvirt, so
> I went back into the code and found that there *is* a timeout (I'd just
> forgotten that I'd ever seen it), but (for PCI devices, which is the only
> hotplug/unplug code I have any real familiarity with) it is just a short (10
> seconds for PPC, 5 seconds for other platforms) to see if the unplug can
> complete before the public API returns; if there is a "timeout" then we
> still return success (after logging a warning message) but the device stays
> in the domain's device list, and nothing else is changed unless/until we
> receive a DEVICE_DELETED event from qemu (completely asynchronous -
> libvirt's API has long ago returned success) - only then do we remove the
> device from libvirt's domain status. libvirt won't/can't ever go back and
> retroactively fail the API that's already completed successfully though :-)

Huh. That seems... kinda bogus. It really seems to me you need to
provide either a synchronous interface: wait indefinitely for
completion - or an asynchronous one: always return quickly and user
has to wait for the completion event. Having this hybrid just seems
confusing.

> For VCPUs (which I guess is what you're exclusively talking about here) it
> looks like libvirt waits for the same 5-10 seconds (during the unplug API
> call) and if it hasn't received DEVICE_DELETED, then it returns an error:
>
>     if ((rc = qemuDomainWaitForDeviceRemoval(vm)) <= 0) {
>         if (rc == 0)
>             virReportError(VIR_ERR_OPERATION_TIMEOUT, "%s",
>                            _("vcpu unplug request timed out. Unplug result "
>                              "must be manually inspected in the domain"));
>
>         goto cleanup;
>     }
>
> Here's what Peter says in the commit log when he replaced old-useless VCPU
> unplug code with new-working code in 2016:
>
>     As the new code is using device_del all the implications of using it
>     are present. Contrary to the device deletion code, the vcpu deletion
>     code fails if the unplug request is not executed in time.
>
> I have no clue why in the case of PCI devices libvirt is logging a warning
> and returning success, while in the case of VCPUs it is logging an error and
> returning failure. I think there may have originally been a stated/unstated
> assumption that the libvirt unplug APIs were synchronous, and both
> situations were just the result of later trying to cope with the reality
> that the operation is actually asynchronous.

That seems pretty plausible.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
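
For context, the asynchronous flow discussed above -- device_del acknowledged
immediately by QEMU, with the actual removal signalled later (or never) by a
DEVICE_DELETED event -- looks roughly like the following from a QMP client's
point of view. This is a minimal illustrative sketch, not code from QEMU or
libvirt; the socket path and device id are hypothetical placeholders, and the
15-second timeout simply mirrors the pseries figure mentioned in the thread.

    #!/usr/bin/env python3
    # Minimal sketch (not taken from QEMU or libvirt) of the asynchronous
    # unplug flow discussed above: issue device_del over QMP, then wait --
    # with a timeout -- for a DEVICE_DELETED event that may never arrive.
    # SOCKET_PATH and DEVICE_ID are hypothetical placeholders; they assume
    # QEMU was started with e.g. -qmp unix:/tmp/qmp.sock,server=on,wait=off
    # and that a device with id "dev0" was hotplugged earlier.
    import json
    import socket

    SOCKET_PATH = "/tmp/qmp.sock"   # hypothetical QMP socket path
    DEVICE_ID = "dev0"              # hypothetical hotpluggable device id
    TIMEOUT = 15.0                  # seconds, mirroring the timeout above

    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(SOCKET_PATH)
    stream = sock.makefile("rw", encoding="utf-8")

    def send(cmd):
        stream.write(json.dumps(cmd) + "\n")
        stream.flush()

    stream.readline()                      # QMP greeting banner
    send({"execute": "qmp_capabilities"})  # enter command mode
    stream.readline()                      # {"return": {}}

    # device_del's "return" only acknowledges the request; the device is
    # really gone only once DEVICE_DELETED shows up (or never, if the guest
    # refuses or stalls -- exactly the case being discussed).
    send({"execute": "device_del", "arguments": {"id": DEVICE_ID}})

    sock.settimeout(TIMEOUT)
    try:
        while True:
            msg = json.loads(stream.readline())
            if "error" in msg:             # device_del itself was rejected
                print("device_del failed:", msg["error"])
                break
            if msg.get("event") == "DEVICE_DELETED":
                print("unplug completed:", msg["data"])
                break
    except socket.timeout:
        # All we know here is "no DEVICE_DELETED yet"; like libvirt, we
        # cannot tell a refused unplug from one that is merely slow.
        print("no DEVICE_DELETED within", TIMEOUT, "seconds")

The point of the sketch is that when the timeout fires the client only knows
"no event yet"; it cannot distinguish an unplug the guest refused from one
that is merely slow, which is the gap the proposed QAPI error event is meant
to close.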