From: David Gibson <david@gibson.dropbear.id.au>
To: Laine Stump <laine@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
michael.roth@amd.com,
Daniel Henrique Barboza <danielhb413@gmail.com>,
Julia Suvorova <jusual@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Markus Armbruster <armbru@redhat.com>,
"qemu-ppc@nongnu.org" <qemu-ppc@nongnu.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Igor Mammedov <imammedo@redhat.com>
Subject: Re: [RFC] adding a generic QAPI event for failed device hotunplug
Date: Mon, 22 Mar 2021 17:43:24 +1100 [thread overview]
Message-ID: <YFg8jCvFBcRUOrWS@yekko.fritz.box> (raw)
In-Reply-To: <70a596e0-102c-60ce-ccf7-6c961fa5eec3@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 7262 bytes --]
On Fri, Mar 12, 2021 at 08:38:47AM -0500, Laine Stump wrote:
> On 3/11/21 8:19 PM, David Gibson wrote:
> > On Thu, Mar 11, 2021 at 05:50:42PM -0300, Daniel Henrique Barboza wrote:
> > >
> > >
> > > On 3/9/21 3:22 AM, Markus Armbruster wrote:
> > > > Cc: Paolo and Julia in addition to Igor, because the thread is wandering
> > > > towards DeviceState member pending_deleted_event.
> > > >
> > > > Cc: Laine for libvirt expertise. Laine, if you're not the right person,
> > > > please loop in the right person.
> > > >
> > > > David Gibson <david@gibson.dropbear.id.au> writes:
> > > >
> > > > > On Mon, Mar 08, 2021 at 03:01:53PM -0300, Daniel Henrique Barboza wrote:
> > > > > >
> > > > > >
> > > > > > On 3/8/21 2:04 PM, Markus Armbruster wrote:
> > > > > > > Daniel Henrique Barboza <danielhb413@gmail.com> writes:
> > > > > > >
> > > > > > > > On 3/6/21 3:57 AM, Markus Armbruster wrote:
> > > > [...]
> > > > > > > > > We should document the event's reliability. Are there unplug operations
> > > > > > > > > where we can't detect failure? Are there unplug operations where we
> > > > > > > > > could, but haven't implemented the event?
> > > > > > > >
> > > > > > > > The current version of the PowerPC spec that the pseries machine implements
> > > > > > > > (LOPAR) does not predict a way for the virtual machine to report a hotunplug
> > > > > > > > error back to the hypervisor. If something fails, the hypervisor is left
> > > > > > > > in the dark.
> > > > > > > >
> > > > > > > > What happened in the 6.0.0 dev cycle is that we faced a reliable way of
> > > > > > >
> > > > > > > Do you mean "unreliable way"?
> > > > > >
> > > > > > I guess a better word would be 'reproducible', as in we discovered a reproducible
> > > > > > way of getting the guest kernel to refuse the CPU hotunplug.
> > > > >
> > > > > Right. It's worth noting here that in the PAPR model, there are no
> > > > > "forced" unplugs. Unplugs always consist of a request to the guest,
> > > > > which is then resposible for offlining the device and signalling back
> > > > > to the hypervisor that it's done with it.
> > > > >
> > > > > > > > making CPU hotunplug fail in the guest (trying to hotunplug the last online
> > > > > > > > CPU) and the pseries machine was unprepared for dealing with it. We ended up
> > > > > > > > implementing a hotunplug timeout and, if the timeout kicks in, we're assuming
> > > > > > > > that the CPU hotunplug failed in the guest. This is the first scenario we have
> > > > > > > > today where we want to send a QAPI event informing the CPU hotunplug error,
> > > > > > > > but this event does not exist in QEMU ATM.
> > > > > > >
> > > > > > > When the timeout kicks in, how can you know the operation failed? You
> > > > > > > better be sure when you send an error event. In other words: what
> > > > > > > prevents the scenario where the operation is much slower than you
> > > > > > > expect, the timeout expires, the error event is sent, and then the
> > > > > > > operation completes successfully?
> > > > > >
> > > > > > A CPU hotunplug in a pseries guest takes no more than a couple of seconds, even
> > > > > > if the guest is under heavy load. The timeout is set to 15 seconds.
> > > > >
> > > > > Right. We're well aware that a timeout is an ugly hack, since it's
> > > > > not really possible to distinguish it from a guest that's just being
> > > > > really slow.
> > > >
> > > > As long as unplug failure cannot be detected reliably, we need a timeout
> > > > *somewhere*. I vaguely recall libvirt has one. Laine?
> > >
> > > Yeah, Libvirt has a timeout for hotunplug operations. I agree that QEMU doing
> > > the timeout makes more sense since it has more information about the
> > > conditions/mechanics involved.
> >
> > Right. In particular, I can't really see how libvirt can fully
> > implement that timeout. AFAIK qemu has no way of listing or
> > cancelling "in flight" unplug requests, so it's entirely possible that
> > the unplug could still complete after libvirt's has "timed out".
>
> (I'm purposefully not trying to understand the full context of all of this,
> as I'm mostly unconnected right now, and only popped in at the mention of my
> name and CC. So forgive me if I'm missing some of the details of the
> conversation - I'm only here to give context about libvirt's timeout during
> unplug)
>
> I didn't remember there being any sort of timeout for unplugs in libvirt, so
> I went back into the code and found that there *is* a timeout (I'd just
> forgotten that I'd ever seen it), but (for PCI devices, which is the only
> hotplug/unplug code I have any real familiarity with) it is just a short (10
> seconds for PPC, 5 seconds for other platforms) to see if the unplug can
> complete before the public API returns; if there is a "timeout" then we
> still return success (after logging a warning message) but the device stays
> in the domain's device list, and nothing else is changed unless/until we
> receive a DEVICE_DELETED event from qemu (completely asynchronous -
> libvirt's API has long ago returned success) - only then do we remove the
> device from libvirt's domain status. libvirt won't/can't ever go back and
> retroactively fail the API that's already completed successfully though :-)
Huh. That seems... kinda bogus. It really seems to me you need to
provide either a synchronous interface: wait indefinitely for
completion - or an asynchronous one: always return quickly and user
has to wait for the completion event. Having this hybrid just seems
confusing.
> For VCPUs (which I guess is what you're exclusively talking about here) it
> looks like libvirt waits for the same 5-10 seconds (during the unplug API
> call) and if it hasn't received DEVICE_DELETED, then it returns an error:
>
> if ((rc = qemuDomainWaitForDeviceRemoval(vm)) <= 0) {
> if (rc == 0)
> virReportError(VIR_ERR_OPERATION_TIMEOUT, "%s",
> _("vcpu unplug request timed out. Unplug result "
> "must be manually inspected in the domain"));
>
> goto cleanup;
> }
>
> Here's what Peter says in the commit log when he replaced old-useless VCPU
> unplug code with new-working code in 2016:
>
> As the new code is using device_del all the implications of using it
> are present. Contrary to the device deletion code, the vcpu deletion
> code fails if the unplug request is not executed in time.
>
> I have no clue why in the case of PCI devices libvirt is logging a warning
> and returning success, while in the case of VCPUs it is logging an error and
> returning failure. I think there may have originally been a stated/unstated
> assumption that the libvirt unplug APIs were synchronous, and both
> situations were just the result of later trying to cope with the reality
> that the operation is actually asynchronous.
That seems pretty plausible.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
prev parent reply other threads:[~2021-03-22 7:58 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-05 18:16 [RFC] adding a generic QAPI event for failed device hotunplug Daniel Henrique Barboza
2021-03-06 6:57 ` Markus Armbruster
2021-03-08 14:22 ` Daniel Henrique Barboza
2021-03-08 17:04 ` Markus Armbruster
2021-03-08 18:01 ` Daniel Henrique Barboza
2021-03-09 3:22 ` David Gibson
2021-03-09 6:22 ` Markus Armbruster
2021-03-11 20:50 ` Daniel Henrique Barboza
2021-03-12 1:19 ` David Gibson
2021-03-12 8:12 ` Markus Armbruster
2021-03-19 7:55 ` Markus Armbruster
2021-03-22 6:39 ` David Gibson
2021-03-22 12:06 ` Paolo Bonzini
2021-03-23 3:33 ` David Gibson
2021-03-23 13:06 ` Igor Mammedov
2021-03-29 5:35 ` David Gibson
2021-03-29 9:38 ` Paolo Bonzini
2021-03-12 13:38 ` Laine Stump
2021-03-22 6:43 ` David Gibson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YFg8jCvFBcRUOrWS@yekko.fritz.box \
--to=david@gibson.dropbear.id.au \
--cc=armbru@redhat.com \
--cc=danielhb413@gmail.com \
--cc=imammedo@redhat.com \
--cc=jusual@redhat.com \
--cc=laine@redhat.com \
--cc=michael.roth@amd.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).