From: Igor Mammedov <imammedo@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	michael.roth@amd.com,
	Daniel Henrique Barboza <danielhb413@gmail.com>,
	Julia Suvorova <jusual@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Markus Armbruster <armbru@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	"qemu-ppc@nongnu.org" <qemu-ppc@nongnu.org>,
	Laine Stump <laine@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	jfreimann@redhat.com
Subject: Re: [RFC] adding a generic QAPI event for failed device hotunplug
Date: Tue, 23 Mar 2021 14:06:36 +0100
Message-ID: <20210323140636.1a89eaab@redhat.com>
In-Reply-To: <YFlhiNorrttIslFf@yekko.fritz.box>

On Tue, 23 Mar 2021 14:33:28 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Mar 22, 2021 at 01:06:53PM +0100, Paolo Bonzini wrote:
> > On 22/03/21 07:39, David Gibson wrote:  
> > > > QEMU doesn't really keep track of "in flight" unplug requests, and as
> > > > long as that's the case, its timeout even will have the same issue.  
> > > Not generically, maybe.  In the PAPR code we effectively do, by means
> > > of the 'unplug_requested' boolean in the DRC structure.  Maybe that's
> > > a mistake, but at the time I couldn't see how else to handle things.  
> > 
> > No, that's good.  x86 also tracks it in some registers that are accessible
> > from the ACPI firmware.  See "PCI slot removal notification" in
> > docs/specs/acpi_pci_hotplug.txt.
> >   
> > > Currently we will resolve all "in flight" requests at machine reset
> > > time, effectively completing those requests.  Does that differ from
> > > x86 behaviour?  
> > 
> > IIRC on x86 the requests are instead cancelled, but I'm not 100%
> > sure.  
> 
> Ah... we'd better check that and try to make ppc consistent with
> whatever it does.
> 

Sorry for being late to the discussion. I can't say much about all the possible ways to
unplug a PCI device (other than that it's a complicated mess), but hopefully I can shed
some light on the state/behavior of the ACPI based methods.

* x86 - ACPI based PCI hotplug
 It exists solely because Windows did not support SHPC (conventional PCI),
 and it looks like, 'thanks' to buggy Windows drivers, we will have to use it for
 PCI-E as well (Julia is working on it).
 The HW registers described in docs/specs/acpi_pci_hotplug.txt are our own invention;
 they help to raise the standard ACPI 'device check' and 'eject request' events when
 the guest executes AML bytecode. In principle the guest could report
 plug/unplug progress via the ACPI _OST method (including failure/completion), but given
 my experience with how the Windows PCI core has worked so far, it may not use it
 (honestly I haven't tried to explore that possibility, due to lack of interest in it).
 
 Regarding unplug: on device_del QEMU raises an SCI interrupt; after this the process is
 asynchronous. When the ACPI interpreter gets the SCI, it sends a respective _EJ0 event to
 the devices mentioned in the PCI_DOWN_BASE register. After getting the event, the guest OS
 may decide to eject the PCI device (which clears the device's bit in PCI_DOWN_BASE)
 or refuse to do it. QEMU does not track progress in the failure case, and the device's
 bit in PCI_DOWN_BASE stays set. On the next device_(add|del) (for any PCI device)
 the guest will see it again and will retry the removal.
 Also, if the guest reboots with any bits set in PCI_DOWN_BASE, the respective devices will
 be deleted on the QEMU side.
 There is no other way to cancel a removal request in PCI_DOWN_BASE, aside from explicitly
 ejecting the device on guest request or implicitly on reboot.
 IMHO:
     The sticky nature of PCI_(UP|DOWN)_BASE is more trouble than help, but it's been there
     since SeaBIOS times, so it's ABI we are stuck with. If I were re-implementing it now,
     I would use a one shot event that's cleared once the guest has read it and, if possible,
     implement _OST status reporting (if it works on Windows).
 As it stands now, once device_del is issued, the user can't know when the PCI device will
 be removed. No timeout will help with it.
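
 To make the sticky behavior described above concrete, here is a toy model in Python.
 It is only a sketch of the semantics as I described them, not QEMU code: the class and
 method names are made up for illustration, and the SCI/AML path is reduced to a
 callback that says whether the guest agrees to eject a slot.

```python
class StickyPciDown:
    """Toy model of the sticky PCI_DOWN_BASE semantics (hypothetical names)."""

    def __init__(self):
        self.down = 0         # models PCI_DOWN_BASE, one bit per slot
        self.present = set()  # slots that currently hold a device

    def plug(self, slot):
        self.present.add(slot)

    def device_del(self, slot):
        # Unplug request: the bit is set and stays set until an eject happens.
        self.down |= 1 << slot

    def pending(self):
        # Slots the guest sees as "eject requested" each time it handles an SCI.
        return [s for s in sorted(self.present) if self.down & (1 << s)]

    def eject(self, slot):
        # The only way the bit gets cleared: the device is actually ejected.
        self.down &= ~(1 << slot)
        self.present.discard(slot)

    def sci(self, guest_accepts):
        # Guest handles an SCI: it re-sees every pending slot and may eject it.
        for slot in self.pending():
            if guest_accepts(slot):
                self.eject(slot)

    def reset(self):
        # Reboot: devices whose bit is still set are removed on the QEMU side.
        for slot in self.pending():
            self.eject(slot)


hp = StickyPciDown()
hp.plug(3)
hp.plug(5)
hp.device_del(3)
hp.sci(lambda slot: False)      # guest refuses; the request for slot 3 stays
hp.device_del(5)                # unrelated request raises another SCI
hp.sci(lambda slot: slot == 5)  # guest sees slot 3 again, ejects only slot 5
hp.reset()                      # slot 3 goes away implicitly on reboot
```

 Note that nothing in this model lets the management side tell a refused request from
 one the guest has not processed yet, which is the point made above.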
 
* ACPI CPU/Memory hotplug
 Events triggered by device_del are one shot; the guest may then report progress to QEMU
 using the _OST method (qapi_event_send_acpi_device_ost) (I know that libvirt was aware of
 it, but I don't recall what it does with it). So QEMU might send an '_UNPLUG_ERROR' event
 to the user if the guest decides so. But instead of duplicating all possible events from
 the spec, QEMU passes the _OST arguments [1] as is, for the user to interpret as described
 by the standard.
 Though I'd say _OST is not 100% reliable: depending on the Windows or Linux kernel version
 used, the guest might skip reporting some events. I don't recall the exact state at the
 time I was testing it, so I'd call status reporting support 'best effort'.
 Also, it doesn't feature the pending removal on reboot that our ACPI PCI hotplug code has.
 So with a well-behaved guest the user will get notified about failure or device removal
 (when the guest is able to run its code); for broken guests I'm more inclined to say
 'use a fixed guest' to get sane behavior.
 The policy for the user is to retry on failure (there are no bad side effects on retry).
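
 As an illustration of what "interpret as described by the standard" could look like on
 the management side, here is a rough Python sketch that classifies an ACPI_DEVICE_OST
 QMP event. The event layout (data.info with device/slot/slot-type/source/status) is what
 I recall from the QAPI schema, and the source/status codes are the ejection codes as I
 read them in the _OST section of the ACPI spec; please check both before relying on
 this. classify_ost() and the verdict strings are made up for the example.

```python
import json

# Assumption: _OST source events that relate to ejection (eject request, eject).
EJECT_SOURCES = {0x3, 0x103}

# Assumption: ejection status codes from the ACPI spec's _OST tables.
EJECT_STATUS = {
    0x00: "success",
    0x01: "non-specific failure",
    0x02: "unrecognized notify code",
    0x80: "ejection not supported by OSPM",
    0x81: "device in use by application",
    0x82: "device busy",
    0x83: "ejection dependency busy or not supported",
    0x84: "ejection in progress (pending)",
}


def classify_ost(event):
    """Return (device, verdict) for an eject-related ACPI_DEVICE_OST event.

    Returns None for any other event. 'device' may be None because the
    device id is optional in the event.
    """
    if event.get("event") != "ACPI_DEVICE_OST":
        return None
    info = event["data"]["info"]
    if info["source"] not in EJECT_SOURCES:
        return None
    status = info["status"]
    if status == 0x00:
        verdict = "success"
    elif status == 0x84:
        verdict = "pending"
    else:
        verdict = "failed: " + EJECT_STATUS.get(status, hex(status))
    return info.get("device"), verdict


sample = json.loads(
    '{"event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm1",'
    ' "slot": "0", "slot-type": "DIMM", "source": 3, "status": 130}}}'
)
print(classify_ost(sample))
```

 A caller following the retry policy above would issue device_del again on a "failed"
 verdict and keep waiting on "pending"; since reporting is best effort, a missing event
 should not be read as either success or failure.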

I think that any kind of timeout here is inherently racy. In the async hot[un]plug use case,
all the user has to do to hit it is sufficiently over-commit the host (or run it nested).
So it's just a question of how long it will take for a user to come back with a bug report.

* As far as I'm aware, the mentioned 'pending_deleted_event' is there to make the transparent
  failover magic happen (CCing Jens; Michael might also know how it works).

* SHPC & PCI-E have their own set of unplug quirks, which I know little about, but Julia
  worked with Michael on fixing PCI-E bugs (mostly related to how Windows drivers handle
  unplug; some are not possible to fix, hence the decision to add ACPI based hotplug to Q35
  as a workaround). So they might know the specifics.

[1] ACPI spec: _OST (OSPM Status Indication)


