From: Igor Mammedov <imammedo@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: qemu-devel@nongnu.org, kraxel@redhat.com
Subject: Re: [PATCH 2/4] pcie: update slot power status only if power control is enabled
Date: Fri, 25 Feb 2022 09:18:30 +0100
Message-ID: <20220225091830.2f684997@redhat.com>
In-Reply-To: <20220224125928-mutt-send-email-mst@kernel.org>

On Thu, 24 Feb 2022 13:05:07 -0500
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Feb 24, 2022 at 12:44:09PM -0500, Igor Mammedov wrote:
> > On creation, a PCIDevice has power turned on at the end of
> > pci_qdev_realize(); however, later on, if the PCIe slot isn't populated
> > with any children, its power is turned off. That's fine if native
> > hotplug is used, as the plug callback will power the slot on, among
> > other things. However, when ACPI hotplug is enabled, it replaces the
> > native PCIe plug callbacks with ACPI-specific ones
> > (acpi_pcihp_device_*plug_cb), and as a result the slot stays powered
> > off. That works fine, as ACPI hotplug on the guest side takes care of
> > enumerating/initializing the hotplugged device. But when the guest is
> > later migrated, the call chain introduced by [1]
> > 
> >    pcie_cap_slot_post_load()  
> >        -> pcie_cap_update_power()
> >            -> pcie_set_power_device()
> >                -> pci_set_power()
> >                    -> pci_update_mappings()  
> > 
> > will disable the previously initialized BARs of the hotplugged device
> > in the powered-off slot, due to commit [2], which disables BARs if
> > power is off. As a result, the guest OS will be very much confused
> > after migration [3], still thinking that it has a working device,
> > which is no longer true because its BARs are disabled.
> > 
> > Fix it by honoring PCI_EXP_SLTCAP_PCP and updating the power status
> > only if the capability is enabled. A follow-up patch will disable
> > PCI_EXP_SLTCAP_PCP, overriding the COMPAT_PROP_PCP property, when the
> > PCIe slot is under ACPI PCI hotplug control.
> > 
> > See [3] for reproducer.
> > 
> > 1)
> > Fixes: commit d5daff7d312 (pcie: implement slot power control for pcie root ports)
> > 2)
> >        commit 23786d13441 (pci: implement power state)
> > 3)
> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2053584
> >   
> 
> 
> Correct format for the last paragraph:
> 
> 
> Fixes: d5daff7d312 ("pcie: implement slot power control for pcie root ports")
> Fixes: 23786d13441 ("pci: implement power state")
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2053584

OK, I'll fix it up on respin like this, to keep the references:

1)
Fixes: d5daff7d312 ("pcie: implement slot power control for pcie root ports")
2)
Fixes: 23786d13441 ("pci: implement power state")
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2053584

> 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  hw/pci/pcie.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > index d7d73a31e4..2339729a7c 100644
> > --- a/hw/pci/pcie.c
> > +++ b/hw/pci/pcie.c
> > @@ -383,10 +383,9 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev)
> >  
> >      if (sltcap & PCI_EXP_SLTCAP_PCP) {
> >          power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON;
> > +        pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> > +                            pcie_set_power_device, &power);
> >      }
> > -
> > -    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> > -                        pcie_set_power_device, &power);  
> 
> I think this is correct. However, I wonder whether for 6.2 compatibility
> as a hack we should sometimes skip the power update even when
> PCI_EXP_SLTCAP_PCP exists. Will that not work around the issue for
> these machine types?

pc-q35-6.2 is utterly broken.
With pc-q35-6.1 it's a mess. Here is a ping-pong migration matrix for it:

      | v6.1 | v6.2   | Fix
------+------+--------+------------
v6.1  | ok   | broken | ok (#1)
v6.2  |      | broken | broken (#2)

(#1) PCI_EXP_SLTCAP_PCP is present due to x-pcihp-enable-pcie-pcp-cap=on,
     i.e. pci_config is exactly the same as in QEMU v6.1.
(#2) PCI_EXP_SLTCAP_PCP is enabled and the empty slot is powered off
     (and that state is migrated).

There are some invariants that might work in one direction, but they
won't survive ping-pong migration. More importantly, upstream mostly
cares about old -> new migration working, and that is the direction
that is broken in v6.2.

> And assuming we want bug for bug compat anyway, why not just put
> it here? It seems easier to reason about frankly ...

It should be possible to hack the PCI core to fix up the broken power
state on incoming migration (at post-load time), but that would just
create more confusion, with migration working in some cases and not in
others (depending on the QEMU versions used).

Let's just declare QEMU v6.2 broken, with an upgrade/downgrade (to 7.0/6.1)
as the suggested solution.

PS:
I'd very much prefer to avoid adding hacks to the PCI core for the sake
of ACPI pcihp, and to let the PCI code behave as it's supposed to per spec.
It's already bad enough with pcihp layered on top of PCI; making the PCI
code depend on pcihp will just make it more fragile.
 
> >  }
> >  
> >  /*
> > -- 
> > 2.31.1  
> 
