From: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: David Gibson <dgibson@redhat.com>,
	Julia Suvorova <jusual@redhat.com>,
	qemu devel list <qemu-devel@nongnu.org>
Subject: Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Date: Thu, 22 Oct 2020 17:50:51 +0300
Message-ID: <CAC_L=vX0+H-SfQHneVPd-Mc3wFxHBSbkKHt3SpNOBOY_JsYDUA@mail.gmail.com>
In-Reply-To: <20201022102857-mutt-send-email-mst@kernel.org>

On Thu, Oct 22, 2020 at 5:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Thu, Oct 22, 2020 at 05:10:43PM +0300, Marcel Apfelbaum wrote:
> >
> >
> > On Thu, Oct 22, 2020 at 5:01 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> >     On Thu, Oct 22, 2020 at 04:55:10PM +0300, Marcel Apfelbaum wrote:
> >     > Hi David, Michael,
> >     >
> >     > On Thu, Oct 22, 2020 at 3:56 PM David Gibson <dgibson@redhat.com> wrote:
> >     >
> >     >     On Thu, 22 Oct 2020 08:06:55 -0400
> >     >     "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >     >
> >     >     > On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> >     >     > > From: Marcel Apfelbaum <marcel@redhat.com>
> >     >     > >
> >     >     > > During PCIe Root Port's transition from Power-Off to Power-ON (or vice-versa)
> >     >     > > the "Slot Control Register" has the "Power Indicator Control"
> >     >     > > set to "Blinking" expressing a "power transition" mode.
> >     >     > >
> >     >     > > Any hotplug operation during the "power transition" mode is not permitted
> >     >     > > or at least not expected by the Guest OS leading to strange failures.
> >     >     > >
> >     >     > > Detect and refuse hotplug operations in such case.
> >     >     > >
> >     >     > > Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> >     >     > > ---
> >     >     > >  hw/pci/pcie.c | 7 +++++++
> >     >     > >  1 file changed, 7 insertions(+)
> >     >     > >
> >     >     > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> >     >     > > index 5b48bae0f6..2fe5c1473f 100644
> >     >     > > --- a/hw/pci/pcie.c
> >     >     > > +++ b/hw/pci/pcie.c
> >     >     > > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> >     >     > >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
> >     >     > >      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
> >     >     > >      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> >     >     > > +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
> >     >     > >
> >     >     > >      /* Check if hot-plug is disabled on the slot */
> >     >     > >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> >     >     > > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> >     >     > >          return;
> >     >     > >      }
> >     >     > >
> >     >     > > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
> >     >     > > +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
> >     >     > > +                   DEVICE(hotplug_pdev)->id);
> >     >     > > +        return;
> >     >     > > +    }
> >     >     > > +
> >     >     > >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
> >     >     > >  }
> >     >     >
> >     >     > Probably the only way to handle for existing machine types.
> >     >
> >     >
> >     > I agree
> >     >
> >     >
> >     >     > For new ones, can't we queue it in host memory somewhere?
> >     >
> >     >
> >     >
> >     > I am not sure I understand what will be the flow.
> >     >   - The user asks for a hotplug operation.
> >     >   -  QEMU deferred operation.
> >     > After that the operation may still fail, how would the user know if the
> >     > operation succeeded or not?
> >
> >
> >     How can it fail? It's just a button press ...
> >
> >
> >
> > Currently we have "Hotplug unsupported."
> > With this change we have "Guest/System not ready"
>
>
> Hotplug unsupported is not an error that can trigger with
> a well behaved management such as libvirt.
>
>
> >
> >
> >     >
> >     >
> >     >     I'm not actually convinced we can't do that even for existing machine
> >     >     types.
> >     >
> >     >
> >     > Is a Guest visible change, I don't think we can do it.
> >     >
> >     >
> >     >     So I'm a bit hesitant to suggest going ahead with this without
> >     >     looking a bit closer at whether we can implement a wait-for-ready in
> >     >     qemu, rather than forcing every user of qemu (human or machine) to do
> >     >     so.
> >     >
> >     >
> >     > While I agree it is a pain from the usability point of view, hotplug operations
> >     > are allowed to fail. This is not more than a corner case, ensuring the right
> >     > response (gracefully erroring out) may be enough.
> >     >
> >     > Thanks,
> >     > Marcel
> >     >
> >
> >
> >     I don't think they ever failed in the past so management is unlikely
> >     to handle the failure by retrying ...
> >
> >
> > That would require some management handling, yes.
> > But even without a "retry", failing is better than strange OS behavior.
> >
> > Trying a better alternative like deferring the operation for new machines
> > would make sense, however is out of the scope of this patch
>
> Expand the scope please. The scope should be "solve a problem xx" not
> "solve a problem xx by doing abc".
>
>
The scope is detecting a hotplug error early instead of
passing to the Guest OS a hotplug operation that we know will fail.
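
To make the early detection concrete, here is a minimal standalone sketch of
the check, outside of QEMU. The register offset and field values are copied
from what I believe Linux's pci_regs.h defines; get_le16() and
slot_in_power_transition() are hypothetical names used only for illustration
(in the patch the read is done with pci_get_word()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values believed to match Linux's pci_regs.h; repeated here so the
 * sketch builds on its own. */
#define PCI_EXP_SLTCTL               0x18    /* Slot Control register offset */
#define PCI_EXP_SLTCTL_PIC           0x0300  /* Power Indicator Control field */
#define PCI_EXP_SLTCTL_PWR_IND_ON    0x0100  /* indicator on */
#define PCI_EXP_SLTCTL_PWR_IND_BLINK 0x0200  /* indicator blinking */
#define PCI_EXP_SLTCTL_PWR_IND_OFF   0x0300  /* indicator off */

/* Hypothetical stand-in for pci_get_word(): little-endian 16-bit read. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: returns true when the Power Indicator Control
 * field of Slot Control reads "Blinking", which the patch treats as a
 * power transition during which hotplug should be refused.
 * exp_cap points at the start of the PCI Express capability. */
static bool slot_in_power_transition(const uint8_t *exp_cap)
{
    assert(exp_cap != NULL);
    uint16_t sltctl = get_le16(exp_cap + PCI_EXP_SLTCTL);
    return (sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK;
}
```

The field is masked before the comparison because the other Slot Control bits
(interrupt enables, attention indicator, and so on) are unrelated to the power
indicator state.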



> > that simply
> > detects the error leaving us in a slightly better state than today.
> >
> > Thanks,
> > Marcel
>
> Not applying a patch is the only tool we maintainers have to influence
> people to solve the problem fully. That's why I'm not inclined to apply
> "slightly better" patches generally.
>
>
The patch is a proposal following some offline discussions on this matter.
I personally see the value of it versus what we have today.

Thanks,
Marcel


>
> >
> >
> >     >
> >     >
> >     >
> >     >     --
> >     >     David Gibson <dgibson@redhat.com>
> >     >     Principal Software Engineer, Virtualization, Red Hat
> >     >
> >
> >
>
>

