From: Niklas Schnelle <schnelle@linux.ibm.com>
To: linasvepstas@gmail.com
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	"Oliver O'Halloran" <oohall@gmail.com>,
	Russell Currey <ruscur@russell.cc>,
	linuxppc-dev@lists.ozlabs.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linux-s390@vger.kernel.org,
	Matthew Rosato <mjrosato@linux.ibm.com>,
	Pierre Morel <pmorel@linux.ibm.com>
Subject: Re: [PATCH 0/5] s390/pci: automatic error recovery
Date: Tue, 07 Sep 2021 09:49:09 +0200
Message-ID: <bddf2d1867585427680cb093cb10d5d15d7aa8d3.camel@linux.ibm.com>
In-Reply-To: <CAHrUA34TK6U4TB34FHejott9TdFvSgAedOpmro-Uj2ZwnvzecQ@mail.gmail.com>

On Mon, 2021-09-06 at 21:05 -0500, Linas Vepstas wrote:
> On Mon, Sep 6, 2021 at 4:49 AM Niklas Schnelle <schnelle@linux.ibm.com>
> wrote:
> 
> > I believe we might be the first implementation of PCI device recovery
> > in a virtualized setting requiring us to coordinate the device reset
> > with the hypervisor platform by issuing a disable and re-enable to the
> > platform as well as starting the recovery following a platform event.
> > 
> 
> I recall none of the details, but SRIOV is a standardized system for
> sharing a PCI device across multiple virtual machines. It has detailed info
> on what the hypervisor must do, and what the local OS instance must do to
> accomplish this.  

Yes, and in fact on s390 we make heavy use of SR-IOV.

> It's part of the PCI standard, and it's more than a decade
> old now, maybe two. Being a part of the PCI standard, it was interoperable
> with error recovery, to the best of my recollection.

Maybe I worded things with a bit too much sensationalism and it might
even be that POWER supports error recovery also with virtualization,
though I'm not sure how far that goes.

I believe you are right in that SR-IOV supports error recovery;
after all, this patch set also has to work together with SR-IOV-enabled
devices. At least on s390, though, until this patch set the error
recovery performed by the hypervisor stopped in the hypervisor.

The missing part added by this patch set is coordinating with device
drivers in Linux to determine where use of a recovered device can pick
up after the PCIe-level error recovery is done.
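
To illustrate what "coordinating with device drivers" can look like, here
is a minimal user-space sketch of a recovery sequence in which the platform
asks the driver, step by step, whether and how the device can be used
again. It is not code from this patch set and not the kernel API: the names
ers_result, err_handlers, recover_device and the demo_* driver are all
hypothetical, loosely modeled on the callback style of the kernel's
struct pci_error_handlers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the driver's answers. */
enum ers_result {
	ERS_CAN_RECOVER,	/* driver wants to try without a reset */
	ERS_NEED_RESET,		/* driver asks for a device reset */
	ERS_DISCONNECT,		/* driver gives up on the device */
	ERS_RECOVERED,		/* driver considers the device usable */
};

/* Hypothetical callback table a driver would register. */
struct err_handlers {
	enum ers_result (*error_detected)(void *drvdata);
	enum ers_result (*mmio_enabled)(void *drvdata);
	enum ers_result (*slot_reset)(void *drvdata);
	void (*resume)(void *drvdata);
};

/*
 * Platform side of the handshake (assumed flow, heavily simplified).
 * Returns 1 if the driver reports the device usable again, 0 otherwise.
 */
int recover_device(const struct err_handlers *h, void *drvdata)
{
	enum ers_result res;

	assert(h != NULL);
	if (!h->error_detected)
		return 0;	/* driver cannot take part in recovery */

	res = h->error_detected(drvdata);
	if (res == ERS_CAN_RECOVER && h->mmio_enabled)
		res = h->mmio_enabled(drvdata);
	if (res == ERS_NEED_RESET && h->slot_reset)
		res = h->slot_reset(drvdata);	/* a real platform resets the device first */
	if (res != ERS_RECOVERED)
		return 0;

	if (h->resume)
		h->resume(drvdata);	/* driver may restart normal I/O */
	return 1;
}

/* Hypothetical demo driver: always asks for a reset, then recovers. */
struct demo_state {
	int resets;
	int resumed;
};

static enum ers_result demo_detected(void *d)
{
	(void)d;
	return ERS_NEED_RESET;
}

static enum ers_result demo_slot_reset(void *d)
{
	((struct demo_state *)d)->resets++;
	return ERS_RECOVERED;
}

static void demo_resume(void *d)
{
	((struct demo_state *)d)->resumed = 1;
}
```

A driver without an error_detected callback is treated here as unable to
recover, which is an assumption of this sketch rather than a statement
about any particular platform.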

As for virtualization, this coordination of course needs to cross the
hypervisor/guest boundary, and at least for KVM+QEMU I know for a fact
that reporting a PCI error to the guest is currently just a stub that
actually completely stops the guest, so you definitely don't get smooth
error recovery there yet.

> At the time it was
> introduced, it got pushed very aggressively.  The x86 hypervisor vendors
> were aiming at the heart of zseries, and were militant about it.

And yet we're still here, use SR-IOV ourselves, and even support Linux +
KVM as a hypervisor that you can use just the same on a mainframe as on
an x86, POWER, or ARM system.

> 
> -- Linas
> 

