On Mon, Sep 6, 2021 at 4:49 AM Niklas Schnelle <schnelle@linux.ibm.com> wrote:
> I believe we might be the first
> implementation of PCI device recovery in a virtualized setting requiring us to
> coordinate the device reset with the hypervisor platform by issuing a disable
> and re-enable to the platform as well as starting the recovery following
> a platform event.

I recall none of the details, but SR-IOV is a standardized system for sharing a PCI device across multiple virtual machines. The spec has detailed info on what the hypervisor must do, and what the local OS instance must do, to accomplish this. It's part of the PCI standard, and it's more than a decade old now, maybe two. Being part of the PCI standard, it was interoperable with error recovery, to the best of my recollection. At the time it was introduced, it got pushed very aggressively. The x86 hypervisor vendors were aiming at the heart of zSeries, and were militant about it.

-- Linas

--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.