From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:34223)
    by lists.gnu.org with esmtp (Exim 4.71)
    (envelope-from )
    id 1aL7Jz-0008TT-Uo
    for qemu-devel@nongnu.org; Mon, 18 Jan 2016 05:46:05 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from )
    id 1aL7Jw-0007KU-NC
    for qemu-devel@nongnu.org; Mon, 18 Jan 2016 05:46:03 -0500
Received: from mail-wm0-x244.google.com ([2a00:1450:400c:c09::244]:35798)
    by eggs.gnu.org with esmtp (Exim 4.71)
    (envelope-from )
    id 1aL7Jw-0007KG-Cn
    for qemu-devel@nongnu.org; Mon, 18 Jan 2016 05:46:00 -0500
Received: by mail-wm0-x244.google.com with SMTP id 123so8514627wmz.2
    for ; Mon, 18 Jan 2016 02:46:00 -0800 (PST)
References: <19b185dd9e150c23d452a4af1db37288e919c9ba.1452564770.git.chen.fan.fnst@cn.fujitsu.com>
From: Marcel Apfelbaum
Message-ID: <569CC265.3050508@gmail.com>
Date: Mon, 18 Jan 2016 12:45:57 +0200
MIME-Version: 1.0
In-Reply-To: <19b185dd9e150c23d452a4af1db37288e919c9ba.1452564770.git.chen.fan.fnst@cn.fujitsu.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v16 13/14] vfio-pci: pass the aer error to guest
Reply-To: marcel@redhat.com
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Cao jin , qemu-devel@nongnu.org
Cc: chen.fan.fnst@cn.fujitsu.com, izumi.taku@jp.fujitsu.com, alex.williamson@redhat.com, mst@redhat.com

On 01/12/2016 04:43 AM, Cao jin wrote:
> From: Chen Fan
>
> when the vfio device encounters an uncorrectable error in host,
> the vfio_pci driver will signal the eventfd registered by this
> vfio device, the results in the qemu eventfd handler getting

Maybe "the results in" -> resulting in

> invoked.
>
> this patch is to pass the error to guest and have the guest driver
> recover from the error.

Maybe "Pass the error to... and let the ... "

>
> Signed-off-by: Chen Fan
> ---
>  hw/vfio/pci.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 47 insertions(+), 6 deletions(-)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index da4815e..efa5e01 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2553,18 +2553,59 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
>  static void vfio_err_notifier_handler(void *opaque)
>  {
>      VFIOPCIDevice *vdev = opaque;
> +    PCIDevice *dev = &vdev->pdev;
> +    PCIEAERMsg msg = {
> +        .severity = 0,
> +        .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
> +    };
>
>      if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
>          return;
>      }
>
>      /*
> -     * TBD. Retrieve the error details and decide what action
> -     * needs to be taken. One of the actions could be to pass
> -     * the error to the guest and have the guest driver recover
> -     * from the error. This requires that PCIe capabilities be
> -     * exposed to the guest. For now, we just terminate the
> -     * guest to contain the error.
> +     * in case the real hardware configration has been changed,

configration -> configuration

> +     * here we should recheck the bus reset capability.
> +     */
> +    if ((vdev->features & VFIO_FEATURE_ENABLE_AER) &&
> +        vfio_check_host_bus_reset(vdev)) {
> +        goto stop;
> +    }
> +    /*
> +     * we should read the error details from the real hardware
> +     * configuration spaces, here we only need to do is signaling
> +     * to guest an uncorrectable error has occurred.
> +     */
> +    if ((vdev->features & VFIO_FEATURE_ENABLE_AER) &&
> +        dev->exp.aer_cap) {

Why do we need dev->exp.aer_cap check here? In patch 7/14 we fail the
device init process if this happens, right?

> +        uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
> +        uint32_t uncor_status;
> +        bool isfatal;
> +
> +        uncor_status = vfio_pci_read_config(dev,
> +                           dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
> +
> +        /*
> +         * if we receive the error signal but not this device, we can

maybe "if the error is not emitted by this device..."

Thanks,
Marcel

> +         * just ignore it.
> +         */
> +        if (!(uncor_status & ~0UL)) {
> +            return;
> +        }
> +
> +        isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
> +
> +        msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
> +                                 PCI_ERR_ROOT_CMD_NONFATAL_EN;
> +
> +        pcie_aer_msg(dev, &msg);
> +        return;
> +    }
> +
> +stop:
> +    /*
> +     * If the aer capability is not exposed to the guest. we just
> +     * terminate the guest to contain the error.
>       */
>
>      error_report("%s(%04x:%02x:%02x.%x) Unrecoverable error detected. "
>
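
For reference, the decision logic in the quoted hunk can be isolated from QEMU and exercised on its own. The following is only a minimal sketch under stated assumptions: aer_source_id(), aer_classify() and the aer_class enum are hypothetical names made up for illustration and are not part of QEMU or of the patch. It assumes the usual PCIe conventions that the requester ID carries the bus number in bits 15:8 and devfn in bits 7:0, and that a set bit in the Uncorrectable Error Severity register marks the matching status bit as fatal. Note that "uncor_status & ~0UL" in the hunk reduces to a plain non-zero test, which is what the sketch uses.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; these names do not exist
 * in QEMU. They mirror the arithmetic in the quoted hunk.
 */

/* Requester ID: bus number in bits 15:8, devfn in bits 7:0. */
static uint16_t aer_source_id(uint8_t bus_num, uint8_t devfn)
{
    return (uint16_t)((bus_num << 8) | devfn);
}

typedef enum {
    AER_NONE,     /* no uncorrectable status bit set: not this device */
    AER_NONFATAL, /* error bit set, severity bit clear */
    AER_FATAL,    /* error bit set, matching severity bit set */
} aer_class;

/*
 * Classify an error from the Uncorrectable Error Status register and
 * the Uncorrectable Error Severity register values.
 */
static aer_class aer_classify(uint32_t uncor_status, uint32_t uncor_sever)
{
    if (uncor_status == 0) {
        /* Same condition as !(uncor_status & ~0UL) in the hunk. */
        return AER_NONE;
    }
    /* Any reported error whose severity bit is set makes it fatal. */
    return (uncor_status & uncor_sever) ? AER_FATAL : AER_NONFATAL;
}
```

With these, aer_source_id(0x01, 0x08) yields 0x0108, and a status word whose only set bit is also set in the severity word classifies as AER_FATAL. In the patch itself the two outcomes map to PCI_ERR_ROOT_CMD_FATAL_EN and PCI_ERR_ROOT_CMD_NONFATAL_EN in msg.severity.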