From: Thomas Tai
To: bhelgaas@google.com, keith.busch@intel.com, poza@codeaurora.org, thomas.tai@oracle.com
Cc: linux-pci@vger.kernel.org
Subject: [PATCH 1/1] PCI/AER: prevent pcie_do_fatal_recovery from using device after it is removed
Date: Mon, 13 Aug 2018 10:51:28 -0600
Message-Id: <1534179088-44219-2-git-send-email-thomas.tai@oracle.com>
In-Reply-To: <1534179088-44219-1-git-send-email-thomas.tai@oracle.com>
References: <1534179088-44219-1-git-send-email-thomas.tai@oracle.com>

To prevent pcie_do_fatal_recovery() from using the device after it has
been removed, store the device's domain:bus:devfn on entry to
pcie_do_fatal_recovery(). After the bus is rescanned, use the stored
domain:bus:devfn to look the device up again and report the result
with pci_info().

The original issue only happens on a non-bridge device, so a local
variable records whether the device is a bridge instead of checking
the device's header type again later in the function.

Signed-off-by: Thomas Tai
---
 drivers/pci/pcie/err.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index f02e334..3414445 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -287,15 +287,20 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
 	struct pci_bus *parent;
 	struct pci_dev *pdev, *temp;
 	pci_ers_result_t result;
+	bool is_bridge_device = false;
+	u16 domain = pci_domain_nr(dev->bus);
+	u8 bus = dev->bus->number;
+	u8 devfn = dev->devfn;
 
-	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+		is_bridge_device = true;
 		udev = dev;
-	else
+	} else {
 		udev = dev->bus->self;
+	}
 
 	parent = udev->subordinate;
 	pci_lock_rescan_remove();
-	pci_dev_get(dev);
 	list_for_each_entry_safe_reverse(pdev, temp, &parent->devices,
 					 bus_list) {
 		pci_dev_get(pdev);
@@ -309,27 +314,35 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
 
 	result = reset_link(udev, service);
 
-	if ((service == PCIE_PORT_SERVICE_AER) &&
-	    (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)) {
+	if (service == PCIE_PORT_SERVICE_AER && is_bridge_device) {
 		/*
 		 * If the error is reported by a bridge, we think this error
 		 * is related to the downstream link of the bridge, so we
 		 * do error recovery on all subordinates of the bridge instead
 		 * of the bridge and clear the error status of the bridge.
 		 */
-		pci_cleanup_aer_uncorrect_error_status(dev);
+		pci_cleanup_aer_uncorrect_error_status(udev);
 	}
 
 	if (result == PCI_ERS_RESULT_RECOVERED) {
 		if (pcie_wait_for_link(udev, true))
 			pci_rescan_bus(udev->bus);
-		pci_info(dev, "Device recovery from fatal error successful\n");
+		/* find the pci_dev after rescanning the bus */
+		dev = pci_get_domain_bus_and_slot(domain, bus, devfn);
+		if (dev)
+			pci_info(dev, "Device recovery from fatal error successful\n");
+		else
+			pr_err("AER: Can not find pci_dev for %04x:%02x:%02x.%x\n",
+			       domain, bus,
+			       PCI_SLOT(devfn), PCI_FUNC(devfn));
+		pci_dev_put(dev);
 	} else {
-		pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
-		pci_info(dev, "Device recovery from fatal error failed\n");
+		if (is_bridge_device)
+			pci_uevent_ers(udev, PCI_ERS_RESULT_DISCONNECT);
+		pr_err("AER: Device %04x:%02x:%02x.%x recovery from fatal error failed\n",
+		       domain, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
 	}
 
-	pci_dev_put(dev);
 	pci_unlock_rescan_remove();
 }
 
-- 
1.8.3.1
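
For readers who want to see the idea in isolation, the following is a minimal user-space sketch of the pattern the commit message describes: copy the device's address before the device can go away, then look the device up again by that address instead of dereferencing the old pointer. It is not kernel code. The names struct sketch_dev, sketch_find(), sketch_format_addr() and sketch_recover() are hypothetical and exist only for this illustration; sketch_find() merely stands in for the role pci_get_domain_bus_and_slot() plays in the patch, and the slot/function macros assume the usual devfn encoding (slot in the upper five bits, function in the lower three).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed devfn layout: slot in bits 7..3, function in bits 2..0. */
#define SKETCH_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define SKETCH_FUNC(devfn) ((devfn) & 0x07)

/* Hypothetical stand-in for a PCI device entry; not the kernel's type. */
struct sketch_dev {
	uint16_t domain;
	uint8_t bus;
	uint8_t devfn;
	bool present;	/* false models "device has been removed" */
};

/* Hypothetical lookup by address over a small table of devices. */
static struct sketch_dev *sketch_find(struct sketch_dev *tbl, size_t n,
				      uint16_t domain, uint8_t bus,
				      uint8_t devfn)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].present && tbl[i].domain == domain &&
		    tbl[i].bus == bus && tbl[i].devfn == devfn)
			return &tbl[i];
	return NULL;
}

/* Format an address with the same pattern the patch passes to pr_err(). */
static int sketch_format_addr(char *buf, size_t len, uint16_t domain,
			      uint8_t bus, uint8_t devfn)
{
	return snprintf(buf, len, "%04x:%02x:%02x.%x", domain, bus,
			SKETCH_SLOT(devfn), SKETCH_FUNC(devfn));
}

/*
 * Model of the recovery flow: save the address, remove the device,
 * optionally let a rescan bring it back, then find it by address.
 * Returns true if the device was found again.
 */
static bool sketch_recover(struct sketch_dev *tbl, size_t n, size_t idx,
			   bool rescan_finds_it)
{
	/* Copy the address before anything can invalidate the entry. */
	uint16_t domain = tbl[idx].domain;
	uint8_t bus = tbl[idx].bus;
	uint8_t devfn = tbl[idx].devfn;

	tbl[idx].present = false;		/* models device removal */
	if (rescan_finds_it)
		tbl[idx].present = true;	/* models a successful rescan */

	/* From here on, only the saved address is used. */
	return sketch_find(tbl, n, domain, bus, devfn) != NULL;
}
```

A caller would fill a small table of struct sketch_dev entries and call sketch_recover() on one index. When the modelled rescan does not bring the device back, the function returns false, and sketch_format_addr() can still produce the address for an error message because it relies only on the saved values, which mirrors how the patch reports the failure case.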