From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Sasha Levin, linux-pci@vger.kernel.org, Sinan Kaya, Keith Busch, Bjorn Helgaas, linuxppc-dev@lists.ozlabs.org
Subject: [PATCH AUTOSEL 4.19 145/191] PCI/ERR: Run error recovery callbacks for all affected devices
Date: Sat, 9 Nov 2019 21:39:27 -0500
Message-Id: <20191110024013.29782-145-sashal@kernel.org>
In-Reply-To: <20191110024013.29782-1-sashal@kernel.org>
References: <20191110024013.29782-1-sashal@kernel.org>

From: Keith Busch

[ Upstream commit bfcb79fca19d267712e425af1dd48812c40dec0c ]

If an Endpoint reported an error with ERR_FATAL, we previously ran driver
error recovery callbacks only for the Endpoint's
driver. But if we reset a Link to recover from the error, all downstream
components are affected, including the Endpoint, any multi-function peers,
and children of those peers.

Initiate the Link reset from the deepest Downstream Port that is
reliable, and call the error recovery callbacks for all its children.

If a Downstream Port (including a Root Port) reports an error, we assume
the Port itself is reliable and we need to reset its downstream Link. In
all other cases (Switch Upstream Ports, Endpoints, Bridges, etc), we assume
the Link leading to the component needs to be reset, so we initiate the
reset at the parent Downstream Port.

This allows two other clean-ups. First, we currently only use a Link
reset, which can only be initiated using a Downstream Port, so we can
remove checks for Endpoints. Second, the Downstream Port where we initiate
the Link reset is reliable (unlike components downstream from it), so the
special cases for error detect and resume are no longer necessary.

Signed-off-by: Keith Busch
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas
Reviewed-by: Sinan Kaya
Signed-off-by: Sasha Levin
---
 drivers/pci/pcie/err.c | 85 +++++++++++------------------------------------
 1 file changed, 21 insertions(+), 64 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 12c1205e1d804..2c3b5bd59b18f 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -63,30 +63,12 @@ static int report_error_detected(struct pci_dev *dev, void *data)
 	if (!dev->driver ||
 		!dev->driver->err_handler ||
 		!dev->driver->err_handler->error_detected) {
-		if (result_data->state == pci_channel_io_frozen &&
-			dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
-			/*
-			 * In case of fatal recovery, if one of down-
-			 * stream device has no driver. We might be
-			 * unable to recover because a later insmod
-			 * of a driver for this device is unaware of
-			 * its hw state.
-			 */
-			pci_printk(KERN_DEBUG, dev, "device has %s\n",
-				   dev->driver ?
-				   "no AER-aware driver" : "no driver");
-		}
-
 		/*
-		 * If there's any device in the subtree that does not
-		 * have an error_detected callback, returning
-		 * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of
-		 * the subsequent mmio_enabled/slot_reset/resume
-		 * callbacks of "any" device in the subtree. All the
-		 * devices in the subtree are left in the error state
-		 * without recovery.
+		 * If any device in the subtree does not have an error_detected
+		 * callback, PCI_ERS_RESULT_NO_AER_DRIVER prevents subsequent
+		 * error callbacks of "any" device in the subtree, and will
+		 * exit in the disconnected error state.
 		 */
-
 		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE)
 			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
 		else
@@ -184,34 +166,23 @@ static pci_ers_result_t default_reset_link(struct pci_dev *dev)

 static pci_ers_result_t reset_link(struct pci_dev *dev, u32 service)
 {
-	struct pci_dev *udev;
 	pci_ers_result_t status;
 	struct pcie_port_service_driver *driver = NULL;

-	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
-		/* Reset this port for all subordinates */
-		udev = dev;
-	} else {
-		/* Reset the upstream component (likely downstream port) */
-		udev = dev->bus->self;
-	}
-
-	/* Use the aer driver of the component firstly */
-	driver = pcie_port_find_service(udev, service);
-
+	driver = pcie_port_find_service(dev, service);
 	if (driver && driver->reset_link) {
-		status = driver->reset_link(udev);
-	} else if (udev->has_secondary_link) {
-		status = default_reset_link(udev);
+		status = driver->reset_link(dev);
+	} else if (dev->has_secondary_link) {
+		status = default_reset_link(dev);
 	} else {
 		pci_printk(KERN_DEBUG, dev, "no link-reset support at upstream device %s\n",
-			pci_name(udev));
+			pci_name(dev));
 		return PCI_ERS_RESULT_DISCONNECT;
 	}

 	if (status != PCI_ERS_RESULT_RECOVERED) {
 		pci_printk(KERN_DEBUG, dev, "link reset at upstream device %s failed\n",
-			pci_name(udev));
+			pci_name(dev));
 		return PCI_ERS_RESULT_DISCONNECT;
 	}

@@ -243,31 +214,7 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
 	else
 		result_data.result = PCI_ERS_RESULT_RECOVERED;

-	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
-		/*
-		 * If the error is reported by a bridge, we think this error
-		 * is related to the downstream link of the bridge, so we
-		 * do error recovery on all subordinates of the bridge instead
-		 * of the bridge and clear the error status of the bridge.
-		 */
-		if (cb == report_error_detected)
-			dev->error_state = state;
-		pci_walk_bus(dev->subordinate, cb, &result_data);
-		if (cb == report_resume) {
-			pci_aer_clear_device_status(dev);
-			pci_cleanup_aer_uncorrect_error_status(dev);
-			dev->error_state = pci_channel_io_normal;
-		}
-	} else {
-		/*
-		 * If the error is reported by an end point, we think this
-		 * error is related to the upstream link of the end point.
-		 * The error is non fatal so the bus is ok; just invoke
-		 * the callback for the function that logged the error.
-		 */
-		cb(dev, &result_data);
-	}
-
+	pci_walk_bus(dev->subordinate, cb, &result_data);
 	return result_data.result;
 }

@@ -347,6 +294,14 @@ void pcie_do_nonfatal_recovery(struct pci_dev *dev)

 	state = pci_channel_io_normal;

+	/*
+	 * Error recovery runs on all subordinates of the first downstream port.
+	 * If the downstream port detected the error, it is cleared at the end.
+	 */
+	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
+	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
+		dev = dev->bus->self;
+
 	status = broadcast_error_message(dev,
 			state,
 			"error_detected",
@@ -378,6 +333,8 @@ void pcie_do_nonfatal_recovery(struct pci_dev *dev)
 				"resume",
 				report_resume);

+	pci_aer_clear_device_status(dev);
+	pci_cleanup_aer_uncorrect_error_status(dev);
 	pci_info(dev, "AER: Device recovery successful\n");
 	return;

-- 
2.20.1