From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.0 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1EF0BC433E0 for ; Tue, 28 Jul 2020 11:30:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F2C662065C for ; Tue, 28 Jul 2020 11:30:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729081AbgG1Lak (ORCPT ); Tue, 28 Jul 2020 07:30:40 -0400 Received: from mailout1.hostsharing.net ([83.223.95.204]:57679 "EHLO mailout1.hostsharing.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729079AbgG1Lak (ORCPT ); Tue, 28 Jul 2020 07:30:40 -0400 X-Greylist: delayed 355 seconds by postgrey-1.27 at vger.kernel.org; Tue, 28 Jul 2020 07:30:38 EDT Received: from h08.hostsharing.net (h08.hostsharing.net [IPv6:2a01:37:1000::53df:5f1c:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.hostsharing.net", Issuer "COMODO RSA Domain Validation Secure Server CA" (not verified)) by mailout1.hostsharing.net (Postfix) with ESMTPS id 52AED101903A3; Tue, 28 Jul 2020 13:24:39 +0200 (CEST) Received: from localhost (unknown [87.130.102.138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by h08.hostsharing.net (Postfix) with ESMTPSA id 0DC7B6040B87; Tue, 28 Jul 2020 13:24:39 +0200 (CEST) X-Mailbox-Line: From b45e46fd8a6aa6930aaac9d7718c2e4b787a4e5e Mon Sep 17 00:00:00 2001 Message-Id: From: Lukas Wunner Date: Tue, 28 Jul 2020 13:24:10 +0200 Subject: [PATCH] PCI: pciehp: Reduce noisiness on hot removal To: Bjorn Helgaas Cc: Stuart Hayes , Dongdong Liu , John Garry , Mika Westerberg , Keith Busch , linux-pci@vger.kernel.org Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org When a PCIe card is hot-removed, the Presence Detect State and Data Link Layer Link Active bits often do not clear simultaneously. I've seen delays of up to 244 msec between the two events with Thunderbolt. After pciehp has brought down the slot in response to the first event, the other bit may still be set. It's not discernible whether it's set because a new card is already in the slot or if it will soon clear. So pciehp tries to bring up the slot and in the latter case fails with a bunch of messages, some of them at KERN_ERR severity. The messages are false positives if the slot is no longer occupied and annoy users. Stuart Hayes reports the following splat on hot removal: KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Link Up KERN_INFO pcieport 0000:3c:06.0: pciehp: Timeout waiting for Presence Detect KERN_ERR pcieport 0000:3c:06.0: pciehp: link training error: status 0x0001 KERN_ERR pcieport 0000:3c:06.0: pciehp: Failed to check link status Dongdong Liu complains about a similar splat: KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Link Down KERN_INFO iommu: Removing device 0000:87:00.0 from group 12 KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Card present KERN_INFO pcieport 0000:80:10.0: Data Link Layer Link Active not set in 1000 msec KERN_ERR pciehp 0000:80:10.0:pcie004: Failed to check link status Users are particularly irritated to see a bringup attempt even though the slot was explicitly brought down via sysfs. In a perfect world, we could avoid this by setting Link Disable on slot bringdown and re-enabling it upon a Presence Detect State change. In reality however, there are broken hotplug ports which hardwire Presence Detect to zero, see 80696f991424 ("PCI: pciehp: Tolerate Presence Detect hardwired to zero"). Conversely, PCIe r1.0 hotplug ports hardwire Link Active to zero because Link Active Reporting wasn't specified before PCIe r1.1. On unplug, some ports first clear Presence then Link (see Stuart Hayes' splat) whereas others use the inverse order (see Dongdong Liu's splat). To top it off, there are hotplug ports which flap the Presence and Link bits on slot bringup, see 6c35a1ac3da6 ("PCI: pciehp: Tolerate initially unstable link"). pciehp is designed to work with all of these variants. Surplus attempts at slot bringup are a lesser evil than not being able to bring up slots at all. Although we could try to perfect the behavior for specific hotplug controllers, we'd risk breaking others or increasing code complexity. But we can certainly minimize annoyance by emitting only a single message with KERN_INFO severity if bringup is unsuccessful: * Drop the "Timeout waiting for Presence Detect" message in pcie_wait_for_presence(). The sole caller of that function, pciehp_check_link_status(), ignores the timeout and carries on. It emits error messages of its own and I don't think this particular message adds much value. * There's a single error condition in pciehp_check_link_status() which does not emit a message. Adding one allows dropping the "Failed to check link status" message emitted by board_added() if pciehp_check_link_status() returns a non-zero integer. * Tone down all messages in pciehp_check_link_status() to KERN_INFO severity and rephrase them to look as innocuous as possible. To this end, move the message emitted by pcie_wait_for_link_delay() to its callers. As a result, Stuart Hayes' splat becomes: KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Link Up KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Cannot train link: status 0x0001 Dongdong Liu's splat becomes: KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Card present KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): No link The messages now merely serve as information that presence or link bits were set a little longer than expected. Bringup failures which are not false positives are still reported, albeit no longer at KERN_ERR severity. Link: https://lore.kernel.org/linux-pci/20200310182100.102987-1-stuart.w.hayes@gmail.com/ Link: https://lore.kernel.org/linux-pci/1547649064-19019-1-git-send-email-liudongdong3@huawei.com/ Reported-by: Stuart Hayes Reported-by: Dongdong Liu Signed-off-by: Lukas Wunner --- drivers/pci/hotplug/pciehp_ctrl.c | 4 +--- drivers/pci/hotplug/pciehp_hpc.c | 15 +++++++++------ drivers/pci/pci.c | 5 ++--- drivers/pci/pcie/dpc.c | 7 +++++-- 4 files changed, 17 insertions(+), 14 deletions(-) diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c index 6503d15effbbd..2f5f4bb42dcc6 100644 --- a/drivers/pci/hotplug/pciehp_ctrl.c +++ b/drivers/pci/hotplug/pciehp_ctrl.c @@ -73,10 +73,8 @@ static int board_added(struct controller *ctrl) /* Check link training status */ retval = pciehp_check_link_status(ctrl); - if (retval) { - ctrl_err(ctrl, "Failed to check link status\n"); + if (retval) goto err_exit; - } /* Check for a power fault */ if (ctrl->power_fault_detected || pciehp_query_power_fault(ctrl)) { diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 53433b37e1812..fb3840e222add 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -283,8 +283,6 @@ static void pcie_wait_for_presence(struct pci_dev *pdev) msleep(10); timeout -= 10; } while (timeout > 0); - - pci_info(pdev, "Timeout waiting for Presence Detect\n"); } int pciehp_check_link_status(struct controller *ctrl) @@ -293,8 +291,10 @@ int pciehp_check_link_status(struct controller *ctrl) bool found; u16 lnk_status; - if (!pcie_wait_for_link(pdev, true)) + if (!pcie_wait_for_link(pdev, true)) { + ctrl_info(ctrl, "Slot(%s): No link\n", slot_name(ctrl)); return -1; + } if (ctrl->inband_presence_disabled) pcie_wait_for_presence(pdev); @@ -311,15 +311,18 @@ int pciehp_check_link_status(struct controller *ctrl) ctrl_dbg(ctrl, "%s: lnk_status = %x\n", __func__, lnk_status); if ((lnk_status & PCI_EXP_LNKSTA_LT) || !(lnk_status & PCI_EXP_LNKSTA_NLW)) { - ctrl_err(ctrl, "link training error: status %#06x\n", - lnk_status); + ctrl_info(ctrl, "Slot(%s): Cannot train link: status %#06x\n", + slot_name(ctrl), lnk_status); return -1; } pcie_update_link_speed(ctrl->pcie->port->subordinate, lnk_status); - if (!found) + if (!found) { + ctrl_info(ctrl, "Slot(%s): No device found\n", + slot_name(ctrl)); return -1; + } return 0; } diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 501035cc0409e..f369934867a06 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4702,9 +4702,7 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active, } if (active && ret && delay) msleep(delay); - else if (ret != active) - pci_info(pdev, "Data Link Layer Link Active not %s in 1000 msec\n", - active ? "set" : "cleared"); + return ret == active; } @@ -4838,6 +4836,7 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev) pci_dbg(dev, "waiting for link to train\n"); if (!pcie_wait_for_link_delay(dev, true, 0)) { /* Did not train, no need to wait any further */ + pci_info(dev, "Data Link Layer Link Active not set in 1000 msec\n"); return; } } diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index daa9a4153776c..e05aba86a3179 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -103,7 +103,8 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev) * Wait until the Link is inactive, then clear DPC Trigger Status * to allow the Port to leave DPC. */ - pcie_wait_for_link(pdev, false); + if (!pcie_wait_for_link(pdev, false)) + pci_info(pdev, "Data Link Layer Link Active not cleared in 1000 msec\n"); if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev)) return PCI_ERS_RESULT_DISCONNECT; @@ -111,8 +112,10 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev) pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS, PCI_EXP_DPC_STATUS_TRIGGER); - if (!pcie_wait_for_link(pdev, true)) + if (!pcie_wait_for_link(pdev, true)) { + pci_info(pdev, "Data Link Layer Link Active not set in 1000 msec\n"); return PCI_ERS_RESULT_DISCONNECT; + } return PCI_ERS_RESULT_RECOVERED; } -- 2.27.0