From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EFD53C433E6 for ; Wed, 27 Jan 2021 17:35:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A4E0360187 for ; Wed, 27 Jan 2021 17:35:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235007AbhA0RfB (ORCPT ); Wed, 27 Jan 2021 12:35:01 -0500 Received: from youngberry.canonical.com ([91.189.89.112]:41370 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343855AbhA0Rb7 (ORCPT ); Wed, 27 Jan 2021 12:31:59 -0500 Received: from 1.general.khfeng.uk.vpn ([10.172.196.174] helo=localhost) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1l4oer-0001Pm-Dw; Wed, 27 Jan 2021 17:31:10 +0000 From: Kai-Heng Feng To: bhelgaas@google.com Cc: Kai-Heng Feng , Russell Currey , "Oliver O'Halloran" , Mika Westerberg , Lalithambika Krishnakumar , Lu Baolu , Joerg Roedel , linuxppc-dev@lists.ozlabs.org (open list:PCI ENHANCED ERROR HANDLING (EEH) FOR POWERPC), linux-pci@vger.kernel.org (open list:PCI SUBSYSTEM), linux-kernel@vger.kernel.org (open list) Subject: [PATCH 1/2] PCI/AER: Disable AER interrupt during suspend Date: Thu, 28 Jan 2021 01:31:00 +0800 Message-Id: <20210127173101.446940-1-kai.heng.feng@canonical.com> X-Mailer: git-send-email 2.29.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Commit 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint") enables ACS, and some platforms lose its NVMe after resume from firmware: [ 50.947816] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f01 source:0x0000 [ 50.947817] pcieport 0000:00:1b.0: DPC: unmasked uncorrectable error detected [ 50.947829] pcieport 0000:00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID) [ 50.947830] pcieport 0000:00:1b.0: device [8086:06ac] error status/mask=00200000/00010000 [ 50.947831] pcieport 0000:00:1b.0: [21] ACSViol (First) [ 50.947841] pcieport 0000:00:1b.0: AER: broadcast error_detected message [ 50.947843] nvme nvme0: frozen state error detected, reset controller It happens right after ACS gets enabled during resume. To prevent that from happening, disable AER interrupt and enable it on system suspend and resume, respectively. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Fixes: 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint") Signed-off-by: Kai-Heng Feng --- drivers/pci/pcie/aer.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 77b0f2c45bc0..0e9a85530ae6 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1365,6 +1365,22 @@ static int aer_probe(struct pcie_device *dev) return 0; } +static int aer_suspend(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + + aer_disable_rootport(rpc); + return 0; +} + +static int aer_resume(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + + aer_enable_rootport(rpc); + return 0; +} + /** * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP * @dev: pointer to Root Port, RCEC, or RCiEP @@ -1437,6 +1453,8 @@ static struct pcie_port_service_driver aerdriver = { .service = PCIE_PORT_SERVICE_AER, .probe = aer_probe, + .suspend = aer_suspend, + .resume = aer_resume, .remove = aer_remove, }; -- 2.29.2