From: Kai-Heng Feng
Date: Fri, 23 Jul 2021 15:05:12 +0800
Subject: Re: [PATCH 1/2] PCI/AER: Disable AER interrupt during suspend
To: Christoph Hellwig
Cc: Bjorn Helgaas, Joerg Roedel,
 "open list:PCI ENHANCED ERROR HANDLING (EEH) FOR POWERPC",
 "open list:PCI SUBSYSTEM", open list, Lalithambika Krishnakumar,
 Alex Williamson, "Oliver O'Halloran", Bjorn Helgaas, Mika Westerberg,
 Lu Baolu
References: <20210722222351.GA354095@bjorn-Precision-5520>
X-Mailing-List: linux-pci@vger.kernel.org

On Fri,
Jul 23, 2021 at 1:24 PM Christoph Hellwig wrote:
>
> On Thu, Jul 22, 2021 at 05:23:51PM -0500, Bjorn Helgaas wrote:
> > Marking both of these as "not applicable" for now because I don't
> > think we really understand what's going on.
> >
> > Apparently a DMA occurs during suspend or resume and triggers an ACS
> > violation.  I don't think such a DMA should occur in the first
> > place.
> >
> > Or maybe, since you say the problem happens right after ACS is enabled
> > during resume, we're doing the ACS enable incorrectly?  Although I
> > would think we should not be doing DMA at the same time we're enabling
> > ACS, either.
> >
> > If this really is a system firmware issue, both HP and Dell should
> > have the knowledge and equipment to figure out what's going on.
>
> DMA on resume sounds really odd.  OTOH the below mentioned case of
> a DMA during suspend seems very likely in some setups.  NVMe has the
> concept of a host memory buffer (HMB) that allows the PCIe device
> to use arbitrary host memory for internal purposes.  Combine this
> with the "Storage D3" misfeature in modern x86 platforms that forces
> a slot into d3cold without consulting the driver first and you'd see
> symptoms like this.  Another case would be the NVMe equivalent of the
> AER which could lead to a completion without host activity.

The issue can also be observed on non-HMB NVMe.

>
> We now have quirks in the ACPI layer and NVMe to fully shut down the
> NVMe controllers on these messed up systems with the "Storage D3"
> misfeature which should avoid such "spurious" DMAs at the cost of
> wearing out the device much faster.

Since the issue is on S3, I think the NVMe always fully shuts down.

Kai-Heng