From: Kai-Heng Feng <>
To: Christoph Hellwig <>
Cc: Bjorn Helgaas <>,
	Joerg Roedel <>,
	"open list:PCI SUBSYSTEM" <>,
	open list <>,
	Lalithambika Krishnakumar <>,
	Alex Williamson <>,
	"Oliver O'Halloran" <>,
	Bjorn Helgaas <>,
	Mika Westerberg <>,
	Lu Baolu <>
Subject: Re: [PATCH 1/2] PCI/AER: Disable AER interrupt during suspend
Date: Fri, 23 Jul 2021 15:05:12 +0800
Message-ID: <>
In-Reply-To: <>

On Fri, Jul 23, 2021 at 1:24 PM Christoph Hellwig <> wrote:
> On Thu, Jul 22, 2021 at 05:23:51PM -0500, Bjorn Helgaas wrote:
> > Marking both of these as "not applicable" for now because I don't
> > think we really understand what's going on.
> >
> > Apparently a DMA occurs during suspend or resume and triggers an ACS
> > violation.  I don't think such a DMA should occur in the first
> > place.
> >
> > Or maybe, since you say the problem happens right after ACS is enabled
> > during resume, we're doing the ACS enable incorrectly?  Although I
> > would think we should not be doing DMA at the same time we're enabling
> > ACS, either.
> >
> > If this really is a system firmware issue, both HP and Dell should
> > have the knowledge and equipment to figure out what's going on.
> DMA on resume sounds really odd.  OTOH the below mentioned case of
> a DMA during suspend seems very likely in some setups.  NVMe has the
> concept of a host memory buffer (HMB) that allows the PCIe device
> to use arbitrary host memory for internal purposes.  Combine this
> with the "Storage D3" misfeature in modern x86 platforms that force
> a slot into d3cold without consulting the driver first and you'd see
> symptoms like this.  Another case would be the NVMe equivalent of the
> AER which could lead to a completion without host activity.

The issue can also be observed on non-HMB NVMe.

> We now have quirks in the ACPI layer and NVMe to fully shut down the
> NVMe controllers on these messed up systems with the "Storage D3"
> misfeature which should avoid such "spurious" DMAs at the cost of
> wearing out the device much faster.

Since the issue occurs on S3, I think the NVMe is always fully shut down.


Thread overview: 9+ messages
2021-01-27 17:31 [PATCH 1/2] PCI/AER: Disable AER interrupt during suspend Kai-Heng Feng
2021-01-27 17:31 ` [PATCH 2/2] PCI/DPC: Disable DPC " Kai-Heng Feng
2021-01-27 20:50 ` [PATCH 1/2] PCI/AER: Disable AER " Bjorn Helgaas
2021-01-28  4:09   ` Kai-Heng Feng
2021-02-04 23:27     ` Bjorn Helgaas
2021-02-05 15:17       ` Kai-Heng Feng
2021-07-22 22:23         ` Bjorn Helgaas
2021-07-23  5:24           ` Christoph Hellwig
2021-07-23  7:05             ` Kai-Heng Feng [this message]
