linux-pci.vger.kernel.org archive mirror
From: Bjorn Helgaas <helgaas@kernel.org>
To: "Julian Groß" <julian.g@posteo.de>
Cc: Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@fb.com>,
	Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Linux regressions mailing list <regressions@lists.linux.dev>
Subject: Re: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"
Date: Thu, 12 Jan 2023 10:42:04 -0600
Message-ID: <20230112164204.GA1768006@bhelgaas>
In-Reply-To: <9d46a35f-5830-9761-ca2c-eaa640e9cc86@leemhuis.info>

On Thu, Jan 12, 2023 at 03:48:46PM +0100, Linux kernel regression tracking (Thorsten Leemhuis) wrote:
> ...
> On 11.01.23 23:11, Julian Groß wrote:
> > Dear Maintainer,
> > 
> > when running Linux Kernel version 6.0.12, 6.0.10, 6.0-rc7, or 6.1.4, my
> > system seemingly randomly freezes due to the file system being set to
> > read-only due to an issue with my NVMe controller.
> > The issue does *not* appear on Linux Kernel version 5.19.11 or lower.
> > 
> > Through network logging I am able to catch the issue:
> > ```
> > Jan  8 14:50:16 x299-desktop kernel: [ 1461.259288] nvme nvme0:
> > controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
> > Jan  8 14:50:16 x299-desktop kernel: [ 1461.259293] nvme nvme0: Does
> > your device have a faulty power saving mode enabled?
> > Jan  8 14:50:16 x299-desktop kernel: [ 1461.259293] nvme nvme0: Try
> > "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
> > Jan  8 14:50:16 x299-desktop kernel: [ 1461.331360] nvme 0000:01:00.0:
> > enabling device (0000 -> 0002)
> > ...
> > 
> > I have tried the suggestion in the log without luck.
> > 
> > Attached is a log that includes two system freezes, as well as a list of
> > PCI(e) devices created by Debian reportbug.
> > The first freeze happens at "Jan  8 04:26:28" and the second freeze
> > happens at "Jan  8 14:50:16".
> > 
> > Currently, I am using git bisect to narrow down the window of possible
> > commits, but since the issue appears seemingly random, it will take many
> > months to identify the offending commit this way.
> > 
> > The original Debian bug report is here:
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1028309

For some reason the log [1] contains very little of the kernel dmesg
output.  The freeze does seem to be partial (I see messages for
hundreds or thousands of seconds after the nvme reset), but the system
requires a reboot to recover.

The lspci information [2] shows the 00:1b.0 Root Port leading to the
01:00.0 NVMe device.

Is it possible to collect lspci output after the nvme freeze?  If so,
please save the output of:

  sudo lspci -vv -s00:1b.0
  sudo lspci -vv -s01:00.0

Make sure to run lspci as root so we can see the error logging
registers for these devices.

If you can collect more of the dmesg log after the freeze, e.g., via
the "dmesg" command, that might be helpful, too.
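
A rough sketch of the collection steps above, in case it helps to have
them in one place.  The function name, output directory, and file
names are hypothetical; only the two device addresses and the lspci
options come from this thread.  Since the file system may be read-only
after the freeze, this assumes the output directory is somewhere still
writable (for example a tmpfs or a separate disk), which may not hold
on every setup.

```shell
#!/bin/sh
# Sketch: save lspci and dmesg output after the nvme reset.
# collect_nvme_diag and the file names are illustrative, not an
# existing tool.

collect_nvme_diag() {
    # Output directory: first argument, else a timestamped default.
    outdir=${1:-./nvme-freeze-$(date +%Y%m%d-%H%M%S)}
    mkdir -p "$outdir" || return 1

    # 00:1b.0 is the Root Port, 01:00.0 the NVMe device.
    for dev in 00:1b.0 01:00.0; do
        # Replace ':' so the address is a tidy file name component.
        name=$(echo "$dev" | tr ':' '_')
        # Run as root so the error logging registers are readable.
        sudo lspci -vv -s "$dev" > "$outdir/lspci-$name.txt" 2>&1
    done

    # Kernel log; root may be needed if dmesg access is restricted.
    sudo dmesg > "$outdir/dmesg.txt" 2>&1

    # Print where the files were written.
    echo "$outdir"
}
```

Running something like "collect_nvme_diag /dev/shm/nvme-diag" right
after the freeze would leave three text files in that directory to
attach to the bug report.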

Bjorn

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1028309;filename=x299-desktop_crash.log.xz;msg=5
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=0;bug=1028309;msg=5
