linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ilia Mirkin <imirkin@alum.mit.edu>
To: Marc MERLIN <marc_nouveau@merlins.org>
Cc: nouveau <nouveau@lists.freedesktop.org>,
	Mika Westerberg <mika.westerberg@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux PCI <linux-pci@vger.kernel.org>
Subject: Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
Date: Tue, 29 Dec 2020 11:33:16 -0500	[thread overview]
Message-ID: <CAKb7UviFP_YVxC4PO7MDNnw6NDrD=3BCGF37umwAfaimjbX9Pw@mail.gmail.com> (raw)
In-Reply-To: <20201229155159.GG23389@merlins.org>

On Tue, Dec 29, 2020 at 10:52 AM Marc MERLIN <marc_nouveau@merlins.org> wrote:
>
> On Sat, Dec 26, 2020 at 03:12:09AM -0800, Ilia Mirkin wrote:
> > > after boot, when it gets the right trigger (not sure which ones), it
> > > loops on this evern 2 seconds, mostly forever.
> >
> > The gpu suspends with runtime pm. And then gets woken up for some
> > reason (could be something quite silly, like lspci, or could be
> > something explicitly checking connectors, etc). Repeat.
>
> Ah, fair point.  Could it be powertop even?
> How would I go towards tracing that?
> Sounds like this would be a problem with all chips if userspace is able
> to wake them up every second or two with a probe. Now I wonder what
> broken userspace I have that could be doing this.

Well, it's a theory. Some userspace helpfully prevents the GPU from
suspending entirely, unfortunately I don't remember its name though by
messing with the attached audio device. It's very common and meant to
help... oh well.

>
> > Display offload usually requires acceleration -- the copies are done
> > using the DMA engine. Please make sure that you have firmware
> > available (and a new enough mesa). The errors suggest that you don't
> > have firmware available at the time that nouveau loads. Depending on
> > your setup, that might mean the firmware has to be built into the
> > kernel, or available in initramfs. (Or just regular filesystem if you
> > don't use a complicated boot sequence. But many people go with distro
> > defaults, which do have this complexity.)
>
> Hi Ilia, thanks for your answer.
>
> Do you think that could be a reason why the boot would hang for 2 full minutes at every
> boot ever since I upgraded to 5.5?

I'd have to check, but I'm guessing TU104 acceleration became a thing
in 5.5. I would also not be very surprised if the code didn't handle
failure extremely gracefully - there definitely have been problems
with that in the past.

>
> Also, without wanting to sound like a full newbie, where is that
> firmware you're talking about? In my kernel source?
>
> Here's what I do have:
> sauron:/usr/local/bin# dpkggrep nouveau
> libdrm-nouveau2:amd64                           install
> xserver-xorg-video-nouveau                      install
>
> no nouveau-firmware package in debian:
> sauron:/usr/local/bin# apt-cache search nouveau
> bumblebee - NVIDIA Optimus support for Linux
> libdrm-nouveau2 - Userspace interface to nouveau-specific kernel DRM services -- runtime
> xfonts-jmk - Jim Knoble's character-cell fonts for X
> xserver-xorg-video-nouveau - X.Org X server -- Nouveau display driver
>
> No firmware file on my disk:
> sauron:/usr/local/bin# find /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/ /lib/firmware/ |grep nouveau
> /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau
> /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko
> sauron:/usr/local/bin#
>
> The kernel module is in my initrd:
> sauron:/usr/local/bin# dd if=/boot/initrd.img-5.9.11-amd64-preempt-sysrq-20190817 bs=2966528  skip=1 | gunzip | cpio -tdv | grep nouveau
> drwxr-xr-x   1 root     root            0 Nov 30 15:40 usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau
> -rw-r--r--   1 root     root      3691385 Nov 30 15:35 usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko
> 17+1 records in
> 17+1 records out
> 52566778 bytes (53 MB, 50 MiB) copied, 1.69708 s, 31.0 MB/s

I think that gets you out of "full newbie" land...

>
> What am I supposed to do/check next?
>
> Note that ultimately I only need nouveau not to hang my boot 2mn and do
> PM so that the nvidia chip goes to sleep since I don't use it.

I'm not extremely familiar with debian packaging, but the firmware is
provided by NVIDIA and shipped as part of linux-firmware:

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/nvidia

This needs to be available at /lib/firmware/nvidia when nouveau loads.
Based on your email above, it's most likely that it would load from
the initrd - so make sure it's in there.

Of course now that I read your email a bit more carefully, it seems
your issue is with the "saving config space" messages. I'm not sure
I've seen those before. Perhaps you have some sort of debug enabled.
I'd find where in the kernel they are being produced, and what the
conditions for it are. But the failure to load firmware isn't great --
not 100% sure if it impacts runpm or not.

I just double-checked, TU10x accel came in via
afa3b96b058d87c2c44d1c83dadb2ba6998d03ce, which was first in v5.6.
Initial TU10x support came in v5.0. So that doesn't line up with your
timeline.

Anyways, I'd definitely sort the firmware situation out, but it may
not be the cause of your problem.

Cheers,

  -ilia

  reply	other threads:[~2020-12-29 16:34 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-04 12:39 [PATCH v2 0/2] PCI: Add missing link delays Mika Westerberg
2019-10-04 12:39 ` [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay() Mika Westerberg
2020-08-08 20:22   ` Marc MERLIN
2020-08-08 20:23     ` Marc MERLIN
2020-08-09 16:31     ` Marc MERLIN
2020-09-06 18:18     ` pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73) Marc MERLIN
2020-09-06 18:26       ` Matthias Andree
2020-09-07 19:14       ` [Nouveau] " Karol Herbst
2020-09-07 20:58         ` Marc MERLIN
2020-09-07 23:51           ` Karol Herbst
2020-09-08  0:29             ` Marc MERLIN
2020-09-13 20:15               ` Marc MERLIN
     [not found]               ` <20200529180315.GA18804@merlins.org>
2020-12-26 11:12                 ` 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile) Marc MERLIN
2020-12-27 18:28                   ` [Nouveau] " Ilia Mirkin
2021-01-27 21:33                   ` Bjorn Helgaas
2021-01-28 20:59                     ` Bjorn Helgaas
2021-01-29  0:56                     ` Marc MERLIN
2021-01-29 21:20                       ` Bjorn Helgaas
2021-01-30  2:04                         ` Marc MERLIN
2020-12-29 15:51                 ` Marc MERLIN
2020-12-29 16:33                   ` Ilia Mirkin [this message]
2019-10-04 12:39 ` [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec Mika Westerberg
2019-10-26 14:19   ` Bjorn Helgaas
2019-10-28 11:28     ` Mika Westerberg
2019-10-28 13:42       ` Bjorn Helgaas
2019-10-28 18:06         ` Mika Westerberg
2019-10-28 20:16           ` Bjorn Helgaas
2019-10-29 11:15             ` Mika Westerberg
2019-10-29 20:27               ` Bjorn Helgaas
2019-10-30 11:15                 ` Mika Westerberg
2019-10-31 22:31                   ` Bjorn Helgaas
2019-11-01 11:19                     ` Mika Westerberg
2019-11-05  0:00                       ` Bjorn Helgaas
2019-11-05  9:54                         ` Mika Westerberg
2019-11-05 12:58                           ` Mika Westerberg
2019-11-05 20:01                             ` Bjorn Helgaas
2019-11-06 13:31                               ` Mika Westerberg
2019-11-05 15:00                           ` Bjorn Helgaas
2019-11-05 15:28                             ` Mika Westerberg
2019-11-05 16:10                               ` Bjorn Helgaas
2019-11-06 13:29                                 ` Mika Westerberg
2019-10-29 20:54   ` Bjorn Helgaas
2019-10-30 11:33     ` Mika Westerberg
2019-10-04 12:57 ` [PATCH v2 0/2] PCI: Add missing link delays Matthias Andree
2019-10-04 13:06   ` Mika Westerberg
2019-10-05  7:34     ` Matthias Andree
2019-10-07  9:32       ` Mika Westerberg
2019-10-07 15:15         ` Matthias Andree
2019-10-08  9:05           ` Mika Westerberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKb7UviFP_YVxC4PO7MDNnw6NDrD=3BCGF37umwAfaimjbX9Pw@mail.gmail.com' \
    --to=imirkin@alum.mit.edu \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=marc_nouveau@merlins.org \
    --cc=mika.westerberg@linux.intel.com \
    --cc=nouveau@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).