From: Ilia Mirkin <imirkin@alum.mit.edu>
To: Marc MERLIN <marc_nouveau@merlins.org>
Cc: nouveau <nouveau@lists.freedesktop.org>,
    Mika Westerberg <mika.westerberg@linux.intel.com>,
    LKML <linux-kernel@vger.kernel.org>,
    Linux PCI <linux-pci@vger.kernel.org>
Subject: Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
Date: Tue, 29 Dec 2020 11:33:16 -0500
Message-ID: <CAKb7UviFP_YVxC4PO7MDNnw6NDrD=3BCGF37umwAfaimjbX9Pw@mail.gmail.com>
In-Reply-To: <20201229155159.GG23389@merlins.org>

On Tue, Dec 29, 2020 at 10:52 AM Marc MERLIN <marc_nouveau@merlins.org> wrote:
>
> On Sat, Dec 26, 2020 at 03:12:09AM -0800, Ilia Mirkin wrote:
> > > after boot, when it gets the right trigger (not sure which ones), it
> > > loops on this every 2 seconds, mostly forever.
> >
> > The gpu suspends with runtime pm. And then gets woken up for some
> > reason (could be something quite silly, like lspci, or could be
> > something explicitly checking connectors, etc). Repeat.
>
> Ah, fair point. Could it be powertop even?
> How would I go towards tracing that?
> Sounds like this would be a problem with all chips if userspace is able
> to wake them up every second or two with a probe. Now I wonder what
> broken userspace I have that could be doing this.

Well, it's a theory. Some userspace helpfully prevents the GPU from
suspending entirely by messing with the attached audio device;
unfortunately I don't remember its name. It's very common and meant to
help... oh well.

>
> > Display offload usually requires acceleration -- the copies are done
> > using the DMA engine. Please make sure that you have firmware
> > available (and a new enough mesa). The errors suggest that you don't
> > have firmware available at the time that nouveau loads. Depending on
> > your setup, that might mean the firmware has to be built into the
> > kernel, or available in initramfs.
> > (Or just regular filesystem if you
> > don't use a complicated boot sequence. But many people go with distro
> > defaults, which do have this complexity.)
>
> Hi Ilia, thanks for your answer.
>
> Do you think that could be a reason why the boot would hang for 2 full minutes at every
> boot ever since I upgraded to 5.5?

I'd have to check, but I'm guessing TU104 acceleration became a thing
in 5.5. I would also not be very surprised if the code didn't handle
failure extremely gracefully - there definitely have been problems
with that in the past.

>
> Also, without wanting to sound like a full newbie, where is that
> firmware you're talking about? In my kernel source?
>
> Here's what I do have:
> sauron:/usr/local/bin# dpkggrep nouveau
> libdrm-nouveau2:amd64 install
> xserver-xorg-video-nouveau install
>
> no nouveau-firmware package in debian:
> sauron:/usr/local/bin# apt-cache search nouveau
> bumblebee - NVIDIA Optimus support for Linux
> libdrm-nouveau2 - Userspace interface to nouveau-specific kernel DRM services -- runtime
> xfonts-jmk - Jim Knoble's character-cell fonts for X
> xserver-xorg-video-nouveau - X.Org X server -- Nouveau display driver
>
> No firmware file on my disk:
> sauron:/usr/local/bin# find /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/ /lib/firmware/ |grep nouveau
> /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau
> /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko
> sauron:/usr/local/bin#
>
> The kernel module is in my initrd:
> sauron:/usr/local/bin# dd if=/boot/initrd.img-5.9.11-amd64-preempt-sysrq-20190817 bs=2966528 skip=1 | gunzip | cpio -tdv | grep nouveau
> drwxr-xr-x 1 root root 0 Nov 30 15:40 usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau
> -rw-r--r-- 1 root root 3691385 Nov 30 15:35 usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko
> 17+1 records in
> 17+1 records out
> 52566778 bytes (53 MB, 50 MiB) copied, 1.69708 s, 31.0 MB/s

I think that gets you out of "full newbie" land...

>
> What am I supposed to do/check next?
>
> Note that ultimately I only need nouveau not to hang my boot 2mn and do
> PM so that the nvidia chip goes to sleep since I don't use it.

I'm not extremely familiar with Debian packaging, but the firmware is
provided by NVIDIA and shipped as part of linux-firmware:

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/nvidia

This needs to be available at /lib/firmware/nvidia when nouveau loads.
Based on your email above, it will most likely be loaded from the
initrd, so make sure it's in there.

Of course, now that I read your email a bit more carefully, it seems
your issue is with the "saving config space" messages. I'm not sure
I've seen those before; perhaps you have some sort of debug enabled.
I'd find where in the kernel they are being produced, and under what
conditions.

But the failure to load firmware isn't great -- I'm not 100% sure
whether it impacts runpm or not.

I just double-checked: TU10x accel came in via commit
afa3b96b058d87c2c44d1c83dadb2ba6998d03ce, which first appeared in v5.6.
Initial TU10x support came in v5.0. So that doesn't line up with your
timeline.

Anyway, I'd definitely sort the firmware situation out, but it may not
be the cause of your problem.

Cheers,

  -ilia
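Marc's question about how to trace the repeated wakeups isn't answered with a concrete command above. A minimal sketch, assuming the GPU is PCI function 0000:01:00.0 (an assumption; check with `lspci -D`): poll the standard sysfs runtime-PM attribute `power/runtime_status` and print a timestamp whenever the state changes, so each resume can be lined up with whatever userspace was doing at that moment. `watch_runpm` is a hypothetical helper name, not an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a timestamped line each time a PCI device's
# runtime-PM state changes (e.g. "suspended" -> "active").
#   $1: path to the device's power/runtime_status file
#       (default assumes the GPU is 0000:01:00.0; check with `lspci -D`)
#   $2: number of samples to take, 0 = poll forever (default 0)
#   $3: seconds between samples (default 0.5)
watch_runpm() {
    local status_file=${1:-/sys/bus/pci/devices/0000:01:00.0/power/runtime_status}
    local max=${2:-0}
    local interval=${3:-0.5}
    local prev="" cur n=0

    while [ "$max" -eq 0 ] || [ "$n" -lt "$max" ]; do
        # Reading this attribute only reports the current state; it is
        # not expected to resume the device itself.
        cur=$(cat "$status_file") || return 1
        if [ "$cur" != "$prev" ]; then
            # Print only transitions, with a sub-second wall-clock time.
            printf '%s %s\n' "$(date +%T.%N)" "$cur"
            prev=$cur
        fi
        n=$((n + 1))
        sleep "$interval"
    done
}
```

Since the theory above involves userspace poking the GPU's audio device, the same helper can be pointed at that function too, which is often 0000:01:00.1 (again an assumption; verify with lspci), e.g. `watch_runpm /sys/bus/pci/devices/0000:01:00.1/power/runtime_status` in a second terminal.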
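Following the advice to make sure the NVIDIA firmware is in the initrd, a minimal sketch of the check: feed an initrd file listing to a small filter that looks for the chip's directory under `lib/firmware/nvidia/`. The default chip directory `tu104` is an assumption based on the layout of the linux-firmware `nvidia/` tree, and `has_nvidia_fw` is a hypothetical helper name. On Debian, `lsinitramfs` (from initramfs-tools) produces the listing and should cope with a prepended early-microcode archive, which avoids the manual `dd` offset used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an initramfs file listing on stdin and report
# whether NVIDIA firmware for a given chip directory is included.
#   $1: chip directory under nvidia/ (default tu104, as named in the
#       linux-firmware tree; adjust if your GPU differs)
has_nvidia_fw() {
    local chip=${1:-tu104}
    # Accept lib/firmware and merged-/usr usr/lib/firmware paths, with an
    # optional leading "/" or "./" depending on how the listing was made.
    if grep -Eq "^(\./|/)?(usr/)?lib/firmware/nvidia/${chip}/"; then
        echo "nvidia/${chip} firmware found"
    else
        echo "nvidia/${chip} firmware MISSING" >&2
        return 1
    fi
}

# Example (run as root):
#   lsinitramfs /boot/initrd.img-"$(uname -r)" | has_nvidia_fw tu104
```

If it reports MISSING, the next step would be copying the linux-firmware `nvidia/` tree into /lib/firmware/nvidia and regenerating the initramfs (`update-initramfs -u` on Debian). Whether the initramfs tooling picks those files up automatically isn't guaranteed here, so re-run the check afterwards rather than assume.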