All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Marc MERLIN <marc_nouveau@merlins.org>
Cc: nouveau@lists.freedesktop.org,
	Mika Westerberg <mika.westerberg@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux PCI <linux-pci@vger.kernel.org>
Subject: Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
Date: Fri, 29 Jan 2021 15:20:32 -0600	[thread overview]
Message-ID: <20210129212032.GA99457@bjorn-Precision-5520> (raw)
In-Reply-To: <20210129005626.GP29348@merlins.org>

On Thu, Jan 28, 2021 at 04:56:26PM -0800, Marc MERLIN wrote:
> On Wed, Jan 27, 2021 at 03:33:00PM -0600, Bjorn Helgaas wrote:
> > Hi Marc, I appreciate your persistence on this.  I am frankly
> > surprised that you've put up with this so long.
>  
> Well, been using linux for 27 years, but also it's not like I have much
> of a choice outside of switching to windows, as tempting as it's getting
> sometimes ;)
> 
> > > after boot, when it gets the right trigger (not sure which ones), it
> > > loops on this evern 2 seconds, mostly forever.
> > > 
> > > I'm not sure if it's nouveau's fault or the kernel's PCI PME's fault, or something else.
> > 
> > IIUC there are basically two problems:
> > 
> >   1) A 2 minute delay during boot
> > Another random thought: is there any chance the boot delay could be
> > related to crypto waiting for entropy?
> 
> So, the 2mn hang went away after I added the nouveau firwmare in initrd.
> The only problem is that the nouveau driver does not give a very good
> clue as to what's going on and what to do.
>
> For comparison the intel iwlwifi driver is very clear about firmware
> it's trying to load, if it can't and what exact firmware you need to
> find on the internet (filename)

I guess you're referring to this in iwl_request_firmware()?

  IWL_ERR(drv, "check git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git\n"); 

How can we fix this in nouveau so we don't have the debug this again?
I don't really know how firmware loading works, but "git grep -A5
request_firmware drivers/gpu/drm/nouveau/" shows that we generally
print something when request_firmware() fails.

But I didn't notice those messages in your logs, so I'm probably
barking up the wrong tree.

> >   2) Some sort of event every 2 seconds that kills your battery life
> > Your machine doesn't sound unusual, and I haven't seen a flood of
> > similar reports, so maybe there's something unusual about your config.
> > But I really don't have any guesses for either one.
> 
> Honestly, there are not too many thinpad P73 running linux out there. I
> wouldn't be surprised if it's only a handful or two.
> 
> > It sounds like v5.5 worked fine and you first noticed the slow boot
> > problem in v5.8.  We *could* try to bisect it, but I know that's a lot
> > of work on your part.
> 
> I've done that in the past, to be honest now that it works after I added
> the firmware that nouveau started needing, and didn't need before, the
> hang at boot is gone for sure.
> The PCI PM wakeup issues on batteries happen sometimes still, but they
> are much more rare now.

So maybe the wakeups are related to having vs not having the nouveau
firmware?  I'm still curious about that, and it smells like a bug to
me, but probably something to do with nouveau where I have no hope of
debugging it.

> > Grasping for any ideas for the boot delay; could you boot with
> > "initcall_debug" and collect your "lsmod" output?  I notice async_tx
> > in some of your logs, but I have no idea what it is.  It's from
> > crypto, so possibly somewhat unusual?
> 
> Is this still neeeded? I think of nouveau does a better job of helping
> the user correct the issue if firmware is missing (I think intel even
> gives a URL in printk), that would probably be what's needed for the
> most part.

Nope, don't bother with this, thanks.

Bjorn

WARNING: multiple messages have this Message-ID (diff)
From: Bjorn Helgaas <helgaas@kernel.org>
To: Marc MERLIN <marc_nouveau@merlins.org>
Cc: nouveau@lists.freedesktop.org,
	Mika Westerberg <mika.westerberg@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux PCI <linux-pci@vger.kernel.org>
Subject: Re: [Nouveau] 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
Date: Fri, 29 Jan 2021 15:20:32 -0600	[thread overview]
Message-ID: <20210129212032.GA99457@bjorn-Precision-5520> (raw)
In-Reply-To: <20210129005626.GP29348@merlins.org>

On Thu, Jan 28, 2021 at 04:56:26PM -0800, Marc MERLIN wrote:
> On Wed, Jan 27, 2021 at 03:33:00PM -0600, Bjorn Helgaas wrote:
> > Hi Marc, I appreciate your persistence on this.  I am frankly
> > surprised that you've put up with this so long.
>  
> Well, been using linux for 27 years, but also it's not like I have much
> of a choice outside of switching to windows, as tempting as it's getting
> sometimes ;)
> 
> > > after boot, when it gets the right trigger (not sure which ones), it
> > > loops on this evern 2 seconds, mostly forever.
> > > 
> > > I'm not sure if it's nouveau's fault or the kernel's PCI PME's fault, or something else.
> > 
> > IIUC there are basically two problems:
> > 
> >   1) A 2 minute delay during boot
> > Another random thought: is there any chance the boot delay could be
> > related to crypto waiting for entropy?
> 
> So, the 2mn hang went away after I added the nouveau firwmare in initrd.
> The only problem is that the nouveau driver does not give a very good
> clue as to what's going on and what to do.
>
> For comparison the intel iwlwifi driver is very clear about firmware
> it's trying to load, if it can't and what exact firmware you need to
> find on the internet (filename)

I guess you're referring to this in iwl_request_firmware()?

  IWL_ERR(drv, "check git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git\n"); 

How can we fix this in nouveau so we don't have the debug this again?
I don't really know how firmware loading works, but "git grep -A5
request_firmware drivers/gpu/drm/nouveau/" shows that we generally
print something when request_firmware() fails.

But I didn't notice those messages in your logs, so I'm probably
barking up the wrong tree.

> >   2) Some sort of event every 2 seconds that kills your battery life
> > Your machine doesn't sound unusual, and I haven't seen a flood of
> > similar reports, so maybe there's something unusual about your config.
> > But I really don't have any guesses for either one.
> 
> Honestly, there are not too many thinpad P73 running linux out there. I
> wouldn't be surprised if it's only a handful or two.
> 
> > It sounds like v5.5 worked fine and you first noticed the slow boot
> > problem in v5.8.  We *could* try to bisect it, but I know that's a lot
> > of work on your part.
> 
> I've done that in the past, to be honest now that it works after I added
> the firmware that nouveau started needing, and didn't need before, the
> hang at boot is gone for sure.
> The PCI PM wakeup issues on batteries happen sometimes still, but they
> are much more rare now.

So maybe the wakeups are related to having vs not having the nouveau
firmware?  I'm still curious about that, and it smells like a bug to
me, but probably something to do with nouveau where I have no hope of
debugging it.

> > Grasping for any ideas for the boot delay; could you boot with
> > "initcall_debug" and collect your "lsmod" output?  I notice async_tx
> > in some of your logs, but I have no idea what it is.  It's from
> > crypto, so possibly somewhat unusual?
> 
> Is this still neeeded? I think of nouveau does a better job of helping
> the user correct the issue if firmware is missing (I think intel even
> gives a URL in printk), that would probably be what's needed for the
> most part.

Nope, don't bother with this, thanks.

Bjorn
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

  reply	other threads:[~2021-01-29 21:21 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-04 12:39 [PATCH v2 0/2] PCI: Add missing link delays Mika Westerberg
2019-10-04 12:39 ` [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay() Mika Westerberg
2020-08-08 20:22   ` Marc MERLIN
2020-08-08 20:23     ` Marc MERLIN
2020-08-09 16:31     ` Marc MERLIN
2020-09-06 18:18     ` pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73) Marc MERLIN
2020-09-06 18:18       ` Marc MERLIN
2020-09-06 18:26       ` Matthias Andree
2020-09-07 19:14       ` [Nouveau] " Karol Herbst
2020-09-07 19:14         ` Karol Herbst
2020-09-07 20:58         ` [Nouveau] " Marc MERLIN
2020-09-07 20:58           ` Marc MERLIN
2020-09-07 23:51           ` [Nouveau] " Karol Herbst
2020-09-07 23:51             ` Karol Herbst
2020-09-08  0:29             ` [Nouveau] " Marc MERLIN
2020-05-29 18:03               ` 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics? Marc MERLIN
     [not found]                 ` <20200529180315.GA18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-05-29 18:53                   ` Ilia Mirkin
     [not found]                     ` <CAKb7Uvhw2EYo1RR-=NGgLO3CU9QTRWchcAw1injffybZbJ-zOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-05-29 19:46                       ` Marc MERLIN
     [not found]                         ` <20200529194605.GB18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-05-30 17:32                           ` Karol Herbst
2023-04-19  6:49                         ` [Nouveau] 6.1 still cannot get display on Thinkpad P73Quadro " Marc MERLIN
2023-04-21  5:46                           ` [Nouveau] 6.2 still cannot get hdmi display out on Thinkpad P73 Quadro RTX 4000 Mobile/TU104 Marc MERLIN
     [not found]                       ` <CACO55tsvY0t_z986VVoYCvxuBASdZ+rQcDtZ_dAtQR60NLmQQw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-05-31 18:31                         ` 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics? Marc MERLIN
2020-12-26 11:12                 ` 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile) Marc MERLIN
2020-12-26 11:12                   ` Marc MERLIN
2020-12-27 18:28                   ` [Nouveau] " Ilia Mirkin
2020-12-27 18:28                     ` Ilia Mirkin
2021-01-27 21:33                   ` Bjorn Helgaas
2021-01-27 21:33                     ` Bjorn Helgaas
2021-01-28 20:59                     ` Bjorn Helgaas
2021-01-28 20:59                       ` [Nouveau] " Bjorn Helgaas
2021-01-29  0:56                     ` Marc MERLIN
2021-01-29  0:56                       ` [Nouveau] " Marc MERLIN
2021-01-29 21:20                       ` Bjorn Helgaas [this message]
2021-01-29 21:20                         ` Bjorn Helgaas
2021-01-30  2:04                         ` Marc MERLIN
2021-01-30  2:04                           ` [Nouveau] " Marc MERLIN
2021-05-05 21:42                           ` [Nouveau] 5.12.1 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau] Marc MERLIN
2021-05-06 14:50                             ` Bjorn Helgaas
2021-05-25  3:13                               ` Ben Skeggs
2020-12-29 15:51                 ` 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile) Marc MERLIN
2020-12-29 15:51                   ` Marc MERLIN
2020-12-29 16:33                   ` Ilia Mirkin
2020-12-29 16:33                     ` Ilia Mirkin
     [not found]                     ` <CAKb7UviFP_YVxC4PO7MDNnw6NDrD=3BCGF37umwAfaimjbX9Pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-12-29 17:47                       ` Marc MERLIN
     [not found]                         ` <20201229174750.GI23389-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2021-01-04 11:49                           ` Marc MERLIN
     [not found]                             ` <20210104114955.GM32533-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2021-01-04 13:28                               ` Karol Herbst
     [not found]                                 ` <CACO55tsdG37YKv7FV2er4hRnXk9vmwMbPuPptA+=ZtziWXC2+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2021-01-07 11:49                                   ` Marc MERLIN
2020-12-30 12:16                       ` ael
2020-09-13 20:15               ` [Nouveau] pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73) Marc MERLIN
2020-09-13 20:15                 ` Marc MERLIN
     [not found]                 ` <20200913201545.GL2622-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-09-19 23:18                   ` Marc MERLIN
2019-10-04 12:39 ` [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec Mika Westerberg
2019-10-26 14:19   ` Bjorn Helgaas
2019-10-28 11:28     ` Mika Westerberg
2019-10-28 13:42       ` Bjorn Helgaas
2019-10-28 18:06         ` Mika Westerberg
2019-10-28 20:16           ` Bjorn Helgaas
2019-10-29 11:15             ` Mika Westerberg
2019-10-29 20:27               ` Bjorn Helgaas
2019-10-30 11:15                 ` Mika Westerberg
2019-10-31 22:31                   ` Bjorn Helgaas
2019-11-01 11:19                     ` Mika Westerberg
2019-11-05  0:00                       ` Bjorn Helgaas
2019-11-05  9:54                         ` Mika Westerberg
2019-11-05 12:58                           ` Mika Westerberg
2019-11-05 20:01                             ` Bjorn Helgaas
2019-11-06 13:31                               ` Mika Westerberg
2019-11-05 15:00                           ` Bjorn Helgaas
2019-11-05 15:28                             ` Mika Westerberg
2019-11-05 16:10                               ` Bjorn Helgaas
2019-11-06 13:29                                 ` Mika Westerberg
2019-10-29 20:54   ` Bjorn Helgaas
2019-10-30 11:33     ` Mika Westerberg
2019-10-04 12:57 ` [PATCH v2 0/2] PCI: Add missing link delays Matthias Andree
2019-10-04 13:06   ` Mika Westerberg
2019-10-05  7:34     ` Matthias Andree
2019-10-07  9:32       ` Mika Westerberg
2019-10-07 15:15         ` Matthias Andree
2019-10-08  9:05           ` Mika Westerberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210129212032.GA99457@bjorn-Precision-5520 \
    --to=helgaas@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=marc_nouveau@merlins.org \
    --cc=mika.westerberg@linux.intel.com \
    --cc=nouveau@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.