stable.vger.kernel.org archive mirror
* Regression/boot failure on 5.16.3
From: Jason Self @ 2022-02-04  0:19 UTC (permalink / raw)
  To: stable

[-- Attachment #1: Type: text/plain, Size: 1557 bytes --]

The computer (amd64) fails to boot. Init gets stuck while
synchronizing the time over the network. This began between 5.16.2
(good) and 5.16.3 (bad) and continues on 5.16.4 and 5.16.5. Git
bisect revealed the following. In this case the nonfree firmware is
not present on the system. Blacklisting the iwlwifi module works as
a workaround for now.

6b5ad4bd0d78fef6bbe0ecdf96e09237c9c52cc1 is the first bad commit
commit 6b5ad4bd0d78fef6bbe0ecdf96e09237c9c52cc1
Author: Johannes Berg <johannes.berg@intel.com>
Date:   Fri Dec 10 11:12:42 2021 +0200

    iwlwifi: fix leaks/bad data after failed firmware load
    
    [ Upstream commit ab07506b0454bea606095951e19e72c282bfbb42 ]
    
    If firmware load fails after having loaded some parts of the
    firmware, e.g. the IML image, then this would leak. For the
    host command list we'd end up running into a WARN on the next
    attempt to load another firmware image.
    
    Fix this by calling iwl_dealloc_ucode() on failures, and make
    that also clear the data so we start fresh on the next round.
    
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
    Link: https://lore.kernel.org/r/iwlwifi.20211210110539.1f742f0eb58a.I1315f22f6aa632d94ae2069f85e1bca5e734dce0@changeid
    Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 drivers/net/wireless/intel/iwlwifi/iwl-drv.c | 8 ++++++++
 1 file changed, 8 insertions(+)
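
The commit message above describes a common C cleanup pattern: when a multi-part load fails partway through, free whatever was already allocated and reset the state so that the next attempt starts from a clean slate. The following is a minimal sketch of that pattern only, under stated assumptions. `struct fw_image`, `fw_load()` and `fw_dealloc()` are hypothetical names for a simplified model. They are not the actual iwl-drv.c code, and the sketch does not try to reproduce the boot regression reported in this thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a driver's firmware state.
 * Field names are illustrative only and do not match iwlwifi. */
struct fw_image {
    void  *iml;          /* first part loaded, e.g. an IML image */
    void  *runtime;      /* later part whose load may fail */
    size_t iml_len;
    size_t runtime_len;
};

/* Free everything and reset the struct so a later attempt starts fresh.
 * Safe on a partially loaded image because free(NULL) is a no-op. */
static void fw_dealloc(struct fw_image *fw)
{
    free(fw->iml);
    free(fw->runtime);
    *fw = (struct fw_image){0};
}

/* Simulated loader: allocates the IML part, then "fails" on the runtime
 * part when fail_runtime is nonzero. Expects fw to be empty (zeroed).
 * Returns 0 on success, -1 on failure. */
static int fw_load(struct fw_image *fw, size_t iml_len, size_t runtime_len,
                   int fail_runtime)
{
    assert(fw != NULL);

    fw->iml = malloc(iml_len);
    if (!fw->iml)
        goto err;
    fw->iml_len = iml_len;

    if (fail_runtime)
        goto err;            /* part of the image is already loaded */

    fw->runtime = malloc(runtime_len);
    if (!fw->runtime)
        goto err;
    fw->runtime_len = runtime_len;
    return 0;

err:
    /* Without this call the IML buffer would leak, and stale pointers
     * and lengths would survive into the next load attempt. */
    fw_dealloc(fw);
    return -1;
}
```

Resetting the structure after freeing makes `fw_dealloc()` safe to call more than once and ensures that a retry never sees stale pointers. The trade-off is that any other code still holding a pointer to the freed buffers must not touch them afterwards.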

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: Regression/boot failure on 5.16.3
From: Greg KH @ 2022-02-04  7:00 UTC (permalink / raw)
  To: Jason Self; +Cc: stable

On Thu, Feb 03, 2022 at 04:19:59PM -0800, Jason Self wrote:
> The computer (amd64) fails to boot. Init gets stuck while
> synchronizing the time over the network. This began between 5.16.2
> (good) and 5.16.3 (bad) and continues on 5.16.4 and 5.16.5. Git
> bisect revealed the following. In this case the nonfree firmware is
> not present on the system. Blacklisting the iwlwifi module works as
> a workaround for now.
> 
> 6b5ad4bd0d78fef6bbe0ecdf96e09237c9c52cc1 is the first bad commit
> commit 6b5ad4bd0d78fef6bbe0ecdf96e09237c9c52cc1
> Author: Johannes Berg <johannes.berg@intel.com>
> Date:   Fri Dec 10 11:12:42 2021 +0200
> 
>     iwlwifi: fix leaks/bad data after failed firmware load
> [...]

Please cc: the authors of this commit, and the upstream wireless
developers so they can help you out here as I think the same issue shows
up in 5.17-rc2, right?

thanks,

greg k-h


* Re: Regression/boot failure on 5.16.3
From: Thorsten Leemhuis @ 2022-02-04  8:48 UTC (permalink / raw)
  To: Jason Self, stable; +Cc: regressions

[TLDR: I'm adding this regression to regzbot, the Linux kernel
regression tracking bot; most text you find below is compiled from a few
template paragraphs some of you might have seen already.]

Hi, this is your Linux kernel regression tracker speaking.

Adding the regression mailing list to the list of recipients, as it
should be in the loop for all regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

On 04.02.22 01:19, Jason Self wrote:
> The computer (amd64) fails to boot. Init gets stuck while
> synchronizing the time over the network. This began between 5.16.2
> (good) and 5.16.3 (bad) and continues on 5.16.4 and 5.16.5. Git
> bisect revealed the following. In this case the nonfree firmware is
> not present on the system. Blacklisting the iwlwifi module works as
> a workaround for now.
> 
> 6b5ad4bd0d78fef6bbe0ecdf96e09237c9c52cc1 is the first bad commit
> commit 6b5ad4bd0d78fef6bbe0ecdf96e09237c9c52cc1
> Author: Johannes Berg <johannes.berg@intel.com>
> Date:   Fri Dec 10 11:12:42 2021 +0200

To be sure this issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced 6b5ad4bd0d78fef6bbe0ecdf96e09237c9c52cc1
#regzbot title net: iwlwifi: system fails to boot since 5.16.3
#regzbot ignore-activity

Reminder: when fixing the issue, please add a 'Link:' tag with the URL
to the report (the parent of this mail) using the kernel.org redirector,
as explained in 'Documentation/process/submitting-patches.rst'. Regzbot
then will automatically mark the regression as resolved once the fix
lands in the appropriate tree. For more details about regzbot see footer.

Sending this to everyone that got the initial report, to make all aware
of the tracking. I also hope that messages like this motivate people to
directly get at least the regression mailing list and ideally even
regzbot involved when dealing with regressions, as messages like this
wouldn't be needed then.

Don't worry, I'll send further messages wrt this regression just to
the lists (with a tag in the subject so people can filter them away), as
long as they are intended just for regzbot. With a bit of luck no such
messages will be needed anyway.

Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat)

P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them. Unfortunately,
I will therefore sometimes get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply, that's in everyone's interest.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CCed on
all further activities wrt this regression.



---
Additional information about regzbot:

If you want to know more about regzbot, check out its web interface, the
getting started guide, and/or the reference documentation:

https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

The last two documents explain how you can interact with regzbot
yourself if you want to.

Hint for reporters: when reporting a regression, it's in your interest to
tell #regzbot about it in the report, as that will ensure the regression
gets on the radar of regzbot and the regression tracker, who will make
sure the report won't fall through the cracks unnoticed.

Hint for developers: you normally don't need to care about regzbot once
it's involved. Fix the issue as you normally would, just remember to
include a 'Link:' tag to the report in the commit message, as explained
in Documentation/process/submitting-patches.rst.
That aspect was recently made more explicit in commit 1f57bd42b77c:
https://git.kernel.org/linus/1f57bd42b77c


* Re: Regression/boot failure on 5.16.3
From: Stefan Agner @ 2022-02-08  8:50 UTC (permalink / raw)
  To: Jason Self, Greg KH, Johannes Berg; +Cc: stable, regressions

On 2022-02-04 01:19, Jason Self wrote:
> The computer (amd64) fails to boot. Init gets stuck while
> synchronizing the time over the network. This began between 5.16.2
> (good) and 5.16.3 (bad) and continues on 5.16.4 and 5.16.5. Git
> bisect revealed the following. In this case the nonfree firmware is
> not present on the system. Blacklisting the iwlwifi module works as
> a workaround for now.

I have several reports of Intel NUC 10th/11th gen systems not booting or
crashing during boot after updating to 5.10.96 (from 5.10.91). At least
one stack trace shows iwl_dealloc_ucode in the call path. The commit
below is part of 5.10.96, so this regression seems to affect more than
just the 5.16 series.

Link:
https://github.com/home-assistant/operating-system/issues/1739#issuecomment-1032013069

--
Stefan




* Re: Regression/boot failure on 5.16.3
From: Jason Self @ 2022-02-08 18:05 UTC (permalink / raw)
  To: Stefan Agner, Greg KH, Johannes Berg, stable, regressions

[-- Attachment #1: Type: text/plain, Size: 948 bytes --]

On Tue, 08 Feb 2022 09:50:59 +0100
Stefan Agner <stefan@agner.ch> wrote:

> On 2022-02-04 01:19, Jason Self wrote:
>  [...]  
> 
> I have several reports of Intel NUC 10th/11th gen systems not booting or
> crashing during boot after updating to 5.10.96 (from 5.10.91). At least
> one stack trace shows iwl_dealloc_ucode in the call path. The commit
> below is part of 5.10.96, so this regression seems to affect more than
> just the 5.16 series.
> 
> Link:
> https://github.com/home-assistant/operating-system/issues/1739#issuecomment-1032013069

Yes, it does appear to affect multiple versions; at least 5.17-rc2,
5.16, 5.15, and as you say 5.10.

I can confirm that this patch addresses it on 5.16:
https://lore.kernel.org/stable/YgJSEEmRDKKG+3lT@mail-itl/T/#t

It appears desirable to apply the patch to all of the stable versions
that need it once it has gone into Linus's tree, which would also
address the matter in the upcoming 5.17 series.




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: Regression/boot failure on 5.16.3
From: Thorsten Leemhuis @ 2022-02-08 18:22 UTC (permalink / raw)
  To: Jason Self, Stefan Agner, Greg KH, Johannes Berg, stable, regressions

On 08.02.22 19:05, Jason Self wrote:
> On Tue, 08 Feb 2022 09:50:59 +0100
> Stefan Agner <stefan@agner.ch> wrote:
> 
>> On 2022-02-04 01:19, Jason Self wrote:
>>  [...]  
>>
>> I have several reports of Intel NUC 10th/11th gen systems not booting or
>> crashing during boot after updating to 5.10.96 (from 5.10.91). At least
>> one stack trace shows iwl_dealloc_ucode in the call path. The commit
>> below is part of 5.10.96, so this regression seems to affect more than
>> just the 5.16 series.
>>
>> Link:
>> https://github.com/home-assistant/operating-system/issues/1739#issuecomment-1032013069
> 
> Yes, it does appear to affect multiple versions; at least 5.17-rc2,
> 5.16, 5.15, and as you say 5.10.
> 
> I can confirm that this patch addresses it on 5.16:
> https://lore.kernel.org/stable/YgJSEEmRDKKG+3lT@mail-itl/T/#t

Thx for pointing to the thread!

#regzbot monitor: https://lore.kernel.org/stable/YgJSEEmRDKKG+3lT@mail-itl/

> It appears desirable to apply the patch to all of the stable versions
> that need it once it has gone into Linus's tree, which would also
> address the matter in the upcoming 5.17 series.

FWIW, the patch is marked for backporting already, it just needs to get
merged to mainline first.

Ciao, Thorsten

