From: Heiner Kallweit <hkallweit1@gmail.com>
To: Jakub Kicinski <kuba@kernel.org>,
	Jesse Brandeburg <jesse.brandeburg@intel.com>,
	Tony Nguyen <anthony.l.nguyen@intel.com>,
	intel-wired-lan <intel-wired-lan@lists.osuosl.org>,
	Kalle Valo <kvalo@codeaurora.org>
Cc: netdev@vger.kernel.org, ath10k@lists.infradead.org,
	Stephen Hemminger <stephen@networkplumber.org>
Subject: Re: [Bug 215129] New: Linux kernel hangs during power down
Date: Thu, 25 Nov 2021 08:32:18 +0100	[thread overview]
Message-ID: <1849b7a3-cdfe-f9dc-e4d1-172e8b1667d2@gmail.com> (raw)
In-Reply-To: <20211124164648.43c354f4@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On 25.11.2021 01:46, Jakub Kicinski wrote:
> Adding Kalle and Hainer.
> 
> On Wed, 24 Nov 2021 14:45:05 -0800 Stephen Hemminger wrote:
>> Begin forwarded message:
>>
>> Date: Wed, 24 Nov 2021 21:14:53 +0000
>> From: bugzilla-daemon@bugzilla.kernel.org
>> To: stephen@networkplumber.org
>> Subject: [Bug 215129] New: Linux kernel hangs during power down
>>
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=215129
>>
>>             Bug ID: 215129
>>            Summary: Linux kernel hangs during power down
>>            Product: Networking
>>            Version: 2.5
>>     Kernel Version: 5.15
>>           Hardware: All
>>                 OS: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: normal
>>           Priority: P1
>>          Component: Other
>>           Assignee: stephen@networkplumber.org
>>           Reporter: martin.stolpe@gmail.com
>>         Regression: No
>>
>> Created attachment 299703
>>   --> https://bugzilla.kernel.org/attachment.cgi?id=299703&action=edit    
>> Kernel log after timeout occured
>>
>> On my system the kernel is waiting for a task during shutdown which doesn't
>> complete.
>>
>> The commit which causes this behavior is:
>> [f32a213765739f2a1db319346799f130a3d08820] ethtool: runtime-resume netdev
>> parent before ethtool ioctl ops
>>
>> This bug causes also that the system gets unresponsive after starting Steam:
>> https://steamcommunity.com/app/221410/discussions/2/3194736442566303600/
>>
> 

I think the reference to ath10k_pci is misleading; Kalle isn't needed here.
The actual issue is an RTNL deadlock in igb_resume(). See log snippet:

Nov 24 18:56:19 MartinsPc kernel:  igb_resume+0xff/0x1e0 [igb 21bf6a00cb1f20e9b0e8434f7f8748a0504e93f8]
Nov 24 18:56:19 MartinsPc kernel:  pci_pm_runtime_resume+0xa7/0xd0
Nov 24 18:56:19 MartinsPc kernel:  ? pci_pm_freeze_noirq+0x110/0x110
Nov 24 18:56:19 MartinsPc kernel:  __rpm_callback+0x41/0x120
Nov 24 18:56:19 MartinsPc kernel:  ? pci_pm_freeze_noirq+0x110/0x110
Nov 24 18:56:19 MartinsPc kernel:  rpm_callback+0x35/0x70
Nov 24 18:56:19 MartinsPc kernel:  rpm_resume+0x567/0x810
Nov 24 18:56:19 MartinsPc kernel:  __pm_runtime_resume+0x4a/0x80
Nov 24 18:56:19 MartinsPc kernel:  dev_ethtool+0xd4/0x2d80

We have at least two places in net core where runtime_resume() is called
under RTNL; the dev_ethtool() path in the trace above is one of them. This
conflicts with the current structure in a few Intel drivers, which take
RTNL again in their resume path with something like the following:

	rtnl_lock();
	if (!err && netif_running(netdev))
		err = __igb_open(netdev, true);

	if (!err)
		netif_device_attach(netdev);
	rtnl_unlock();
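
To make the lock ordering concrete, here is a minimal userspace sketch of
the pattern. It is a model, not kernel code: the names rtnl_model_init(),
model_drv_resume() and model_core_ioctl() are made up for illustration, and
an error-checking pthread mutex stands in for RTNL so that the
self-deadlock is reported as EDEADLK instead of hanging the program.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for RTNL. The error-checking type makes a
 * second lock attempt by the owning thread fail with EDEADLK. */
static pthread_mutex_t rtnl_model;

static int rtnl_model_init(void)
{
	pthread_mutexattr_t attr;
	int err;

	err = pthread_mutexattr_init(&attr);
	if (err)
		return err;
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!err)
		err = pthread_mutex_init(&rtnl_model, &attr);
	pthread_mutexattr_destroy(&attr);
	return err;
}

/* Models a driver resume callback that takes the lock itself,
 * as in the igb snippet above. */
static int model_drv_resume(void)
{
	int err = pthread_mutex_lock(&rtnl_model);

	if (err)
		return err;	/* EDEADLK if the caller already holds it */
	/* ... reopen and re-attach the device here ... */
	pthread_mutex_unlock(&rtnl_model);
	return 0;
}

/* Models a net core path that runtime-resumes the device while
 * already holding the lock. */
static int model_core_ioctl(void)
{
	int err;

	pthread_mutex_lock(&rtnl_model);
	err = model_drv_resume();
	pthread_mutex_unlock(&rtnl_model);
	return err;
}
```

Calling model_drv_resume() on its own succeeds, while model_core_ioctl()
gets EDEADLK back from the nested lock attempt. With a plain
non-recursive mutex, which is what this model assumes RTNL behaves like,
the nested attempt would simply never return.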

Other drivers don't do this, so the question is whether taking RTNL is
actually needed here. Some discussion was started [0], but it ended
without a tangible result, and it has been surprisingly quiet since.

[0] https://www.spinics.net/lists/netdev/msg736880.html
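
One direction that question points to is skipping the lock when the resume
callback is entered from a path that already holds it. The sketch below is
again a userspace model with made-up names (sketch_resume(),
sketch_core_ioctl(), the rpm flag, attach_count), not a proposed patch. It
rests on an assumption this thread does not establish, namely that every
caller passing rpm=true really holds the lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for RTNL (plain, non-recursive mutex). */
static pthread_mutex_t rtnl_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Counts how often the "reopen and attach" step ran. */
static int attach_count;

/* Resume callback that takes the lock only when the caller says it
 * does not hold it already. ASSUMPTION: rpm == true implies the
 * caller owns rtnl_sketch. */
static int sketch_resume(bool rpm)
{
	if (!rpm)
		pthread_mutex_lock(&rtnl_sketch);

	attach_count++;	/* stands in for reopening/attaching the device */

	if (!rpm)
		pthread_mutex_unlock(&rtnl_sketch);
	return 0;
}

/* Core path that resumes the device while holding the lock. */
static int sketch_core_ioctl(void)
{
	int err;

	pthread_mutex_lock(&rtnl_sketch);
	err = sketch_resume(true);
	pthread_mutex_unlock(&rtnl_sketch);
	return err;
}
```

The trade-off is that the callback now depends on its callers for
correctness: if a runtime resume can also be triggered without the lock
held, the protected section runs unlocked, which is exactly what would
need to be checked before doing anything like this in a driver.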
