From: Thorsten Leemhuis <regressions@leemhuis.info>
To: "Nguyen, Anthony L" <anthony.l.nguyen@intel.com>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"Brandeburg, Jesse" <jesse.brandeburg@intel.com>
Cc: "Torvalds, Linus" <torvalds@linux-foundation.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"intel-wired-lan@lists.osuosl.org"
	<intel-wired-lan@lists.osuosl.org>,
	"hkallweit1@gmail.com" <hkallweit1@gmail.com>
Subject: Re: [PATCH net] igb: fix deadlock caused by taking RTNL in RPM resume path
Date: Wed, 22 Dec 2021 13:50:07 +0100
Message-ID: <24afef0d-84de-5eb7-3a2f-000b3e462278@leemhuis.info>
In-Reply-To: <ab998a12-9230-04b6-8875-884b9eb1a11e@leemhuis.info>

Scratch that mail; I was totally wrong. I accidentally looked at
yesterday's linux-next tree, which, due to an error in a local cron job,
looked like today's linux-next tree to me.

The real one from today is out now and contains the patch. I apologise
for the noise.

Ciao, Thorsten

On 22.12.21 06:17, Thorsten Leemhuis wrote:
> On 20.12.21 20:56, Nguyen, Anthony L wrote:
>> On Sun, 2021-12-19 at 09:31 +0100, Thorsten Leemhuis wrote:
>>> Hi, this is your Linux kernel regression tracker speaking.
>>>
>>> On 29.11.21 22:14, Heiner Kallweit wrote:
>>>> Recent net core changes caused an issue with a few Intel drivers
>>>> (reportedly igb), where taking RTNL in the RPM resume path results
>>>> in a deadlock. See [0] for a bug report. I don't think the core
>>>> changes are wrong, but taking RTNL in the RPM resume path isn't
>>>> needed. The Intel drivers are the only ones doing this. See [1] for
>>>> a discussion on the issue. The following patch changes the RPM
>>>> resume path to not take RTNL.
>>>>
>>>> [0] https://bugzilla.kernel.org/show_bug.cgi?id=215129
>>>> [1] https://lore.kernel.org/netdev/20211125074949.5f897431@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/t/
>>>>
>>>> Fixes: bd869245a3dc ("net: core: try to runtime-resume detached device in __dev_open")
>>>> Fixes: f32a21376573 ("ethtool: runtime-resume netdev parent before ethtool ioctl ops")
>>>> Tested-by: Martin Stolpe <martin.stolpe@gmail.com>
>>>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>>
>>> Long story short: what is taking this fix so long to get mainlined?
>>> To me it seems to be progressing unnecessarily slowly, especially as
>>> it's a regression that made it into v5.15 and thus has been bugging
>>> more and more people for weeks now.
>>>
>>>
>>> The long story, starting with the background details:
>>>
>>> The quoted patch fixes a regression caused, among others, by
>>> f32a21376573 ("ethtool: runtime-resume netdev parent before ethtool
>>> ioctl ops"), which got merged for v5.15-rc1.
>>>
>>> The regression ("kernel hangs during power down") was afaik first
>>> reported on Wed, 24 Nov (IOW: nearly a month ago) and forwarded to
>>> the list shortly afterwards:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=215129
>>> https://lore.kernel.org/netdev/20211124144505.31e15716@hermes.local/
>>>
>>> The quoted patch to fix the regression was posted on Mon, 29 Nov (thx
>>> Heiner for providing it!). Obviously, reviewing patches can take a few
>>> days when they are complicated, as the other messages in this thread
>>> show. But according to
>>> https://bugzilla.kernel.org/show_bug.cgi?id=215129#c8 the patch was
>>> ACKed by Thu, 7 Dec. To quote: ```The patch is on its way via the Intel
>>> network driver tree:
>>> https://kernel.googlesource.com/pub/scm/linux/kernel/git/tnguy/net-queue/+/refs/heads/dev-queue```
>>>
>>> And that's where the patch afaics still is. It hasn't even reached
>>> linux-next yet, unless I'm missing something. A merge into mainline is
>>> thus not even in sight; this seems especially bad with the holiday
>>> season coming up, as getting the fix mainlined is a prerequisite for
>>> backporting it to 5.15.y, and our latest stable kernel is affected by
>>> this.
>>
>> I've been waiting for our validation team to get to this patch to do
>> some additional testing. However, as you mentioned, with the holidays
>> coming up, it seems the tester is now out. As it looks like some in the
>> community have been able to do some testing on this, I'll go ahead and
>> send this on.
> 
> Thx. I now see that, in addition to dev-queue, the patch is also in the
> master branch of this repo:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue.git/
> 
> But the fix still didn't make it into today's linux-next. It seems
> neither your master branch nor branches like '1GbE' (which seem to be
> the ones from which such fixes later get sent to the net tree) are in
> linux-next, afaics.
> 
> Just wondering: wouldn't it be better if they were? This would allow
> linux-next users and the CIs checking it to test the fix before it's
> sent to the net tree; last week, that seems to have happened only a few
> hours (6209dd778f66) before net was merged into mainline (180f3bcfe362).
> 
> Ciao, Thorsten
