netdev.vger.kernel.org archive mirror
From: "Nguyen, Anthony L" <anthony.l.nguyen@intel.com>
To: "regressions@leemhuis.info" <regressions@leemhuis.info>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"Brandeburg, Jesse" <jesse.brandeburg@intel.com>
Cc: "Torvalds, Linus" <torvalds@linux-foundation.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"intel-wired-lan@lists.osuosl.org"
	<intel-wired-lan@lists.osuosl.org>,
	"hkallweit1@gmail.com" <hkallweit1@gmail.com>
Subject: Re: [PATCH net] igb: fix deadlock caused by taking RTNL in RPM resume path
Date: Mon, 20 Dec 2021 19:56:28 +0000
Message-ID: <b4be04bbd6a20855526b961ef80669bd2647564c.camel@intel.com>
In-Reply-To: <edb8c052-9d20-d190-54e2-ed9bb03ba204@leemhuis.info>

On Sun, 2021-12-19 at 09:31 +0100, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker speaking.
> 
> On 29.11.21 22:14, Heiner Kallweit wrote:
> > Recent net core changes caused an issue with a few Intel drivers
> > (reportedly igb), where taking RTNL in the RPM resume path results
> > in a deadlock. See [0] for a bug report. I don't think the core
> > changes are wrong, but taking RTNL in the RPM resume path isn't
> > needed. The Intel drivers are the only ones doing this. See [1] for
> > a discussion on the issue. The following patch changes the RPM
> > resume path to not take RTNL.
> > 
> > [0] https://bugzilla.kernel.org/show_bug.cgi?id=215129
> > [1] https://lore.kernel.org/netdev/20211125074949.5f897431@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/t/
> > 
> > Fixes: bd869245a3dc ("net: core: try to runtime-resume detached device in __dev_open")
> > Fixes: f32a21376573 ("ethtool: runtime-resume netdev parent before ethtool ioctl ops")
> > Tested-by: Martin Stolpe <martin.stolpe@gmail.com>
> > Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> 
> Long story short: what is taking this fix so long to get mainlined?
> To me it seems to be progressing unnecessarily slowly, especially as
> it's a regression that made it into v5.15 and thus for weeks now
> seems to bug more and more people.
> 
> 
> The long story, starting with the background details:
> 
> The quoted patch fixes a regression caused, among others, by
> f32a21376573 ("ethtool: runtime-resume netdev parent before ethtool
> ioctl ops"), which got merged for v5.15-rc1.
> 
> The regression ("kernel hangs during power down") was afaik first
> reported on Wed, 24 Nov (IOW: nearly a month ago) and forwarded to
> the list shortly afterwards:
> https://bugzilla.kernel.org/show_bug.cgi?id=215129
> https://lore.kernel.org/netdev/20211124144505.31e15716@hermes.local/
> 
> The quoted patch to fix the regression was posted on Mon, 29 Nov (thx
> Heiner for providing it!). Obviously, reviewing patches can take a few
> days when they are complicated, as the other messages in this thread
> show. But according to
> https://bugzilla.kernel.org/show_bug.cgi?id=215129#c8 the patch was
> ACKed by Thu, 7 Dec. To quote: ```The patch is on its way via the
> Intel network driver tree:
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/tnguy/net-queue/+/refs/heads/dev-queue```
> 
> And that's where the patch afaics still is. It hasn't even reached
> linux-next yet, unless I'm missing something. A merge into mainline
> thus is not even in sight; this seems especially bad with the holiday
> season coming up, as getting the fix mainlined is a prerequisite for
> backporting it to 5.15.y, and our latest stable kernel is affected
> by this.

I've been waiting for our validation team to get to this patch to do
some additional testing. However, as you mentioned, with the holidays
coming up, it seems the tester is now out. As it looks like some in the
community have been able to do some testing on this, I'll go ahead and
send this on.

Thanks,
Tony



Thread overview: 12+ messages
2021-11-29 21:14 [PATCH net] igb: fix deadlock caused by taking RTNL in RPM resume path Heiner Kallweit
2021-11-29 23:09 ` Stephen Hemminger
2021-11-30  6:33   ` Heiner Kallweit
2021-11-30  1:17 ` Jakub Kicinski
2021-11-30  6:46   ` Heiner Kallweit
2021-11-30 17:12     ` Jakub Kicinski
2021-11-30 21:35       ` Heiner Kallweit
2021-12-01  0:51         ` Jakub Kicinski
2021-12-19  8:31 ` Thorsten Leemhuis
2021-12-20 19:56   ` Nguyen, Anthony L [this message]
2021-12-22  5:17     ` Thorsten Leemhuis
2021-12-22 12:50       ` Thorsten Leemhuis
