linux-kernel.vger.kernel.org archive mirror
* Fwd: RCU stalls with wireguard over bonding over igb on Linux 6.3.0+
@ 2023-07-02  3:31 Bagas Sanjaya
  2023-07-02 11:57 ` Bagas Sanjaya
  0 siblings, 1 reply; 8+ messages in thread
From: Bagas Sanjaya @ 2023-07-02  3:31 UTC
  To: Eric DeVolder, Borislav Petkov (AMD),
	David R, Boris Ostrovsky, Miguel Luis, Paul E. McKenney,
	Joel Fernandes, Boqun Feng, Jason A. Donenfeld, Jay Vosburgh,
	Andy Gospodarek, Rafael J. Wysocki, Len Brown, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin
  Cc: Linux Kernel Mailing List, Linux Regressions, Linux RCU,
	Wireguard Mailing List, Linux Networking, Linux ACPI

Hi,

I noticed a regression report on Bugzilla [1]. Quoting from it:

> I've spent the last week debugging a problem with my attempt to upgrade my kernel from 6.2.8 to 6.3.8 (now also with 6.4.0).
> 
> The lengthy and detailed bug reports with all aspects of git bisect are at
> https://bugs.gentoo.org/909066
> 
> A summary:
> - if I do not configure wg0, the kernel does not hang
> - if I use a kernel older than commit fed8d8773b8ea68ad99d9eee8c8343bef9da2c2c, it does not hang
> 
> To my naive eye, the commit touches code that seems unrelated to the problem.
> 
> The hardware is a Dell PowerEdge R620 running Gentoo ~amd64.
> 
> I have so far excluded:
> - dracut for generating the initramfs is the same version over all kernels
> - linux-firmware has been the same
> - CPU microcode has been the same
> 
> It's been a long time since I was seriously involved with software development, and I have been even less involved with kernel development.
> 
> Gentoo maintainers recommended that I open a bug upstream, so here I am.
> 
> I currently have no idea how to make progress, but I'm willing to try things.

See Bugzilla for the full thread.

Anyway, I'm adding it to regzbot to make sure it doesn't fall through the
cracks unnoticed:

#regzbot introduced: fed8d8773b8ea6 https://bugzilla.kernel.org/show_bug.cgi?id=217620
#regzbot title: correcting acpi_is_processor_usable() check causes RCU stalls with wireguard over bonding+igb
#regzbot link: https://bugs.gentoo.org/909066

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217620

-- 
An old man doll... just what I always wanted! - Clara

* Re: Fwd: RCU stalls with wireguard over bonding over igb on Linux 6.3.0+
  2023-07-02  3:31 Fwd: RCU stalls with wireguard over bonding over igb on Linux 6.3.0+ Bagas Sanjaya
@ 2023-07-02 11:57 ` Bagas Sanjaya
  2023-07-02 12:37   ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 8+ messages in thread
From: Bagas Sanjaya @ 2023-07-02 11:57 UTC
  To: Eric DeVolder, Borislav Petkov (AMD),
	David R, Boris Ostrovsky, Miguel Luis, Paul E. McKenney,
	Joel Fernandes, Boqun Feng, Jason A. Donenfeld, Jay Vosburgh,
	Andy Gospodarek, Rafael J. Wysocki, Len Brown, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Thorsten Leemhuis
  Cc: Linux Kernel Mailing List, Linux Regressions, Linux RCU,
	Wireguard Mailing List, Linux Networking, Linux ACPI,
	Manuel 'satmd' Leiner

[also Cc: original reporter]

On 7/2/23 10:31, Bagas Sanjaya wrote:
> Hi,
> 
> I noticed a regression report on Bugzilla [1]. Quoting from it:
> 
>> I've spent the last week debugging a problem with my attempt to upgrade my kernel from 6.2.8 to 6.3.8 (now also with 6.4.0).
>>
>> The lengthy and detailed bug reports with all aspects of git bisect are at
>> https://bugs.gentoo.org/909066
>>
>> A summary:
>> - if I do not configure wg0, the kernel does not hang
>> - if I use a kernel older than commit fed8d8773b8ea68ad99d9eee8c8343bef9da2c2c, it does not hang
>>
>> To my naive eye, the commit touches code that seems unrelated to the problem.
>>
>> The hardware is a Dell PowerEdge R620 running Gentoo ~amd64.
>>
>> I have so far excluded:
>> - dracut for generating the initramfs is the same version over all kernels
>> - linux-firmware has been the same
>> - CPU microcode has been the same
>>
>> It's been a long time since I was seriously involved with software development, and I have been even less involved with kernel development.
>>
>> Gentoo maintainers recommended that I open a bug upstream, so here I am.
>>
>> I currently have no idea how to make progress, but I'm willing to try things.
> 
> See Bugzilla for the full thread.
> 
> Anyway, I'm adding it to regzbot to make sure it doesn't fall through the
> cracks unnoticed:
> 
> #regzbot introduced: fed8d8773b8ea6 https://bugzilla.kernel.org/show_bug.cgi?id=217620
> #regzbot title: correcting acpi_is_processor_usable() check causes RCU stalls with wireguard over bonding+igb
> #regzbot link: https://bugs.gentoo.org/909066
> 

satmd: Can you repeat the bisection to confirm that fed8d8773b8ea6 is
really the culprit?

Thorsten: It seems like the reporter's bisection may have landed on
the wrong culprit. What can I do in this case besides
asking for the bisection to be repeated?

-- 
An old man doll... just what I always wanted! - Clara


* Re: Fwd: RCU stalls with wireguard over bonding over igb on Linux 6.3.0+
  2023-07-02 11:57 ` Bagas Sanjaya
@ 2023-07-02 12:37   ` Linux regression tracking (Thorsten Leemhuis)
  2023-07-02 13:46     ` Jason A. Donenfeld
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-07-02 12:37 UTC
  To: Bagas Sanjaya, Eric DeVolder, Borislav Petkov (AMD),
	David R, Boris Ostrovsky, Miguel Luis, Paul E. McKenney,
	Joel Fernandes, Boqun Feng, Jason A. Donenfeld, Jay Vosburgh,
	Andy Gospodarek, Rafael J. Wysocki, Len Brown, Thomas Gleixner,
	Ingo Molnar, Dave Hansen, x86, H. Peter Anvin
  Cc: Linux Kernel Mailing List, Linux Regressions, Linux RCU,
	Wireguard Mailing List, Linux Networking, Linux ACPI,
	Manuel 'satmd' Leiner

On 02.07.23 13:57, Bagas Sanjaya wrote:
> [also Cc: original reporter]

BTW: I think you CCed too many developers here. There are situations
where this can make sense, but it's rare. And if you do this too often,
people might stop looking closely at your mails or might even
ignore them completely.

Normally it's enough to write the mail to (1) the people in the
signed-off-by-chain, (2) the maintainers of the subsystem that merged a
commit, and (3) the lists for all affected subsystems; leave it up to
developers from the first two groups to CC the maintainers of the third
group.

> On 7/2/23 10:31, Bagas Sanjaya wrote:
>> I noticed a regression report on Bugzilla [1]. Quoting from it:
>>
>>> I've spent the last week debugging a problem with my attempt to upgrade my kernel from 6.2.8 to 6.3.8 (now also with 
> [...]
>> See Bugzilla for the full thread.
>>
>> Anyway, I'm adding it to regzbot to make sure it doesn't fall through the
>> cracks unnoticed:
>>
>> #regzbot introduced: fed8d8773b8ea6 https://bugzilla.kernel.org/show_bug.cgi?id=217620
>> #regzbot title: correcting acpi_is_processor_usable() check causes RCU stalls with wireguard over bonding+igb
>> #regzbot link: https://bugs.gentoo.org/909066

> satmd: Can you repeat the bisection to confirm that fed8d8773b8ea6 is
> really the culprit?

I'd be careful about asking people to do that, as it might mean a lot of
work for them. Best to leave things like that to developers, unless it's
pretty obvious that something went sideways.

> Thorsten: It seems like the reporter's bisection may have landed on
> the wrong culprit.

What makes you think so? I just looked at Bugzilla, and (for now) it
seems that reverting fed8d8773b8ea6 on top of 6.4 fixed things for the
reporter, which is a pretty strong indicator that this change really
causes the trouble somehow.

/me really wonders what he's missing

> What can I do in this case besides
> asking for the bisection to be repeated?

Not much, apart from updating the regzbot state (e.g. with something like
"regzbot introduced v6.3..v6.4") and replying to your initial report
(ideally with a quick apology) to let everyone know it was a false alarm.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

* Re: Fwd: RCU stalls with wireguard over bonding over igb on Linux 6.3.0+
  2023-07-02 12:37   ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-07-02 13:46     ` Jason A. Donenfeld
  2023-07-03  1:29       ` Jason A. Donenfeld
  2023-07-03  1:34       ` Bagas Sanjaya
  2023-07-02 14:03     ` Sam James
  2023-07-02 14:08     ` Bagas Sanjaya
  2 siblings, 2 replies; 8+ messages in thread
From: Jason A. Donenfeld @ 2023-07-02 13:46 UTC
  To: Linux regressions mailing list
  Cc: Bagas Sanjaya, Eric DeVolder, Borislav Petkov (AMD),
	David R, Boris Ostrovsky, Miguel Luis, Paul E. McKenney,
	Joel Fernandes, Boqun Feng, Jay Vosburgh, Andy Gospodarek,
	Rafael J. Wysocki, Len Brown, Thomas Gleixner, Ingo Molnar,
	Dave Hansen, x86, H. Peter Anvin, Linux Kernel Mailing List,
	Linux RCU, Wireguard Mailing List, Linux Networking, Linux ACPI,
	Manuel 'satmd' Leiner

I've got an overdue patch that I still need to submit to netdev, which
I suspect might actually fix this.

Can you let me know if
https://git.zx2c4.com/wireguard-linux/patch/?id=54d5e4329efe0d1dba8b4a58720d29493926bed0
solves the problem?

Jason

* Re: Fwd: RCU stalls with wireguard over bonding over igb on Linux 6.3.0+
  2023-07-02 12:37   ` Linux regression tracking (Thorsten Leemhuis)
  2023-07-02 13:46     ` Jason A. Donenfeld
@ 2023-07-02 14:03     ` Sam James
  2023-07-02 14:08     ` Bagas Sanjaya
  2 siblings, 0 replies; 8+ messages in thread
From: Sam James @ 2023-07-02 14:03 UTC
  To: regressions
  Cc: Jason, andy, bagasdotme, boqun.feng, boris.ovstrosky, bp,
	dave.hansen, david, eric.devolder, hpa, j.vosburgh, joel, lenb,
	linux-acpi, linux-kernel, manuel.leiner, miguel.luis, mingo,
	netdev, paulmck, rafael, rcu, regressions, tglx, wireguard, x86

#regzbot link: https://bugs.gentoo.org/909066

* Re: Fwd: RCU stalls with wireguard over bonding over igb on Linux 6.3.0+
  2023-07-02 12:37   ` Linux regression tracking (Thorsten Leemhuis)
  2023-07-02 13:46     ` Jason A. Donenfeld
  2023-07-02 14:03     ` Sam James
@ 2023-07-02 14:08     ` Bagas Sanjaya
  2 siblings, 0 replies; 8+ messages in thread
From: Bagas Sanjaya @ 2023-07-02 14:08 UTC
  To: Linux regressions mailing list, Eric DeVolder,
	Borislav Petkov (AMD),
	David R, Boris Ostrovsky, Miguel Luis, Paul E. McKenney,
	Joel Fernandes, Boqun Feng, Jason A. Donenfeld, Jay Vosburgh,
	Andy Gospodarek, Rafael J. Wysocki, Len Brown, Thomas Gleixner,
	Ingo Molnar, Dave Hansen, x86, H. Peter Anvin
  Cc: Linux Kernel Mailing List, Linux RCU, Wireguard Mailing List,
	Linux Networking, Linux ACPI, Manuel 'satmd' Leiner

On 7/2/23 19:37, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 02.07.23 13:57, Bagas Sanjaya wrote:
>> [also Cc: original reporter]
> 
> BTW: I think you CCed too many developers here. There are situations
> where this can make sense, but it's rare. And if you do this too often,
> people might stop looking closely at your mails or might even
> ignore them completely.
> 
> Normally it's enough to write the mail to (1) the people in the
> signed-off-by-chain, (2) the maintainers of the subsystem that merged a
> commit, and (3) the lists for all affected subsystems; leave it up to
> developers from the first two groups to CC the maintainers of the third
> group.
> 

Hi,

In this case I also had to Cc: the wireguard, bonding, RCU, and x86
people, since (I naively thought) this issue spans these subsystems.
Anyway, thanks for the detailed tip (honestly, /me wonders if I'll forget
it later, as is often the case).

>> On 7/2/23 10:31, Bagas Sanjaya wrote:
>>> I noticed a regression report on Bugzilla [1]. Quoting from it:
>>>
>>>> I've spent the last week debugging a problem with my attempt to upgrade my kernel from 6.2.8 to 6.3.8 (now also with 
>> [...]
>>> See Bugzilla for the full thread.
>>>
>>> Anyway, I'm adding it to regzbot to make sure it doesn't fall through the
>>> cracks unnoticed:
>>>
>>> #regzbot introduced: fed8d8773b8ea6 https://bugzilla.kernel.org/show_bug.cgi?id=217620
>>> #regzbot title: correcting acpi_is_processor_usable() check causes RCU stalls with wireguard over bonding+igb
>>> #regzbot link: https://bugs.gentoo.org/909066
> 
>> satmd: Can you repeat the bisection to confirm that fed8d8773b8ea6 is
>> really the culprit?
> 
> I'd be careful about asking people to do that, as it might mean a lot of
> work for them. Best to leave things like that to developers, unless it's
> pretty obvious that something went sideways.
> 

OK.

>> Thorsten: It seems like the reporter's bisection may have landed on
>> the wrong culprit.
> 
> What makes you think so? I just looked at Bugzilla, and (for now) it
> seems that reverting fed8d8773b8ea6 on top of 6.4 fixed things for the
> reporter, which is a pretty strong indicator that this change really
> causes the trouble somehow.
> 

OK too.

> /me really wonders what he's missing
> 
>> What can I do in this case besides
>> asking for the bisection to be repeated?
> 
> Not much, apart from updating the regzbot state (e.g. with something like
> "regzbot introduced v6.3..v6.4") and replying to your initial report
> (ideally with a quick apology) to let everyone know it was a false alarm.
> 

OK.

-- 
An old man doll... just what I always wanted! - Clara


* Re: Fwd: RCU stalls with wireguard over bonding over igb on Linux 6.3.0+
  2023-07-02 13:46     ` Jason A. Donenfeld
@ 2023-07-03  1:29       ` Jason A. Donenfeld
  2023-07-03  1:34       ` Bagas Sanjaya
  1 sibling, 0 replies; 8+ messages in thread
From: Jason A. Donenfeld @ 2023-07-03  1:29 UTC
  To: Linux regressions mailing list
  Cc: Bagas Sanjaya, Eric DeVolder, Borislav Petkov (AMD),
	David R, Boris Ostrovsky, Miguel Luis, Paul E. McKenney,
	Joel Fernandes, Boqun Feng, Jay Vosburgh, Andy Gospodarek,
	Rafael J. Wysocki, Len Brown, Thomas Gleixner, Ingo Molnar,
	Dave Hansen, x86, H. Peter Anvin, Linux Kernel Mailing List,
	Linux RCU, Wireguard Mailing List, Linux Networking, Linux ACPI,
	Manuel 'satmd' Leiner

On Sun, Jul 02, 2023 at 03:46:38PM +0200, Jason A. Donenfeld wrote:
> I've got an overdue patch that I still need to submit to netdev, which
> I suspect might actually fix this.
> 
> Can you let me know if
> https://git.zx2c4.com/wireguard-linux/patch/?id=54d5e4329efe0d1dba8b4a58720d29493926bed0
> solves the problem?

satmd, the original reporter, confirmed over on the Gentoo bug report -
https://bugs.gentoo.org/909066 - that this patch fixes the issue.

This patch has been sent to netdev and will presumably land in the
various trees and stable in due time.

Jason

* Re: Fwd: RCU stalls with wireguard over bonding over igb on Linux 6.3.0+
  2023-07-02 13:46     ` Jason A. Donenfeld
  2023-07-03  1:29       ` Jason A. Donenfeld
@ 2023-07-03  1:34       ` Bagas Sanjaya
  1 sibling, 0 replies; 8+ messages in thread
From: Bagas Sanjaya @ 2023-07-03  1:34 UTC
  To: Jason A. Donenfeld, Linux regressions mailing list
  Cc: Eric DeVolder, Borislav Petkov (AMD),
	David R, Boris Ostrovsky, Miguel Luis, Paul E. McKenney,
	Joel Fernandes, Boqun Feng, Jay Vosburgh, Andy Gospodarek,
	Rafael J. Wysocki, Len Brown, Thomas Gleixner, Ingo Molnar,
	Dave Hansen, x86, H. Peter Anvin, Linux Kernel Mailing List,
	Linux RCU, Wireguard Mailing List, Linux Networking, Linux ACPI,
	Manuel 'satmd' Leiner

On Sun, Jul 02, 2023 at 03:46:38PM +0200, Jason A. Donenfeld wrote:
> I've got an overdue patch that I still need to submit to netdev, which
> I suspect might actually fix this.
> 
> Can you let me know if
> https://git.zx2c4.com/wireguard-linux/patch/?id=54d5e4329efe0d1dba8b4a58720d29493926bed0
> solves the problem?

The reporter on Bugzilla [1] said it fixed the regression, so I'm telling
regzbot:

#regzbot fix: 54d5e4329efe0d

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217620#c6

-- 
An old man doll... just what I always wanted! - Clara

Thread overview: 8+ messages
2023-07-02  3:31 Fwd: RCU stalls with wireguard over bonding over igb on Linux 6.3.0+ Bagas Sanjaya
2023-07-02 11:57 ` Bagas Sanjaya
2023-07-02 12:37   ` Linux regression tracking (Thorsten Leemhuis)
2023-07-02 13:46     ` Jason A. Donenfeld
2023-07-03  1:29       ` Jason A. Donenfeld
2023-07-03  1:34       ` Bagas Sanjaya
2023-07-02 14:03     ` Sam James
2023-07-02 14:08     ` Bagas Sanjaya
