linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] net: atlantic: always deep reset on pm op, fixing null deref regression
@ 2022-05-04 22:06 Manuel Ullmann
  2022-05-04 22:24 ` Holger Hoffstätte
  2022-05-05  7:04 ` [EXT] " Igor Russkikh
  0 siblings, 2 replies; 5+ messages in thread
From: Manuel Ullmann @ 2022-05-04 22:06 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: linux-kernel, netdev, regressions, davem, ndanilov, kuba, pabeni,
	Jordan Leppert, Holger Hoffstätte, koo5

From a3eccd32c618fe4b4f5c537cd83ba5611149623e Mon Sep 17 00:00:00 2001
Date: Wed, 4 May 2022 21:30:44 +0200

The impact of this regression is the same for resume that I saw on
thaw: the kernel hangs and nothing except SysRq rebooting can be done.

The null deref occurs at the same position as on thaw.
BUG: kernel NULL pointer dereference
RIP: aq_ring_rx_fill+0xcf/0x210 [atlantic]

Fixes regression in cbe6c3a8f8f4 ("net: atlantic: invert deep par in
pm functions, preventing null derefs"), where I disabled deep pm
resets in suspend and resume, trying to make sense of the
atl_resume_common deep parameter in the first place.

It turns out, that atlantic always has to deep reset on pm operations
and the parameter is useless. Even though I expected that and tested
resume, I screwed up by kexec-rebooting into an unpatched kernel, thus
missing the breakage.

This fixup obsoletes the deep parameter of atl_resume_common, but I
leave the cleanup for the maintainers to post to mainline.

PS: I'm very sorry for this regression.

Fixes: cbe6c3a8f8f4315b96e46e1a1c70393c06d95a4c
Link: https://lore.kernel.org/regressions/9-Ehc_xXSwdXcvZqKD5aSqsqeNj5Izco4MYEwnx5cySXVEc9-x_WC4C3kAoCqNTi-H38frroUK17iobNVnkLtW36V6VWGSQEOHXhmVMm5iQ=@protonmail.com/
Reported-by: Jordan Leppert <jordanleppert@protonmail.com>
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
CC: <stable@vger.kernel.org> # 5.17.5
CC: <stable@vger.kernel.org> # 5.15.36
CC: <stable@vger.kernel.org> # 5.10.113
Signed-off-by: Manuel ULlmann <labre@posteo.de>
---
 drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
index 3a529ee8c834..831833911a52 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
@@ -449,7 +449,7 @@ static int aq_pm_freeze(struct device *dev)
 
 static int aq_pm_suspend_poweroff(struct device *dev)
 {
-	return aq_suspend_common(dev, false);
+	return aq_suspend_common(dev, true);
 }
 
 static int aq_pm_thaw(struct device *dev)
@@ -459,7 +459,7 @@ static int aq_pm_thaw(struct device *dev)
 
 static int aq_pm_resume_restore(struct device *dev)
 {
-	return atl_resume_common(dev, false);
+	return atl_resume_common(dev, true);
 }
 
 static const struct dev_pm_ops aq_pm_ops = {

base-commit: 672c0c5173427e6b3e2a9bbb7be51ceeec78093a
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] net: atlantic: always deep reset on pm op, fixing null deref regression
  2022-05-04 22:06 [PATCH] net: atlantic: always deep reset on pm op, fixing null deref regression Manuel Ullmann
@ 2022-05-04 22:24 ` Holger Hoffstätte
  2022-05-05  7:04 ` [EXT] " Igor Russkikh
  1 sibling, 0 replies; 5+ messages in thread
From: Holger Hoffstätte @ 2022-05-04 22:24 UTC (permalink / raw)
  To: Manuel Ullmann, Igor Russkikh
  Cc: linux-kernel, netdev, regressions, davem, ndanilov, kuba, pabeni,
	Jordan Leppert, koo5

On 2022-05-05 00:06, Manuel Ullmann wrote:
>  From a3eccd32c618fe4b4f5c537cd83ba5611149623e Mon Sep 17 00:00:00 2001
> Date: Wed, 4 May 2022 21:30:44 +0200
> 
> The impact of this regression is the same for resume that I saw on
> thaw: the kernel hangs and nothing except SysRq rebooting can be done.
> 
> The null deref occurs at the same position as on thaw.
> BUG: kernel NULL pointer dereference
> RIP: aq_ring_rx_fill+0xcf/0x210 [atlantic]
> 
> Fixes regression in cbe6c3a8f8f4 ("net: atlantic: invert deep par in
> pm functions, preventing null derefs"), where I disabled deep pm
> resets in suspend and resume, trying to make sense of the
> atl_resume_common deep parameter in the first place.
> 
> It turns out, that atlantic always has to deep reset on pm operations
> and the parameter is useless. Even though I expected that and tested
> resume, I screwed up by kexec-rebooting into an unpatched kernel, thus
> missing the breakage.
> 
> This fixup obsoletes the deep parameter of atl_resume_common, but I
> leave the cleanup for the maintainers to post to mainline.
> 
> PS: I'm very sorry for this regression.
> 
> Fixes: cbe6c3a8f8f4315b96e46e1a1c70393c06d95a4c
> Link: https://lore.kernel.org/regressions/9-Ehc_xXSwdXcvZqKD5aSqsqeNj5Izco4MYEwnx5cySXVEc9-x_WC4C3kAoCqNTi-H38frroUK17iobNVnkLtW36V6VWGSQEOHXhmVMm5iQ=@protonmail.com/
> Reported-by: Jordan Leppert <jordanleppert@protonmail.com>
> Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
> CC: <stable@vger.kernel.org> # 5.17.5
> CC: <stable@vger.kernel.org> # 5.15.36
> CC: <stable@vger.kernel.org> # 5.10.113
> Signed-off-by: Manuel ULlmann <labre@posteo.de>
> ---
>   drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
> index 3a529ee8c834..831833911a52 100644
> --- a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
> +++ b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
> @@ -449,7 +449,7 @@ static int aq_pm_freeze(struct device *dev)
>   
>   static int aq_pm_suspend_poweroff(struct device *dev)
>   {
> -	return aq_suspend_common(dev, false);
> +	return aq_suspend_common(dev, true);
>   }
>   
>   static int aq_pm_thaw(struct device *dev)
> @@ -459,7 +459,7 @@ static int aq_pm_thaw(struct device *dev)
>   
>   static int aq_pm_resume_restore(struct device *dev)
>   {
> -	return atl_resume_common(dev, false);
> +	return atl_resume_common(dev, true);
>   }
>   
>   static const struct dev_pm_ops aq_pm_ops = {
> 
> base-commit: 672c0c5173427e6b3e2a9bbb7be51ceeec78093a
> 

As mentioned in the discusson thread this reliably restores resume
for me, so:

Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>

Thanks!
Holger

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [EXT] [PATCH] net: atlantic: always deep reset on pm op, fixing null deref regression
  2022-05-04 22:06 [PATCH] net: atlantic: always deep reset on pm op, fixing null deref regression Manuel Ullmann
  2022-05-04 22:24 ` Holger Hoffstätte
@ 2022-05-05  7:04 ` Igor Russkikh
  2022-05-05 10:24   ` Jordan Leppert
  1 sibling, 1 reply; 5+ messages in thread
From: Igor Russkikh @ 2022-05-05  7:04 UTC (permalink / raw)
  To: Manuel Ullmann
  Cc: linux-kernel, netdev, regressions, davem, kuba, pabeni,
	Jordan Leppert, Holger Hoffstätte, koo5, Dmitry Bezrukov


> The impact of this regression is the same for resume that I saw on
> thaw: the kernel hangs and nothing except SysRq rebooting can be done.
> 
> The null deref occurs at the same position as on thaw.
> BUG: kernel NULL pointer dereference
> RIP: aq_ring_rx_fill+0xcf/0x210 [atlantic]
> 
> Fixes regression in cbe6c3a8f8f4 ("net: atlantic: invert deep par in
> pm functions, preventing null derefs"), where I disabled deep pm
> resets in suspend and resume, trying to make sense of the
> atl_resume_common deep parameter in the first place.
> 
> It turns out, that atlantic always has to deep reset on pm operations
> and the parameter is useless. Even though I expected that and tested
> resume, I screwed up by kexec-rebooting into an unpatched kernel, thus
> missing the breakage.
> 
> This fixup obsoletes the deep parameter of atl_resume_common, but I
> leave the cleanup for the maintainers to post to mainline.
> 
> PS: I'm very sorry for this regression.

Hi Manuel,

Unfortunately I've missed to review and comment on previous patch - it was too quickly accepted.

I'm still in doubt on your fixes, even after rereading the original problem.
Is it possible for you to test this with all the possible combinations?
suspend/resume with device up/down,
hibernate/restore with device up/down?

I'll try to do the same on our side, but we don't have much resources for that now unfortunately..


> Fixes: cbe6c3a8f8f4315b96e46e1a1c70393c06d95a4c

That tag format is incorrect I think..

Igor

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [EXT] [PATCH] net: atlantic: always deep reset on pm op, fixing null deref regression
  2022-05-05  7:04 ` [EXT] " Igor Russkikh
@ 2022-05-05 10:24   ` Jordan Leppert
  2022-05-05 17:39     ` Manuel Ullmann
  0 siblings, 1 reply; 5+ messages in thread
From: Jordan Leppert @ 2022-05-05 10:24 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: Manuel Ullmann, linux-kernel, netdev, regressions, davem, kuba,
	pabeni, Holger Hoffstätte, koo5, Dmitry Bezrukov

With the proposed patch (deep parameter is always true), I've managed to test:
1. Hibernate/restore (with device down/up)
2. Suspend/resume (with device down/up)

I put the device down with the command:
sudo ip link set <connection> down

I hope that's correct, if not please let me know correct command.

Regards,
Jordan


------- Original Message -------
On Thursday, May 5th, 2022 at 08:04, Igor Russkikh <irusskikh@marvell.com> wrote:


>
>
> > The impact of this regression is the same for resume that I saw on
> > thaw: the kernel hangs and nothing except SysRq rebooting can be done.
> >
> > The null deref occurs at the same position as on thaw.
> > BUG: kernel NULL pointer dereference
> > RIP: aq_ring_rx_fill+0xcf/0x210 [atlantic]
> >
> > Fixes regression in cbe6c3a8f8f4 ("net: atlantic: invert deep par in
> > pm functions, preventing null derefs"), where I disabled deep pm
> > resets in suspend and resume, trying to make sense of the
> > atl_resume_common deep parameter in the first place.
> >
> > It turns out, that atlantic always has to deep reset on pm operations
> > and the parameter is useless. Even though I expected that and tested
> > resume, I screwed up by kexec-rebooting into an unpatched kernel, thus
> > missing the breakage.
> >
> > This fixup obsoletes the deep parameter of atl_resume_common, but I
> > leave the cleanup for the maintainers to post to mainline.
> >
> > PS: I'm very sorry for this regression.
>
>
> Hi Manuel,
>
> Unfortunately I've missed to review and comment on previous patch - it was too quickly accepted.
>
> I'm still in doubt on your fixes, even after rereading the original problem.
> Is it possible for you to test this with all the possible combinations?
> suspend/resume with device up/down,
> hibernate/restore with device up/down?
>
> I'll try to do the same on our side, but we don't have much resources for that now unfortunately..
>
> > Fixes: cbe6c3a8f8f4315b96e46e1a1c70393c06d95a4c
>
>
> That tag format is incorrect I think..
>
> Igor

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [EXT] [PATCH] net: atlantic: always deep reset on pm op, fixing null deref regression
  2022-05-05 10:24   ` Jordan Leppert
@ 2022-05-05 17:39     ` Manuel Ullmann
  0 siblings, 0 replies; 5+ messages in thread
From: Manuel Ullmann @ 2022-05-05 17:39 UTC (permalink / raw)
  To: Jordan Leppert
  Cc: Igor Russkikh, Manuel Ullmann, linux-kernel, netdev, regressions,
	davem, kuba, pabeni, Holger Hoffstätte, koo5,
	Dmitry Bezrukov

>>
>>
>> > The impact of this regression is the same for resume that I saw on
>> > thaw: the kernel hangs and nothing except SysRq rebooting can be done.
>> >
>> > The null deref occurs at the same position as on thaw.
>> > BUG: kernel NULL pointer dereference
>> > RIP: aq_ring_rx_fill+0xcf/0x210 [atlantic]
>> >
>> > Fixes regression in cbe6c3a8f8f4 ("net: atlantic: invert deep par in
>> > pm functions, preventing null derefs"), where I disabled deep pm
>> > resets in suspend and resume, trying to make sense of the
>> > atl_resume_common deep parameter in the first place.
>> >
>> > It turns out, that atlantic always has to deep reset on pm operations
>> > and the parameter is useless. Even though I expected that and tested
>> > resume, I screwed up by kexec-rebooting into an unpatched kernel, thus
>> > missing the breakage.
>> >
>> > This fixup obsoletes the deep parameter of atl_resume_common, but I
>> > leave the cleanup for the maintainers to post to mainline.
>> >
>> > PS: I'm very sorry for this regression.
>>
>>
>> Hi Manuel,
>>
>> Unfortunately I've missed to review and comment on previous patch - it was too quickly accepted.
>>
>> I'm still in doubt on your fixes, even after rereading the original problem.
>> Is it possible for you to test this with all the possible combinations?
>> suspend/resume with device up/down,
>> hibernate/restore with device up/down?

I confirm that suspend/resume/hibernation/thaw keeps working in all
cases. Thaw would work without the original patch, if the device is down
before hibernation. I also originally described this behaviour on
bugzilla at

https://bugzilla.kernel.org/show_bug.cgi?id=215798

See also Jordan’s confirmation below.

I think, the main reason, why this could break, is, that the deep
parameter had no real impact until the breaking commit. So it was
practically untested, when the allocation/free functions were split.

Another thing, that I tested, was guarding all null pointer references
with null checks, which failed at first, because GCC optimized them
out. I think I have the atlantic tree for this (bad) fix attempt
floating around. I can try to rebase and create a patch from this and
post it to the Github issue, if you are interested.
https://github.com/Aquantia/AQtion/issues/32

Don’t have time for this before the weekend though.

>> I'll try to do the same on our side, but we don't have much resources for that now unfortunately..
>>
>> > Fixes: cbe6c3a8f8f4315b96e46e1a1c70393c06d95a4c
>>
>>
>> That tag format is incorrect I think..

Thanks for pointing that out. Also, are those stable Cc tags correct?
Because I figured, that the x in the documentation could be also the
branch name and not a placeholder. Should I resend the patch, fixing the
tags? Won’t get to it before tomorrow, though.

>> Igor

> With the proposed patch (deep parameter is always true), I've managed to test:
> 1. Hibernate/restore (with device down/up)
> 2. Suspend/resume (with device down/up)
>
> I put the device down with the command:
> sudo ip link set <connection> down
>
> I hope that's correct, if not please let me know correct command.

This should be the correct.

Manuel

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-05-05 17:48 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-04 22:06 [PATCH] net: atlantic: always deep reset on pm op, fixing null deref regression Manuel Ullmann
2022-05-04 22:24 ` Holger Hoffstätte
2022-05-05  7:04 ` [EXT] " Igor Russkikh
2022-05-05 10:24   ` Jordan Leppert
2022-05-05 17:39     ` Manuel Ullmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).