* [PATCH] ehea: Fix memory hook reference counting crashes
From: Michael Ellerman @ 2015-04-24  5:52 UTC
  To: netdev; +Cc: linuxppc-dev, Anton Blanchard, cascardo, David S. Miller

The recent commit to only register the EHEA memory hotplug hooks on
adapter probe has a few problems.

Firstly the reference counting is wrong for multiple adapters, in that
the hooks are registered multiple times. Secondly the check in the tear
down path is backward. Finally the error path doesn't decrement the
count.
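
To illustrate the first problem, here is a minimal userspace sketch, not the driver code. It assumes C11 <stdatomic.h> stand-ins for the kernel's atomic_inc_and_test() and atomic_inc_return(); the helper names (inc_and_test, inc_return, registrations_old_guard, registrations_new_guard, guard_demo) are hypothetical. With a counter that starts at zero, the old guard never sees a zero result, so every probe goes on to register the hooks; the new guard lets only the first probe through.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the kernel's atomic_inc_and_test(): increment, then
 * report whether the new value is zero. */
static bool inc_and_test(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1 == 0;
}

/* Stand-in for atomic_inc_return(): increment and return the new value. */
static int inc_return(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1;
}

/* Number of probes (out of n) that get past the old guard and register. */
static int registrations_old_guard(int n)
{
	atomic_int counter = 0;
	int registered = 0;

	for (int i = 0; i < n; i++) {
		if (inc_and_test(&counter))
			continue;	/* the driver would return 0 here */
		registered++;
	}
	return registered;
}

/* Same count, using the guard from this patch. */
static int registrations_new_guard(int n)
{
	atomic_int counter = 0;
	int registered = 0;

	for (int i = 0; i < n; i++) {
		if (inc_return(&counter) > 1)
			continue;	/* hooks already registered */
		registered++;
	}
	return registered;
}

/* Three adapters: the old guard registers three times, the new one once. */
static void guard_demo(void)
{
	assert(registrations_old_guard(3) == 3);
	assert(registrations_new_guard(3) == 1);
}
```

Calling guard_demo() from a test program exercises both guards for the three-adapter case.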

The multiple registration of the hooks is the biggest problem, as it
leads to oopses when the system is rebooted, and/or errors during memory
hotplug, eg:

  $ ./mem-on-off-test.sh -r 2
  ...
  ehea: memory is going offline
  ehea: LPAR memory changed - re-initializing driver
  ehea: re-initializing driver complete
  ehea: memory is going offline
  ehea: LPAR memory changed - re-initializing driver
  ehea: opcode=26c ret=fffffffffffffffc arg1=8000000003000003 arg2=0 arg3=700000060000d600 arg4=3fded0000 arg5=200 arg6=0 arg7=0
  ehea: register_rpage_mr failed
  ehea: registering mr failed
  ehea: register MR failed - driver inoperable!
  ehea: memory is going offline

Fixes: aa183323312d ("ehea: Register memory hotplug, reboot and crash hooks on adapter probe")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 drivers/net/ethernet/ibm/ehea/ehea_main.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c b/drivers/net/ethernet/ibm/ehea/ehea_main.c
index c05e50759621..00d86be0c831 100644
--- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
+++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
@@ -3347,7 +3347,7 @@ static int ehea_register_memory_hooks(void)
 {
 	int ret = 0;
 
-	if (atomic_inc_and_test(&ehea_memory_hooks_registered))
+	if (atomic_inc_return(&ehea_memory_hooks_registered) > 1)
 		return 0;
 
 	ret = ehea_create_busmap();
@@ -3381,12 +3381,14 @@ out3:
 out2:
 	unregister_reboot_notifier(&ehea_reboot_nb);
 out:
+	atomic_dec(&ehea_memory_hooks_registered);
 	return ret;
 }
 
 static void ehea_unregister_memory_hooks(void)
 {
-	if (atomic_read(&ehea_memory_hooks_registered))
+	/* Only remove the hooks if we've registered them */
+	if (atomic_read(&ehea_memory_hooks_registered) == 0)
 		return;
 
 	unregister_reboot_notifier(&ehea_reboot_nb);
-- 
2.1.0


* Re: [PATCH] ehea: Fix memory hook reference counting crashes
From: David Miller @ 2015-04-25 18:43 UTC
  To: mpe; +Cc: netdev, linuxppc-dev, anton, cascardo

From: Michael Ellerman <mpe@ellerman.id.au>
Date: Fri, 24 Apr 2015 15:52:32 +1000

> The recent commit to only register the EHEA memory hotplug hooks on
> adapter probe has a few problems.
> 
> Firstly the reference counting is wrong for multiple adapters, in that
> the hooks are registered multiple times. Secondly the check in the tear
> down path is backward. Finally the error path doesn't decrement the
> count.
> 
> The multiple registration of the hooks is the biggest problem, as it
> leads to oopses when the system is rebooted, and/or errors during memory
> hotplug, eg:
 ...
> Fixes: aa183323312d ("ehea: Register memory hotplug, reboot and crash hooks on adapter probe")
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Applied, but using an atomic counter for this is really inappropriate
and is what led to this bug in the first place.

You're not counting anything, because if you were, then you would be
decrementing this thing somewhere.

Rather, it's purely a boolean state saying "I did X".  So it should be
a boolean, and no atomicity nor other special considerations are
needed for setting it to true.
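
A minimal userspace sketch of the boolean approach described above, under the assumption that probe and removal are already serialized by the caller. All names here (hooks_registered, do_register_hooks, register_hooks_once, unregister_hooks_once, bool_flag_demo) are hypothetical and are not taken from the driver; do_register_hooks() merely stands in for the real registration work and can be told to fail.

```c
#include <assert.h>
#include <stdbool.h>

static bool hooks_registered;	/* "I did X" state, nothing is counted */
static int setup_calls;		/* how often registration was attempted */

/* Hypothetical stand-in for the real hook registration. */
static int do_register_hooks(bool fail)
{
	setup_calls++;
	return fail ? -1 : 0;
}

static int register_hooks_once(bool fail)
{
	int ret;

	if (hooks_registered)
		return 0;

	ret = do_register_hooks(fail);
	if (ret)
		return ret;	/* flag stays false, so there is nothing to undo */

	hooks_registered = true;
	return 0;
}

static void unregister_hooks_once(void)
{
	/* Only remove the hooks if they were registered. */
	if (!hooks_registered)
		return;

	hooks_registered = false;
}

static void bool_flag_demo(void)
{
	assert(register_hooks_once(true) == -1);	/* failure leaves no state */
	assert(!hooks_registered);
	assert(register_hooks_once(false) == 0);	/* first adapter registers */
	assert(register_hooks_once(false) == 0);	/* second adapter is a no-op */
	assert(setup_calls == 2);
	unregister_hooks_once();
	assert(!hooks_registered);
}
```

Because the flag is set only after registration succeeds, the error path needs no matching decrement.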


* Re: [PATCH] ehea: Fix memory hook reference counting crashes
From: Michael Ellerman @ 2015-04-27  0:30 UTC
  To: David Miller; +Cc: netdev, linuxppc-dev, anton, cascardo

On Sat, 2015-04-25 at 14:43 -0400, David Miller wrote:
> From: Michael Ellerman <mpe@ellerman.id.au>
> Date: Fri, 24 Apr 2015 15:52:32 +1000
> 
> > The recent commit to only register the EHEA memory hotplug hooks on
> > adapter probe has a few problems.
> > 
> > Firstly the reference counting is wrong for multiple adapters, in that
> > the hooks are registered multiple times. Secondly the check in the tear
> > down path is backward. Finally the error path doesn't decrement the
> > count.
> > 
> > The multiple registration of the hooks is the biggest problem, as it
> > leads to oopses when the system is rebooted, and/or errors during memory
> > hotplug, eg:
>  ...
> > Fixes: aa183323312d ("ehea: Register memory hotplug, reboot and crash hooks on adapter probe")
> > Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> 
> Applied, but using an atomic counter for this is really inappropriate
> and is what led to this bug in the first place.
> 
> You're not counting anything, because if you were, then you would be
> decrementing this thing somewhere.
> 
> Rather, it's purely a boolean state saying "I did X".  So it should be
> a boolean, and no atomicity nor other special considerations are
> needed for setting it to true.

Yeah I agree, it's a mess.

We should be unregistering the hooks when the last adapter is removed, which is
where we'd do the decrement. As it's written the hooks stay registered until
the driver is removed.
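
A rough userspace sketch of that get/put shape, assuming one get per adapter probe and one put per adapter removal, with callers serialized (real driver code would need a lock around the register/unregister transitions). The names hook_users, hooks_active, hooks_get, hooks_put and refcount_demo are hypothetical, and this is not the fix that was eventually written.

```c
#include <assert.h>
#include <stdbool.h>

static int hook_users;		/* adapters currently relying on the hooks */
static bool hooks_active;	/* stands in for "hooks are registered" */

/* Called on adapter probe; 'fail' simulates a registration error. */
static int hooks_get(bool fail)
{
	if (hook_users > 0) {
		hook_users++;	/* already registered, just take a reference */
		return 0;
	}

	if (fail)
		return -1;	/* count stays at zero on the error path */

	hooks_active = true;
	hook_users = 1;
	return 0;
}

/* Called on adapter removal; the last put unregisters the hooks. */
static void hooks_put(void)
{
	if (hook_users == 0)
		return;		/* nothing was registered */

	if (--hook_users == 0)
		hooks_active = false;
}

static void refcount_demo(void)
{
	assert(hooks_get(true) == -1 && hook_users == 0);
	assert(hooks_get(false) == 0);		/* first adapter registers */
	assert(hooks_get(false) == 0);		/* second adapter only counts */
	assert(hook_users == 2 && hooks_active);
	hooks_put();
	assert(hooks_active);			/* one adapter still present */
	hooks_put();
	assert(!hooks_active && hook_users == 0);
}
```

Here the counter really does count something, which is the distinction drawn in the quoted review.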

I'll try and find time, or someone else with time, to fix it up properly for 4.2.

cheers
