linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] notifier: Fix soft lockup for notifier_call_chain().
@ 2016-06-28  4:56 Ding Tianhong
  2016-06-28  5:13 ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Ding Tianhong @ 2016-06-28  4:56 UTC (permalink / raw)
  To: luto, mingo, linux-kernel, Eric Dumazet, David S. Miller, Netdev,
	Cong Wang

The problem was occurs in my system that a lot of drviers register
its own handler to the notifiler call chain for netdev_chain, and
then create 4095 vlan dev for one nic, and add several ipv6 address
on each one of them, just like this:

for i in `seq 1 4095`; do ip link add link eth0 name eth0.$i type vlan id $i; done
for i in `seq 1 4095`; do ip -6 addr add 2001::$i dev eth0.$i; done
for i in `seq 1 4095`; do ip -6 addr add 2002::$i dev eth0.$i; done
for i in `seq 1 4095`; do ip -6 addr add 2003::$i dev eth0.$i; done

ifconfig eth0 up
ifconfig eth0 down

then it will halt several seconds, and occurs softlockup:

<0>[ 7620.364058]NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [ifconfig:19186]
<0>[ 7620.364592]Call trace:
<4>[ 7620.364599][<ffffffc000208f68>] dump_backtrace+0x0/0x220
<4>[ 7620.364603][<ffffffc0002091a8>] show_stack+0x20/0x28
<4>[ 7620.364607][<ffffffc000691fac>] dump_stack+0x90/0xb0
<4>[ 7620.364612][<ffffffc0002cacbc>] watchdog_timer_fn+0x41c/0x460
<4>[ 7620.364617][<ffffffc000289ec8>] __run_hrtimer+0x98/0x2d8
<4>[ 7620.364620][<ffffffc00028a3e0>] hrtimer_interrupt+0x110/0x288
<4>[ 7620.364624][<ffffffc00059c0c8>] arch_timer_handler_phys+0x38/0x48
<4>[ 7620.364628][<ffffffc000276d2c>] handle_percpu_devid_irq+0x9c/0x190
<4>[ 7620.364632][<ffffffc000271ef8>] generic_handle_irq+0x40/0x58
<4>[ 7620.364635][<ffffffc000272270>] __handle_domain_irq+0x68/0xc0
<4>[ 7620.364638][<ffffffc000200634>] gic_handle_irq+0xc4/0x1c8
<4>[ 7620.364641]Exception stack(0xffffffc0309b3640 to 0xffffffc0309b3770)
<4>[ 7620.364644]3640: 0000000000001000 0000000000000000 ffffffc0309b37c0 ffffffbfa1019cf8
<4>[ 7620.364647]3660: 0000000080000145 ffffffc0309b3958 0000000000000000 ffffffbfa1013008
<4>[ 7620.364651]3680: 00000000000007f0 ffffffbfa131b770 ffffffd08aaadc40 ffffffbfa1019cf8
<4>[ 7620.364654]36a0: ffffffbfa1019cc4 ffffffd089c2b000 ffffffd08eff8000 ffffffc0309b3958
<4>[ 7620.364656]36c0: ffffffbfa101c5c0 0000000000000000 0000000000000000 ffffffbfa101c66c
<4>[ 7620.364659]36e0: 7f7f7f7f7f7f7f7f 0000000000000030 ffffffffffffffff ffff000000000000
<4>[ 7620.364662]3700: 0000000000000000 0000000000000000 ffffffc000393d58 0000007f794d67b0
<4>[ 7620.364665]3720: 0000007fe62215d0 ffffffc0309b3830 ffffffc00021d8e0 ffffffbfa1049b68
<4>[ 7620.364668]3740: ffffffc000697578 ffffffc0006974b8 ffffffc0309b3958 0000000000000000
<4>[ 7620.364670]3760: ffffffbfa1013008 00000000000007f0
<4>[ 7620.364673][<ffffffc000203780>] el1_irq+0x80/0x100
<4>[ 7620.364692][<ffffffbfa1019ed4>] fib6_walk+0x3c/0x70 [ipv6]
<4>[ 7620.364710][<ffffffbfa1019f70>] fib6_clean_tree+0x68/0x90 [ipv6]
<4>[ 7620.364727][<ffffffbfa101a020>] __fib6_clean_all+0x88/0xc0 [ipv6]
<4>[ 7620.364746][<ffffffbfa101c760>] fib6_clean_all+0x28/0x30 [ipv6]
<4>[ 7620.364763][<ffffffbfa101933c>] rt6_ifdown+0x64/0x148 [ipv6]
<4>[ 7620.364781][<ffffffbfa100e6d8>] addrconf_ifdown+0x68/0x540 [ipv6]
<4>[ 7620.364798][<ffffffbfa1010f58>] addrconf_notify+0xd0/0x8b8 [ipv6]
<4>[ 7620.364801][<ffffffc00023f83c>] notifier_call_chain+0x5c/0xa0
<4>[ 7620.364804][<ffffffc00023f9e0>] raw_notifier_call_chain+0x20/0x28
<4>[ 7620.364809][<ffffffc0005cbab4>] call_netdevice_notifiers_info+0x4c/0x80
<4>[ 7620.364812][<ffffffc0005cbfc8>] dev_close_many+0xd0/0x138
<4>[ 7620.364821][<ffffffbfa33be6e8>] vlan_device_event+0x4a8/0x6a0 [8021q]
<4>[ 7620.364824][<ffffffc00023f83c>] notifier_call_chain+0x5c/0xa0
<4>[ 7620.364827][<ffffffc00023f9e0>] raw_notifier_call_chain+0x20/0x28
<4>[ 7620.364830][<ffffffc0005cbab4>] call_netdevice_notifiers_info+0x4c/0x80
<4>[ 7620.364833][<ffffffc0005d5148>] __dev_notify_flags+0xb8/0xe0
<4>[ 7620.364836][<ffffffc0005d5994>] dev_change_flags+0x54/0x68
<4>[ 7620.364840][<ffffffc00064a620>] devinet_ioctl+0x650/0x700
<4>[ 7620.364843][<ffffffc00064bea4>] inet_ioctl+0xa4/0xc8
<4>[ 7620.364847][<ffffffc0005b1094>] sock_do_ioctl+0x44/0x88
<4>[ 7620.364850][<ffffffc0005b1a3c>] sock_ioctl+0x23c/0x308
<4>[ 7620.364854][<ffffffc000393bc4>] do_vfs_ioctl+0x48c/0x620
<4>[ 7620.364857][<ffffffc000393dec>] SyS_ioctl+0x94/0xa8

=================================cut here========================================

It looks that the notifier_call_chain has to deal with too much handler, and will not
feed the watchdog until finish the work, and the notifier_call_chain may be called
in atomic context, so add touch_nmi_watchdog() in the loops to fix this problem,
and it will not panic again.

v2: add cond_resched() will break the atomic context, so feed the watchdog in
    the loops to fix this bug.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 kernel/notifier.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/notifier.c b/kernel/notifier.c
index fd2c9ac..7eca3c1 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -5,6 +5,7 @@
 #include <linux/rcupdate.h>
 #include <linux/vmalloc.h>
 #include <linux/reboot.h>
+#include <linux/nmi.h>
 
 /*
  *	Notifier list for kernel code which wants to be called
@@ -92,6 +93,8 @@ static int notifier_call_chain(struct notifier_block **nl,
 #endif
 		ret = nb->notifier_call(nb, val, v);
 
+		touch_nmi_watchdog();
+
 		if (nr_calls)
 			(*nr_calls)++;
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] notifier: Fix soft lockup for notifier_call_chain().
  2016-06-28  4:56 [PATCH v2] notifier: Fix soft lockup for notifier_call_chain() Ding Tianhong
@ 2016-06-28  5:13 ` Eric Dumazet
  2016-06-28  6:09   ` Ding Tianhong
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2016-06-28  5:13 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: luto, mingo, linux-kernel, Eric Dumazet, David S. Miller, Netdev,
	Cong Wang

On Tue, 2016-06-28 at 12:56 +0800, Ding Tianhong wrote:
> The problem was occurs in my system that a lot of drviers register
> its own handler to the notifiler call chain for netdev_chain, and
> then create 4095 vlan dev for one nic, and add several ipv6 address
> on each one of them, just like this:
> 
> for i in `seq 1 4095`; do ip link add link eth0 name eth0.$i type vlan id $i; done
> for i in `seq 1 4095`; do ip -6 addr add 2001::$i dev eth0.$i; done
> for i in `seq 1 4095`; do ip -6 addr add 2002::$i dev eth0.$i; done
> for i in `seq 1 4095`; do ip -6 addr add 2003::$i dev eth0.$i; done
> 
> ifconfig eth0 up
> ifconfig eth0 down

I would very much prefer cond_resched() at a more appropriate place.

touch_nmi_watchdog() does not fundamentally solve the issue, as some
process is holding one cpu for a very long time.

Probably in addrconf_ifdown(), as if you have 100,000 IPv6 addresses on
a single netdev, this function might also trigger a soft lockup, without
playing with 4096 vlans...

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index a1f6b7b315317f811cafbf386cf21dfc510c2010..13b675f79a751db45af28fc0474ddb17d9b69b06 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3566,6 +3566,7 @@ restart:
 			}
 		}
 		spin_unlock_bh(&addrconf_hash_lock);
+		cond_resched();
 	}
 
 	write_lock_bh(&idev->lock);

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] notifier: Fix soft lockup for notifier_call_chain().
  2016-06-28  5:13 ` Eric Dumazet
@ 2016-06-28  6:09   ` Ding Tianhong
  2016-06-28  6:22     ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Ding Tianhong @ 2016-06-28  6:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: luto, mingo, linux-kernel, Eric Dumazet, David S. Miller, Netdev,
	Cong Wang

On 2016/6/28 13:13, Eric Dumazet wrote:
> On Tue, 2016-06-28 at 12:56 +0800, Ding Tianhong wrote:
>> The problem was occurs in my system that a lot of drviers register
>> its own handler to the notifiler call chain for netdev_chain, and
>> then create 4095 vlan dev for one nic, and add several ipv6 address
>> on each one of them, just like this:
>>
>> for i in `seq 1 4095`; do ip link add link eth0 name eth0.$i type vlan id $i; done
>> for i in `seq 1 4095`; do ip -6 addr add 2001::$i dev eth0.$i; done
>> for i in `seq 1 4095`; do ip -6 addr add 2002::$i dev eth0.$i; done
>> for i in `seq 1 4095`; do ip -6 addr add 2003::$i dev eth0.$i; done
>>
>> ifconfig eth0 up
>> ifconfig eth0 down
> 
> I would very much prefer cond_resched() at a more appropriate place.
> 
> touch_nmi_watchdog() does not fundamentally solve the issue, as some
> process is holding one cpu for a very long time.
> 
> Probably in addrconf_ifdown(), as if you have 100,000 IPv6 addresses on
> a single netdev, this function might also trigger a soft lockup, without
> playing with 4096 vlans...
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index a1f6b7b315317f811cafbf386cf21dfc510c2010..13b675f79a751db45af28fc0474ddb17d9b69b06 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -3566,6 +3566,7 @@ restart:
>  			}
>  		}
>  		spin_unlock_bh(&addrconf_hash_lock);
> +		cond_resched();
>  	}
>  
>  	write_lock_bh(&idev->lock);
> 
> 
it looks like not enough, I still got this calltrace,

<4>[ 7618.596184]3840: ffffffbfa101a0a0 00000000000007f0
<4>[ 7618.596187][<ffffffc000203780>] el1_irq+0x80/0x100
<4>[ 7618.596255][<ffffffbfa1019d74>] fib6_walk_continue+0x1d4/0x200 [ipv6]
<4>[ 7618.596275][<ffffffbfa1019ed4>] fib6_walk+0x3c/0x70 [ipv6]
<4>[ 7618.596295][<ffffffbfa1019f70>] fib6_clean_tree+0x68/0x90 [ipv6]
<4>[ 7618.596314][<ffffffbfa101a020>] __fib6_clean_all+0x88/0xc0 [ipv6]
<4>[ 7618.596334][<ffffffbfa101c7f0>] fib6_run_gc+0x88/0x148 [ipv6]
<4>[ 7618.596354][<ffffffbfa1021678>] ndisc_netdev_event+0x80/0x140 [ipv6]
<4>[ 7618.596358][<ffffffc00023f83c>] notifier_call_chain+0x5c/0xa0
<4>[ 7618.596361][<ffffffc00023f9e0>] raw_notifier_call_chain+0x20/0x28
<4>[ 7618.596366][<ffffffc0005cbab4>] call_netdevice_notifiers_info+0x4c/0x80
<4>[ 7618.596369][<ffffffc0005cbfc8>] dev_close_many+0xd0/0x138
<4>[ 7618.596378][<ffffffbfa33be6e8>] vlan_device_event+0x4a8/0x6a0 [8021q]
<4>[ 7618.596381][<ffffffc00023f83c>] notifier_call_chain+0x5c/0xa0
<4>[ 7618.596384][<ffffffc00023f9e0>] raw_notifier_call_chain+0x20/0x28
<4>[ 7618.596387][<ffffffc0005cbab4>] call_netdevice_notifiers_info+0x4c/0x80
<4>[ 7618.596390][<ffffffc0005d5148>] __dev_notify_flags+0xb8/0xe0
<4>[ 7618.596393][<ffffffc0005d5994>] dev_change_flags+0x54/0x68
<4>[ 7618.596397][<ffffffc00064a620>] devinet_ioctl+0x650/0x700
<4>[ 7618.596400][<ffffffc00064bea4>] inet_ioctl+0xa4/0xc8
<4>[ 7618.596405][<ffffffc0005b1094>] sock_do_ioctl+0x44/0x88
<4>[ 7618.596408][<ffffffc0005b1a3c>] sock_ioctl+0x23c/0x308
<4>[ 7618.596413][<ffffffc000393bc4>] do_vfs_ioctl+0x48c/0x620




> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] notifier: Fix soft lockup for notifier_call_chain().
  2016-06-28  6:09   ` Ding Tianhong
@ 2016-06-28  6:22     ` Eric Dumazet
  2016-06-28  6:27       ` Eric Dumazet
  2016-06-28 17:33       ` Cong Wang
  0 siblings, 2 replies; 8+ messages in thread
From: Eric Dumazet @ 2016-06-28  6:22 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: luto, mingo, linux-kernel, Eric Dumazet, David S. Miller, Netdev,
	Cong Wang

On Tue, 2016-06-28 at 14:09 +0800, Ding Tianhong wrote:
> On 2016/6/28 13:13, Eric Dumazet wrote:
> > On Tue, 2016-06-28 at 12:56 +0800, Ding Tianhong wrote:
> >> The problem was occurs in my system that a lot of drviers register
> >> its own handler to the notifiler call chain for netdev_chain, and
> >> then create 4095 vlan dev for one nic, and add several ipv6 address
> >> on each one of them, just like this:
> >>
> >> for i in `seq 1 4095`; do ip link add link eth0 name eth0.$i type vlan id $i; done
> >> for i in `seq 1 4095`; do ip -6 addr add 2001::$i dev eth0.$i; done
> >> for i in `seq 1 4095`; do ip -6 addr add 2002::$i dev eth0.$i; done
> >> for i in `seq 1 4095`; do ip -6 addr add 2003::$i dev eth0.$i; done
> >>
> >> ifconfig eth0 up
> >> ifconfig eth0 down
> > 
> > I would very much prefer cond_resched() at a more appropriate place.
> > 
> > touch_nmi_watchdog() does not fundamentally solve the issue, as some
> > process is holding one cpu for a very long time.
> > 
> > Probably in addrconf_ifdown(), as if you have 100,000 IPv6 addresses on
> > a single netdev, this function might also trigger a soft lockup, without
> > playing with 4096 vlans...
> > 
> > diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> > index a1f6b7b315317f811cafbf386cf21dfc510c2010..13b675f79a751db45af28fc0474ddb17d9b69b06 100644
> > --- a/net/ipv6/addrconf.c
> > +++ b/net/ipv6/addrconf.c
> > @@ -3566,6 +3566,7 @@ restart:
> >  			}
> >  		}
> >  		spin_unlock_bh(&addrconf_hash_lock);
> > +		cond_resched();
> >  	}
> >  
> >  	write_lock_bh(&idev->lock);
> > 
> > 
> it looks like not enough, I still got this calltrace,
> 
> <4>[ 7618.596184]3840: ffffffbfa101a0a0 00000000000007f0
> <4>[ 7618.596187][<ffffffc000203780>] el1_irq+0x80/0x100
> <4>[ 7618.596255][<ffffffbfa1019d74>] fib6_walk_continue+0x1d4/0x200 [ipv6]
> <4>[ 7618.596275][<ffffffbfa1019ed4>] fib6_walk+0x3c/0x70 [ipv6]
> <4>[ 7618.596295][<ffffffbfa1019f70>] fib6_clean_tree+0x68/0x90 [ipv6]
> <4>[ 7618.596314][<ffffffbfa101a020>] __fib6_clean_all+0x88/0xc0 [ipv6]
> <4>[ 7618.596334][<ffffffbfa101c7f0>] fib6_run_gc+0x88/0x148 [ipv6]
> <4>[ 7618.596354][<ffffffbfa1021678>] ndisc_netdev_event+0x80/0x140 [ipv6]
> <4>[ 7618.596358][<ffffffc00023f83c>] notifier_call_chain+0x5c/0xa0
> <4>[ 7618.596361][<ffffffc00023f9e0>] raw_notifier_call_chain+0x20/0x28
> <4>[ 7618.596366][<ffffffc0005cbab4>] call_netdevice_notifiers_info+0x4c/0x80
> <4>[ 7618.596369][<ffffffc0005cbfc8>] dev_close_many+0xd0/0x138
> <4>[ 7618.596378][<ffffffbfa33be6e8>] vlan_device_event+0x4a8/0x6a0 [8021q]
> <4>[ 7618.596381][<ffffffc00023f83c>] notifier_call_chain+0x5c/0xa0
> <4>[ 7618.596384][<ffffffc00023f9e0>] raw_notifier_call_chain+0x20/0x28
> <4>[ 7618.596387][<ffffffc0005cbab4>] call_netdevice_notifiers_info+0x4c/0x80
> <4>[ 7618.596390][<ffffffc0005d5148>] __dev_notify_flags+0xb8/0xe0
> <4>[ 7618.596393][<ffffffc0005d5994>] dev_change_flags+0x54/0x68
> <4>[ 7618.596397][<ffffffc00064a620>] devinet_ioctl+0x650/0x700
> <4>[ 7618.596400][<ffffffc00064bea4>] inet_ioctl+0xa4/0xc8
> <4>[ 7618.596405][<ffffffc0005b1094>] sock_do_ioctl+0x44/0x88
> <4>[ 7618.596408][<ffffffc0005b1a3c>] sock_ioctl+0x23c/0x308
> <4>[ 7618.596413][<ffffffc000393bc4>] do_vfs_ioctl+0x48c/0x620
> 
> 

Follow the stack trace and add another cond_resched() where it is needed
then ?

Lot of this code was written decade ago where nobody expected a root
user was going to try hard to crash its host ;)

I did not check if the following is valid (Maybe __fib6_clean_all() is
called with some spinlock/rwlock held)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 1bcef2369d64e6f1325dcab50c14601e6ca5a40a..a2bb59b29dc1629aca1f7997bacb431f00c79227 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1680,17 +1680,18 @@ static void __fib6_clean_all(struct net *net,
 	struct hlist_head *head;
 	unsigned int h;
 
-	rcu_read_lock();
 	for (h = 0; h < FIB6_TABLE_HASHSZ; h++) {
 		head = &net->ipv6.fib_table_hash[h];
+		rcu_read_lock();
 		hlist_for_each_entry_rcu(table, head, tb6_hlist) {
 			write_lock_bh(&table->tb6_lock);
 			fib6_clean_tree(net, &table->tb6_root,
 					func, false, sernum, arg);
 			write_unlock_bh(&table->tb6_lock);
 		}
+		rcu_read_unlock();
+		cond_resched();
 	}
-	rcu_read_unlock();
 }
 
 void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *),

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] notifier: Fix soft lockup for notifier_call_chain().
  2016-06-28  6:22     ` Eric Dumazet
@ 2016-06-28  6:27       ` Eric Dumazet
  2016-07-01  3:06         ` Ding Tianhong
  2016-06-28 17:33       ` Cong Wang
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2016-06-28  6:27 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: luto, mingo, linux-kernel, Eric Dumazet, David S. Miller, Netdev,
	Cong Wang

On Tue, 2016-06-28 at 08:22 +0200, Eric Dumazet wrote:

> Follow the stack trace and add another cond_resched() where it is needed
> then ?
> 
> Lot of this code was written decade ago where nobody expected a root
> user was going to try hard to crash its host ;)
> 
> I did not check if the following is valid (Maybe __fib6_clean_all() is
> called with some spinlock/rwlock held)

Well, fib6_run_gc() can call it with
spin_lock_bh(&net->ipv6.fib6_gc_lock) so this wont work.

We need more invasive changes.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] notifier: Fix soft lockup for notifier_call_chain().
  2016-06-28  6:22     ` Eric Dumazet
  2016-06-28  6:27       ` Eric Dumazet
@ 2016-06-28 17:33       ` Cong Wang
  1 sibling, 0 replies; 8+ messages in thread
From: Cong Wang @ 2016-06-28 17:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ding Tianhong, luto, Ingo Molnar, LKML, Eric Dumazet,
	David S. Miller, Netdev, Cong Wang

On Mon, Jun 27, 2016 at 11:22 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Lot of this code was written decade ago where nobody expected a root
> user was going to try hard to crash its host ;)

+1

Adding cond_resched() to appropriate network notifiers sounds better.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] notifier: Fix soft lockup for notifier_call_chain().
  2016-06-28  6:27       ` Eric Dumazet
@ 2016-07-01  3:06         ` Ding Tianhong
  2016-07-01  5:12           ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Ding Tianhong @ 2016-07-01  3:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: luto, mingo, linux-kernel, Eric Dumazet, David S. Miller, Netdev,
	Cong Wang

On 2016/6/28 14:27, Eric Dumazet wrote:
> On Tue, 2016-06-28 at 08:22 +0200, Eric Dumazet wrote:
> 
>> Follow the stack trace and add another cond_resched() where it is needed
>> then ?
>>
>> Lot of this code was written decade ago where nobody expected a root
>> user was going to try hard to crash its host ;)
>>
>> I did not check if the following is valid (Maybe __fib6_clean_all() is
>> called with some spinlock/rwlock held)
> 
> Well, fib6_run_gc() can call it with
> spin_lock_bh(&net->ipv6.fib6_gc_lock) so this wont work.
> 
> We need more invasive changes.
> 
> 
> 
Hi Eric:

I debug this problem, and found that the __fib6_clean_all() would not hold the cpu more than 1 second event though there
is a lot of ipv6 address to deal with, but the notifier_chian would call the ipv6 notifier several times and hold the cpu
for a long time, so add cond_resched() in the addrconf_ifdown could solve the problem correctly, I think your first solution
is the good way to fix this bug.

Thanks
Ding

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] notifier: Fix soft lockup for notifier_call_chain().
  2016-07-01  3:06         ` Ding Tianhong
@ 2016-07-01  5:12           ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2016-07-01  5:12 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: luto, mingo, linux-kernel, Eric Dumazet, David S. Miller, Netdev,
	Cong Wang

On Fri, 2016-07-01 at 11:06 +0800, Ding Tianhong wrote:

> I debug this problem, and found that the __fib6_clean_all() would not
> hold the cpu more than 1 second event though there
> is a lot of ipv6 address to deal with, but the notifier_chian would
> call the ipv6 notifier several times and hold the cpu
> for a long time, so add cond_resched() in the addrconf_ifdown could
> solve the problem correctly, I think your first solution
> is the good way to fix this bug.

I am traveling these days, so please send an official patch once you've
tested it, thanks !

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-07-01  5:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-28  4:56 [PATCH v2] notifier: Fix soft lockup for notifier_call_chain() Ding Tianhong
2016-06-28  5:13 ` Eric Dumazet
2016-06-28  6:09   ` Ding Tianhong
2016-06-28  6:22     ` Eric Dumazet
2016-06-28  6:27       ` Eric Dumazet
2016-07-01  3:06         ` Ding Tianhong
2016-07-01  5:12           ` Eric Dumazet
2016-06-28 17:33       ` Cong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).