* Patch "rcu: Fix soft lockup for rcu_nocb_kthread" has been added to the 4.8-stable tree
@ 2016-12-03  8:53 gregkh
  2016-12-05  1:30 ` Ding Tianhong
  0 siblings, 1 reply; 5+ messages in thread
From: gregkh @ 2016-12-03  8:53 UTC
  To: dingtianhong, dhaval.giani, gregkh, paulmck; +Cc: stable, stable-commits


This is a note to let you know that I've just added the patch titled

    rcu: Fix soft lockup for rcu_nocb_kthread

to the 4.8-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     rcu-fix-soft-lockup-for-rcu_nocb_kthread.patch
and it can be found in the queue-4.8 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


From bedc1969150d480c462cdac320fa944b694a7162 Mon Sep 17 00:00:00 2001
From: Ding Tianhong <dingtianhong@huawei.com>
Date: Wed, 15 Jun 2016 15:27:36 +0800
Subject: rcu: Fix soft lockup for rcu_nocb_kthread

From: Ding Tianhong <dingtianhong@huawei.com>

commit bedc1969150d480c462cdac320fa944b694a7162 upstream.

Carrying out the following steps results in a soft lockup in the
RCU callback-offload (rcuo) kthreads:

1. Use an ixgbevf NIC and set its speed to 10Gb/s.
2. Use ifconfig to bring the NIC up and down repeatedly.
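To script step 2, one option is a minimal C sketch that makes the same flag change as "ifconfig <ifname> down" / "ifconfig <ifname> up", using the SIOCGIFFLAGS and SIOCSIFFLAGS ioctls. The function names, the one-second pacing and the iteration count are assumptions and do not come from the original report. Changing interface flags requires CAP_NET_ADMIN.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <net/if.h>      /* struct ifreq, IFF_UP, IFNAMSIZ */
#include <string.h>      /* memcpy, memset, strlen */
#include <sys/ioctl.h>   /* ioctl, SIOCGIFFLAGS, SIOCSIFFLAGS */
#include <sys/socket.h>  /* socket, AF_INET, SOCK_DGRAM */
#include <unistd.h>      /* close, sleep */

/*
 * Set or clear IFF_UP on ifname: the flag change that
 * "ifconfig <ifname> up" / "ifconfig <ifname> down" performs.
 * Returns 0 on success, -1 on any failure (bad name, no such
 * interface, missing CAP_NET_ADMIN, ...).
 */
static int set_link_state(const char *ifname, int up)
{
    struct ifreq ifr;
    size_t len;
    int fd, ret = -1;

    assert(ifname != NULL);
    len = strlen(ifname);
    if (len == 0 || len >= IFNAMSIZ)
        return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);  /* a datagram socket suffices for these ioctls */
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    memcpy(ifr.ifr_name, ifname, len + 1);

    if (ioctl(fd, SIOCGIFFLAGS, &ifr) == 0) {    /* read current flags */
        if (up)
            ifr.ifr_flags |= IFF_UP;
        else
            ifr.ifr_flags &= ~IFF_UP;
        if (ioctl(fd, SIOCSIFFLAGS, &ifr) == 0)  /* write them back */
            ret = 0;
    }
    close(fd);
    return ret;
}

/* Bring ifname down and back up, iterations times; stop at the first error. */
static int flap_link(const char *ifname, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        if (set_link_state(ifname, 0) != 0 || set_link_state(ifname, 1) != 0)
            return -1;
        sleep(1);  /* assumed pacing; the report gives no interval */
    }
    return 0;
}
```

For example, flap_link("eth2", 1000) would toggle eth2, the interface named in the log, a thousand times; the count is arbitrary.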

[  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
[  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
[  368.106005] RIP: 0010:[<ffffffff81579e04>]  [<ffffffff81579e04>] fib_table_lookup+0x14/0x390
[  368.106005] RSP: 0018:ffff88061fc83ce8  EFLAGS: 00000286
[  368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
[  368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
[  368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
[  368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
[  368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
[  368.106005] FS:  0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
[  368.106005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
[  368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  368.106005] Stack:
[  368.106005]  00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
[  368.106005]  ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
[  368.106005]  ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
[  368.106005] Call Trace:
[  368.106005]  <IRQ>
[  368.106005]
[  368.106005]  [<ffffffff815349a6>] ip_route_input_noref+0x516/0xbd0
[  368.106005]  [<ffffffff814ee146>] ? skb_release_data+0xd6/0x110
[  368.106005]  [<ffffffff814ee20a>] ? kfree_skb+0x3a/0xa0
[  368.106005]  [<ffffffff8153698f>] ip_rcv_finish+0x29f/0x350
[  368.106005]  [<ffffffff81537034>] ip_rcv+0x234/0x380
[  368.106005]  [<ffffffff814fd656>] __netif_receive_skb_core+0x676/0x870
[  368.106005]  [<ffffffff814fd868>] __netif_receive_skb+0x18/0x60
[  368.106005]  [<ffffffff814fe4de>] process_backlog+0xae/0x180
[  368.106005]  [<ffffffff814fdcb2>] net_rx_action+0x152/0x240
[  368.106005]  [<ffffffff81077b3f>] __do_softirq+0xef/0x280
[  368.106005]  [<ffffffff8161619c>] call_softirq+0x1c/0x30
[  368.106005]  <EOI>
[  368.106005]
[  368.106005]  [<ffffffff81015d95>] do_softirq+0x65/0xa0
[  368.106005]  [<ffffffff81077174>] local_bh_enable+0x94/0xa0
[  368.106005]  [<ffffffff81114922>] rcu_nocb_kthread+0x232/0x370
[  368.106005]  [<ffffffff81098250>] ? wake_up_bit+0x30/0x30
[  368.106005]  [<ffffffff811146f0>] ? rcu_start_gp+0x40/0x40
[  368.106005]  [<ffffffff8109728f>] kthread+0xcf/0xe0
[  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
[  368.106005]  [<ffffffff816147d8>] ret_from_fork+0x58/0x90
[  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140

==================================cut here==============================
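The trace shows network receive processing (net_rx_action, ip_rcv, fib_table_lookup) running on top of rcu_nocb_kthread, entered through local_bh_enable() and do_softirq(). The following simplified, single-CPU userspace model shows that mechanism: when bottom halves are re-enabled at nesting depth zero, any pending softirq work runs immediately in the caller's context. All names are hypothetical stand-ins. The model also ignores the limits that real softirq processing places on each run.

```c
#include <assert.h>

/* Simplified userspace model; not kernel code. */
static int bh_disable_depth;    /* models the BH-disable nesting count   */
static int softirq_pending;     /* models queued NET_RX work             */
static long softirq_work_done;  /* work executed in the caller's context */

/* Hypothetical stand-in for net_rx_action(): drain all queued work. */
static void run_pending_softirqs(void)
{
    while (softirq_pending > 0) {
        softirq_pending--;
        softirq_work_done++;
    }
}

static void model_local_bh_disable(void)
{
    bh_disable_depth++;
}

/* On the outermost enable, pending softirqs run right here, on the
 * caller's stack, which is how net_rx_action() ends up beneath
 * rcu_nocb_kthread() in the trace. */
static void model_local_bh_enable(void)
{
    assert(bh_disable_depth > 0);
    if (--bh_disable_depth == 0 && softirq_pending > 0)
        run_pending_softirqs();
}

/* Invoke one callback with BH disabled while pkts packets arrive. */
static void invoke_one_callback(int pkts)
{
    model_local_bh_disable();
    softirq_pending += pkts;    /* interrupts raise RX work meanwhile */
    model_local_bh_enable();
}
```

In this model, each callback invocation can also absorb a burst of packet processing, which lengthens the time the kthread holds the CPU between scheduling points.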

It turns out that the rcuos callback-offload kthread is busy processing
a very large number of RCU callbacks, and it does not relinquish the
CPU while doing so.  This commit therefore adds a cond_resched_rcu_qs()
within the loop to allow other tasks to run.
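The loop shape described above can be sketched in userspace C, assuming a singly linked list of callbacks. Each pass reads next before invoking the callback and counts total and lazy callbacks, mirroring c and cl in the patch. It then offers the CPU. sched_yield() is only an analogue here: it yields unconditionally, whereas cond_resched() reschedules only when the scheduler has asked for it. struct cb_node, drain_callbacks() and the other names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>    /* sched_yield() */
#include <stddef.h>   /* NULL */

/* Hypothetical callback node, loosely modeled on struct rcu_head. */
struct cb_node {
    struct cb_node *next;
    void (*func)(struct cb_node *);
    int lazy;                 /* stand-in for a kfree()-style callback */
};

static long yields;           /* how often the loop offered the CPU */
static long invocations;      /* how many callbacks actually ran */

/* Example callback: only counts invocations. */
static void count_invocation(struct cb_node *n)
{
    (void)n;
    invocations++;
}

/* Userspace analogue of a voluntary preemption point. */
static void offer_cpu(void)
{
    yields++;
    sched_yield();
}

/*
 * Drain list, invoking every callback.  *ncalled and *nlazy mirror the
 * c and cl counters of rcu_nocb_kthread().  offer_cpu() sits where the
 * patch adds cond_resched_rcu_qs(); without it, a long list keeps the
 * CPU for the entire batch.
 */
static void drain_callbacks(struct cb_node *list, long *ncalled, long *nlazy)
{
    long c = 0, cl = 0;

    assert(ncalled != NULL && nlazy != NULL);
    while (list != NULL) {
        struct cb_node *next = list->next;  /* read first: func may free list */

        if (list->lazy)                     /* also read before func runs */
            cl++;
        list->func(list);
        c++;
        offer_cpu();
        list = next;
    }
    *ncalled = c;
    *nlazy = cl;
}
```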

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
[ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/rcu/tree_plugin.h |    1 +
 1 file changed, 1 insertion(+)

--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2173,6 +2173,7 @@ static int rcu_nocb_kthread(void *arg)
 				cl++;
 			c++;
 			local_bh_enable();
+			cond_resched_rcu_qs();
 			list = next;
 		}
 		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);


Patches currently in stable-queue which might be from dingtianhong@huawei.com are

queue-4.8/rcu-fix-soft-lockup-for-rcu_nocb_kthread.patch

* Re: Patch "rcu: Fix soft lockup for rcu_nocb_kthread" has been added to the 4.8-stable tree
  2016-12-03  8:53 Patch "rcu: Fix soft lockup for rcu_nocb_kthread" has been added to the 4.8-stable tree gregkh
@ 2016-12-05  1:30 ` Ding Tianhong
  2016-12-05  2:06   ` Paul E. McKenney
  0 siblings, 1 reply; 5+ messages in thread
From: Ding Tianhong @ 2016-12-05  1:30 UTC
  To: gregkh, dhaval.giani, paulmck; +Cc: stable, stable-commits

Hi Greg:

Please don't add this patch to the stable tree yet. I am still discussing
this problem with Paul, and the original solution may not be the best one;
I need more time to check. Thanks.

Ding

* Re: Patch "rcu: Fix soft lockup for rcu_nocb_kthread" has been added to the 4.8-stable tree
  2016-12-05  1:30 ` Ding Tianhong
@ 2016-12-05  2:06   ` Paul E. McKenney
  2016-12-05  2:37     ` Ding Tianhong
  2016-12-05 12:56     ` Greg KH
  0 siblings, 2 replies; 5+ messages in thread
From: Paul E. McKenney @ 2016-12-05  2:06 UTC
  To: Ding Tianhong; +Cc: gregkh, dhaval.giani, stable, stable-commits

Sorry, Ding, but this patch has been shown to fix other people's problems
as well as a few of yours.  The fact that it doesn't solve -all- your
problems is absolutely no reason for you to stand in the way of its
solving these other people's problems.

Greg, please do add this patch to your -stable trees.

							Thanx, Paul

On Mon, Dec 05, 2016 at 09:30:22AM +0800, Ding Tianhong wrote:
> Hi Greg:
> 
> Please don't add this patch to the stable tree yet. I am still discussing
> this problem with Paul, and the original solution may not be the best one;
> I need more time to check. Thanks.
> 
> Ding

* Re: Patch "rcu: Fix soft lockup for rcu_nocb_kthread" has been added to the 4.8-stable tree
  2016-12-05  2:06   ` Paul E. McKenney
@ 2016-12-05  2:37     ` Ding Tianhong
  2016-12-05 12:56     ` Greg KH
  1 sibling, 0 replies; 5+ messages in thread
From: Ding Tianhong @ 2016-12-05  2:37 UTC
  To: paulmck; +Cc: gregkh, dhaval.giani, stable, stable-commits

On 2016/12/5 10:06, Paul E. McKenney wrote:
> Sorry, Ding, but this patch has been shown to fix other people's problems
> as well as a few of yours.  The fact that it doesn't solve -all- your
> problems is absolutely no reason for you to stand in the way of its
> solving these other people's problems.
> 
> Greg, please do add this patch to your -stable trees.
> 
> 							Thanx, Paul
> 

OK.

Thanks
Ding

* Re: Patch "rcu: Fix soft lockup for rcu_nocb_kthread" has been added to the 4.8-stable tree
  2016-12-05  2:06   ` Paul E. McKenney
  2016-12-05  2:37     ` Ding Tianhong
@ 2016-12-05 12:56     ` Greg KH
  1 sibling, 0 replies; 5+ messages in thread
From: Greg KH @ 2016-12-05 12:56 UTC
  To: Paul E. McKenney; +Cc: Ding Tianhong, dhaval.giani, stable, stable-commits

On Sun, Dec 04, 2016 at 06:06:47PM -0800, Paul E. McKenney wrote:
> Sorry, Ding, but this patch has been shown to fix other people's problems
> as well as a few of yours.  The fact that it doesn't solve -all- your
> problems is absolutely no reason for you to stand in the way of its
> solving these other people's problems.
> 
> Greg, please do add this patch to your -stable trees.

Ok, I've left it alone.

thanks,

greg k-h
