linux-kernel.vger.kernel.org archive mirror
* [PATCH] rcu: fix the OOM problem of huge IP abnormal packet traffic
@ 2016-11-18 12:40 Ding Tianhong
  2016-11-18 13:01 ` Paul E. McKenney
  2016-11-21  6:52 ` [lkp] [rcu] 83ee00c6cf: WARNING:at_kernel/softirq.c:#__local_bh_enable kernel test robot
  0 siblings, 2 replies; 15+ messages in thread
From: Ding Tianhong @ 2016-11-18 12:40 UTC (permalink / raw)
  To: paulmck, josh, rostedt, mathieu.desnoyers, jiangshanlai, linux-kernel

Commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
introduced a new problem: when a flood of abnormal IP packets arrives,
it may cause OOM and break the kernel, like this:

[   79.441538] mlx4_en: eth5: Leaving promiscuous mode steering mode:2
[  100.067032] ksoftirqd/0: page allocation failure: order:0, mode:0x120
[  100.067038] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G           OE  ----V-------   3.10.0-327.28.3.28.x86_64 #1
[  100.067039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-20161018_184732-HGH1000003483 04/01/2014
[  100.067041]  0000000000000120 00000000b080d798 ffff8802afd5b968 ffffffff81638cb9
[  100.067045]  ffff8802afd5b9f8 ffffffff81171380 0000000000000010 0000000000000000
[  100.067048]  ffff8802befd8000 00000000ffffffff 0000000000000001 00000000b080d798
[  100.067050] Call Trace:
[  100.067057]  [<ffffffff81638cb9>] dump_stack+0x19/0x1b
[  100.067062]  [<ffffffff81171380>] warn_alloc_failed+0x110/0x180
[  100.067066]  [<ffffffff81175b16>] __alloc_pages_nodemask+0x9b6/0xba0
[  100.067070]  [<ffffffff8151e400>] ? skb_add_rx_frag+0x90/0xb0
[  100.067075]  [<ffffffff811b6fba>] alloc_pages_current+0xaa/0x170
[  100.067080]  [<ffffffffa06b9be0>] mlx4_alloc_pages.isra.24+0x40/0x170 [mlx4_en]
[  100.067083]  [<ffffffffa06b9dec>] mlx4_en_alloc_frags+0xdc/0x220 [mlx4_en]
[  100.067086]  [<ffffffff8152eeb8>] ? __netif_receive_skb+0x18/0x60
[  100.067088]  [<ffffffff8152ef40>] ? netif_receive_skb+0x40/0xc0
[  100.067092]  [<ffffffffa06bb521>] mlx4_en_process_rx_cq+0x5f1/0xec0 [mlx4_en]
[  100.067095]  [<ffffffff8131027d>] ? list_del+0xd/0x30
[  100.067098]  [<ffffffff8152c90f>] ? __napi_complete+0x1f/0x30
[  100.067101]  [<ffffffffa06bbeef>] mlx4_en_poll_rx_cq+0x9f/0x170 [mlx4_en]
[  100.067103]  [<ffffffff8152f372>] net_rx_action+0x152/0x240
[  100.067107]  [<ffffffff81084d1f>] __do_softirq+0xef/0x280
[  100.067109]  [<ffffffff81084ee0>] run_ksoftirqd+0x30/0x50
[  100.067114]  [<ffffffff810ae93f>] smpboot_thread_fn+0xff/0x1a0
[  100.067117]  [<ffffffff8163e269>] ? schedule+0x29/0x70
[  100.067120]  [<ffffffff810ae840>] ? lg_double_unlock+0x90/0x90
[  100.067122]  [<ffffffff810a5d4f>] kthread+0xcf/0xe0
[  100.067124]  [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
[  100.067127]  [<ffffffff81649198>] ret_from_fork+0x58/0x90
[  100.067129]  [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140

================================cut here=====================================

The reason is that the flood of abnormal IP packets is received by the network
stack and finally dropped via dst_release(), which relies on the rcuos
callback-offload kthread to free the packets. However, cond_resched_rcu_qs()
calls do_softirq(), which receives more and more abnormal IP packets that are
in turn thrown into the RCU callbacks later. The number of packets received is
much greater than the number of packets freed, so memory is exhausted and the
system goes OOM. Not processing any pending softirqs in the rcuos
callback-offload kthread is therefore a more effective solution.
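
To illustrate the imbalance described above, here is a minimal user-space
sketch. It is a toy model, not kernel code: the function name backlog_after()
and its parameters are hypothetical, and it simply assumes that each callback
invocation frees one packet while each softirq pass between callbacks receives
and drops rx_per_softirq new packets, queueing one callback per packet.

```c
/*
 * Toy model of the callback backlog (assumption: NOT kernel code).
 *
 * Each loop iteration stands for one RCU callback being invoked by the
 * offload kthread, which frees exactly one dropped packet. When
 * run_softirq is nonzero, a softirq pass runs after every callback and
 * is assumed to receive (and drop) rx_per_softirq new packets, each of
 * which queues one more callback.
 */
long backlog_after(long callbacks_invoked, long initial_backlog,
                   long rx_per_softirq, int run_softirq)
{
    long backlog = initial_backlog;

    for (long i = 0; i < callbacks_invoked && backlog > 0; i++) {
        backlog--;                      /* callback frees one packet */
        if (run_softirq)
            backlog += rx_per_softirq;  /* softirq queues new callbacks */
    }
    return backlog;
}
```

Under these assumptions, backlog_after(1000, 1000, 4, 1) leaves 4000 callbacks
pending, while backlog_after(1000, 1000, 4, 0) drains the queue to 0. The real
system still receives packets elsewhere, so this only shows why running
softirqs inside the callback loop lets the backlog grow.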

Fixes: bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 kernel/rcu/tree_plugin.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 85c5a88..760c3b5 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2172,8 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
 			if (__rcu_reclaim(rdp->rsp->name, list))
 				cl++;
 			c++;
-			local_bh_enable();
-			cond_resched_rcu_qs();
+			_local_bh_enable();
 			list = next;
 		}
 		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
-- 
1.9.0

Thread overview: 15+ messages
2016-11-18 12:40 [PATCH] rcu: fix the OOM problem of huge IP abnormal packet traffic Ding Tianhong
2016-11-18 13:01 ` Paul E. McKenney
2016-11-19  7:50   ` Ding Tianhong
2016-11-19  8:22     ` Paul E. McKenney
2016-11-21  0:13       ` Paul E. McKenney
2016-11-21  1:28         ` Ding Tianhong
2016-12-28  5:58           ` Ding Tianhong
2016-12-29  0:13             ` Paul E. McKenney
2017-01-04  0:57               ` Paul E. McKenney
2017-01-04  7:02                 ` Ding Tianhong
2017-01-04 13:48                   ` Paul E. McKenney
2017-01-10  3:20                     ` Ding Tianhong
2017-01-10  5:51                       ` Paul E. McKenney
2017-01-10  7:28                         ` Ding Tianhong
2016-11-21  6:52 ` [lkp] [rcu] 83ee00c6cf: WARNING:at_kernel/softirq.c:#__local_bh_enable kernel test robot
