From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 28 Dec 2016 16:13:15 -0800
From: "Paul E. McKenney"
To: Ding Tianhong
Cc: davem@davemloft.net, Eric Dumazet, josh@joshtriplett.org,
	rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
	jiangshanlai@gmail.com, "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH] rcu: fix the OOM problem of huge IP abnormal packet traffic
Reply-To: paulmck@linux.vnet.ibm.com
References: <635ca612-370c-b6e4-7f2a-cba702dd0c4a@huawei.com>
	<20161118130144.GO3612@linux.vnet.ibm.com>
	<809d327e-d4e2-51a5-bbfd-9ff143ee55da@huawei.com>
	<20161119082209.GC3612@linux.vnet.ibm.com>
	<20161121001347.GA27732@linux.vnet.ibm.com>
	<749be737-bbba-cf4d-0d97-7657e3b1b76b@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <749be737-bbba-cf4d-0d97-7657e3b1b76b@huawei.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Message-Id: <20161229001315.GW3742@linux.vnet.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Dec 28, 2016 at 01:58:06PM +0800, Ding Tianhong wrote:
> Hi, Paul:
>
> I tried to debug this problem and found that this solution works well for both problem scenarios.
>
>
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 85c5a88..dbc14a7 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2172,7 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
>  			if (__rcu_reclaim(rdp->rsp->name, list))
>  				cl++;
>  			c++;
> -			local_bh_enable();
> +			_local_bh_enable();
>  			cond_resched_rcu_qs();
>  			list = next;
>  		}
>
>
> The cond_resched_rcu_qs() would process the softirq if one is pending, so there is no need to use
> local_bh_enable() to process the softirq twice here, and it will avoid OOM when huge packets arrive.
> What do you think about it? Please give me some suggestions.

From what I can see, there is absolutely no guarantee that
cond_resched_rcu_qs() will do local_bh_enable(), and thus no guarantee
that it will process any pending softirqs -- and that is not part
of its job in any case.  So I cannot recommend the above patch.

On efficient handling of large invalid packets (that is still the issue,
right?), I must defer to Dave and Eric.

							Thanx, Paul

> Thanks.
> Ding
>
> On 2016/11/21 9:28, Ding Tianhong wrote:
> >
> >
> > On 2016/11/21 8:13, Paul E. McKenney wrote:
> >> On Sat, Nov 19, 2016 at 12:22:09AM -0800, Paul E. McKenney wrote:
> >>> On Sat, Nov 19, 2016 at 03:50:32PM +0800, Ding Tianhong wrote:
> >>>>
> >>>>
> >>>> On 2016/11/18 21:01, Paul E.
McKenney wrote:
> >>>>> On Fri, Nov 18, 2016 at 08:40:09PM +0800, Ding Tianhong wrote:
> >>>>>> The commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
> >>>>>> introduces a new problem: when huge abnormal IP packets arrive,
> >>>>>> it may cause OOM and break the kernel, like this:
> >>>>>>
> >>>>>> [ 79.441538] mlx4_en: eth5: Leaving promiscuous mode steering mode:2
> >>>>>> [ 100.067032] ksoftirqd/0: page allocation failure: order:0, mode:0x120
> >>>>>> [ 100.067038] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G OE ----V------- 3.10.0-327.28.3.28.x86_64 #1
> >>>>>> [ 100.067039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-20161018_184732-HGH1000003483 04/01/2014
> >>>>>> [ 100.067041] 0000000000000120 00000000b080d798 ffff8802afd5b968 ffffffff81638cb9
> >>>>>> [ 100.067045] ffff8802afd5b9f8 ffffffff81171380 0000000000000010 0000000000000000
> >>>>>> [ 100.067048] ffff8802befd8000 00000000ffffffff 0000000000000001 00000000b080d798
> >>>>>> [ 100.067050] Call Trace:
> >>>>>> [ 100.067057] [] dump_stack+0x19/0x1b
> >>>>>> [ 100.067062] [] warn_alloc_failed+0x110/0x180
> >>>>>> [ 100.067066] [] __alloc_pages_nodemask+0x9b6/0xba0
> >>>>>> [ 100.067070] [] ? skb_add_rx_frag+0x90/0xb0
> >>>>>> [ 100.067075] [] alloc_pages_current+0xaa/0x170
> >>>>>> [ 100.067080] [] mlx4_alloc_pages.isra.24+0x40/0x170 [mlx4_en]
> >>>>>> [ 100.067083] [] mlx4_en_alloc_frags+0xdc/0x220 [mlx4_en]
> >>>>>> [ 100.067086] [] ? __netif_receive_skb+0x18/0x60
> >>>>>> [ 100.067088] [] ? netif_receive_skb+0x40/0xc0
> >>>>>> [ 100.067092] [] mlx4_en_process_rx_cq+0x5f1/0xec0 [mlx4_en]
> >>>>>> [ 100.067095] [] ? list_del+0xd/0x30
> >>>>>> [ 100.067098] [] ?
__napi_complete+0x1f/0x30
> >>>>>> [ 100.067101] [] mlx4_en_poll_rx_cq+0x9f/0x170 [mlx4_en]
> >>>>>> [ 100.067103] [] net_rx_action+0x152/0x240
> >>>>>> [ 100.067107] [] __do_softirq+0xef/0x280
> >>>>>> [ 100.067109] [] run_ksoftirqd+0x30/0x50
> >>>>>> [ 100.067114] [] smpboot_thread_fn+0xff/0x1a0
> >>>>>> [ 100.067117] [] ? schedule+0x29/0x70
> >>>>>> [ 100.067120] [] ? lg_double_unlock+0x90/0x90
> >>>>>> [ 100.067122] [] kthread+0xcf/0xe0
> >>>>>> [ 100.067124] [] ? kthread_create_on_node+0x140/0x140
> >>>>>> [ 100.067127] [] ret_from_fork+0x58/0x90
> >>>>>> [ 100.067129] [] ? kthread_create_on_node+0x140/0x140
> >>>>>>
> >>>>>> ================================cut here=====================================
> >>>>>>
> >>>>>> The reason is that the huge abnormal IP packets are received into the net stack
> >>>>>> and finally dropped by dst_release, and dst_release uses the rcuos
> >>>>>> callback-offload kthread to free the packets, but cond_resched_rcu_qs() will
> >>>>>> call do_softirq() to receive more and more abnormal IP packets, which will be
> >>>>>> thrown into the RCU callbacks again later. The number of received packets is much
> >>>>>> greater than the number of packets freed, which exhausts memory and causes OOM,
> >>>>>> so not processing any pending softirqs in the rcuos callback-offload kthread
> >>>>>> is a more effective solution.
> >>>>>
> >>>>> OK, but we could still have softirqs processed by the grace-period kthread
> >>>>> as a result of any number of other events.  So this change might reduce
> >>>>> the probability of this problem, but it doesn't eliminate it.
> >>>>>
> >>>>> How huge are these huge IP packets?  Is the underlying problem that they
> >>>>> are too large to use the memory-allocator fastpaths?
> >>>>>
> >>>>> 							Thanx, Paul
> >>>>>
> >>>>
> >>>> I use a 40G Mellanox NIC to receive packets, and the testgine can send abnormal MAC packets and
> >>>> abnormal IP packets at full speed.
> >>>>
> >>>> The abnormal MAC packets are dropped at a low level and never reach the net stack,
> >>>> but the abnormal IP packets introduce this problem: every packet is first treated as a new dst and
> >>>> later released by dst_release because it is meaningless.
> >>>>
> >>>> dst_release->call_rcu(&dst->rcu_head, dst_destroy_rcu);
> >>>>
> >>>> So no packet is freed until the rcuos callback-offload kthread processes it. It becomes an infinite loop
> >>>> if huge packets keep coming, because do_softirq loads more and more packets onto the rcuos processing kthread,
> >>>> so I still could not find a better way to fix this. Btw, it is really hard to say that the driver uses too-large
> >>>> memory-allocator fastpaths; there is no memory leak, and ixgbe may hit the same problem too.
> >>
> >> And following up on my fastpath point -- from what I can see, one
> >> big effect of the large invalid packets is that they push processing
> >> off of a number of fastpaths.  If these packets could be rejected with
> >> less per-packet processing, I bet that things would work much better.
> >>
> >> 							Thanx, Paul
> >
> > Yes, and I found that WARN_ON_ONCE(!irqs_disabled()) is triggered if _local_bh_enable is used here,
> > so I think we could ask Eric and David for help on how to reject the huge number of packets.
> >
> > Thanks
> > Ding
> >
> >>
> >>> The overall effect of these two patches is to move from enabling bh
> >>> (and processing recent softirqs) to enabling bh without processing
> >>> recent softirqs.  Is this really the correct way to solve this problem?
> >>> What about this solution is avoiding re-introducing the original
> >>> softlockups?  Have you talked to the networking guys about this issue?
> >>>
> >>> 							Thanx, Paul
> >>>
> >>>> Thanks.
> >>>> Ding
> >>>>
> >>>>
> >>>>>> Fix commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
> >>>>>> Signed-off-by: Ding Tianhong
> >>>>>>
> >>>>>> Signed-off-by: Ding Tianhong
> >>>>>> ---
> >>>>>>  kernel/rcu/tree_plugin.h | 3 +--
> >>>>>>  1 file changed, 1 insertion(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> >>>>>> index 85c5a88..760c3b5 100644
> >>>>>> --- a/kernel/rcu/tree_plugin.h
> >>>>>> +++ b/kernel/rcu/tree_plugin.h
> >>>>>> @@ -2172,8 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
> >>>>>>  			if (__rcu_reclaim(rdp->rsp->name, list))
> >>>>>>  				cl++;
> >>>>>>  			c++;
> >>>>>> -			local_bh_enable();
> >>>>>> -			cond_resched_rcu_qs();
> >>>>>> +			_local_bh_enable();
> >>>>>>  			list = next;
> >>>>>>  		}
> >>>>>>  		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
> >>>>>> --
> >>>>>> 1.9.0
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> .
> >>>>>
> >>>>
> >>
> >>
> >> .
> >>
>