From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S967114AbdADA6G (ORCPT <rfc822;w@1wt.eu>);
        Tue, 3 Jan 2017 19:58:06 -0500
Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:46595 "EHLO
        mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S965064AbdADA5x (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 3 Jan 2017 19:57:53 -0500
Date: Tue, 3 Jan 2017 16:57:46 -0800
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Ding Tianhong <dingtianhong@huawei.com>
Cc: davem@davemloft.net, Eric Dumazet <eric.dumazet@gmail.com>,
        josh@joshtriplett.org, rostedt@goodmis.org,
        mathieu.desnoyers@efficios.com, jiangshanlai@gmail.com,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] rcu: fix the OOM problem of huge IP abnormal packet
 traffic
Reply-To: paulmck@linux.vnet.ibm.com
References: <635ca612-370c-b6e4-7f2a-cba702dd0c4a@huawei.com>
 <20161118130144.GO3612@linux.vnet.ibm.com>
 <809d327e-d4e2-51a5-bbfd-9ff143ee55da@huawei.com>
 <20161119082209.GC3612@linux.vnet.ibm.com>
 <20161121001347.GA27732@linux.vnet.ibm.com>
 <c809bddc-327d-779e-c393-47cc65202025@huawei.com>
 <749be737-bbba-cf4d-0d97-7657e3b1b76b@huawei.com>
 <20161229001315.GW3742@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161229001315.GW3742@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-GCONF: 00
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 17010400-8235-0000-0000-00000A4B79FB
X-IBM-SpamModules-Scores: 
X-IBM-SpamModules-Versions: BY=3.00006369; HX=3.00000240; KW=3.00000007;
 PH=3.00000004; SC=3.00000199; SDB=6.00803159; UDB=6.00390695; IPR=6.00581010;
 BA=6.00005028; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000;
 ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00013815; XFM=3.00000011;
 UTC=2017-01-04 00:57:50
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17010400-8236-0000-0000-0000381D69E6
Message-Id: <20170104005746.GA10429@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-01-03_22:,,
 signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=2
 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam
 adjust=0 reason=mlx scancount=1 engine=8.0.1-1612050000
 definitions=main-1701040014
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Dec 28, 2016 at 04:13:15PM -0800, Paul E. McKenney wrote:
> On Wed, Dec 28, 2016 at 01:58:06PM +0800, Ding Tianhong wrote:
> > Hi, Paul:
> > 
> > I tried to debug this problem and found that this solution works well for both problem scenarios.
> > 
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 85c5a88..dbc14a7 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2172,7 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
> >                         if (__rcu_reclaim(rdp->rsp->name, list))
> >                                 cl++;
> >                         c++;
> > -                   local_bh_enable();
> > +                 _local_bh_enable();
> >                         cond_resched_rcu_qs();
> >                         list = next;
> >                 }
> > 
> > 
> > cond_resched_rcu_qs() would process the softirq if one is pending, so there is no need to use
> > local_bh_enable() to process the softirq a second time here, and this avoids the OOM when huge packet
> > traffic arrives. What do you think about it? Please give me some suggestions.
> 
> From what I can see, there is absolutely no guarantee that
> cond_resched_rcu_qs() will do local_bh_enable(), and thus no guarantee
> that it will process any pending softirqs -- and that is not part of
> its job in any case.  So I cannot recommend the above patch.
> 
> On efficient handling of large invalid packets (that is still the issue,
> right?), I must defer to Dave and Eric.
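
To make the distinction above concrete, here is a rough userspace sketch
in C.  It is a toy model, not kernel code: struct toy_cpu and every
toy_*() function are made up for illustration, and toy_cond_resched()
is assumed to leave softirq state untouched, which reflects the lack of
any guarantee noted above rather than the real implementation.

```c
/* Userspace toy model, NOT kernel code.  All names here are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_cpu {
	int bh_disable_depth;	/* models the bh-disable nesting count */
	bool softirq_pending;	/* models a raised but unhandled softirq */
	int softirqs_run;	/* number of times the handler has run */
};

/* Run the pending softirq, if any. */
static void toy_do_softirq(struct toy_cpu *cpu)
{
	if (cpu->softirq_pending) {
		cpu->softirq_pending = false;
		cpu->softirqs_run++;
	}
}

static void toy_local_bh_disable(struct toy_cpu *cpu)
{
	cpu->bh_disable_depth++;
}

/* Models local_bh_enable(): re-enable bh and run anything pending. */
static void toy_local_bh_enable(struct toy_cpu *cpu)
{
	assert(cpu->bh_disable_depth > 0);
	if (--cpu->bh_disable_depth == 0)
		toy_do_softirq(cpu);
}

/* Models _local_bh_enable(): re-enable bh, leave pending work alone. */
static void toy_local_bh_enable_norun(struct toy_cpu *cpu)
{
	assert(cpu->bh_disable_depth > 0);
	cpu->bh_disable_depth--;
}

/* Assumption: a reschedule point that does nothing to softirq state. */
static void toy_cond_resched(struct toy_cpu *cpu)
{
	(void)cpu;
}

/*
 * One pass of the loop body, with a softirq raised while bh is disabled.
 * Returns true if the softirq is still pending at the end of the pass.
 */
static bool toy_loop_body(struct toy_cpu *cpu, bool norun)
{
	toy_local_bh_disable(cpu);
	cpu->softirq_pending = true;	/* e.g. a packet arrives */
	if (norun)
		toy_local_bh_enable_norun(cpu);
	else
		toy_local_bh_enable(cpu);
	toy_cond_resched(cpu);
	return cpu->softirq_pending;
}
```

In this model, a pass that uses toy_local_bh_enable() ends with nothing
pending, while a pass that uses toy_local_bh_enable_norun() ends with
the softirq still pending until something else gets around to running it.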

On the perhaps unlikely off-chance that there is a fix for this outside
of networking, what symptoms are you seeing without this fix in place?
Still RCU CPU stall warnings?  Soft lockups?  Something else?

								Thanx, Paul

> > Thanks.
> > Ding
> > 
> > On 2016/11/21 9:28, Ding Tianhong wrote:
> > > 
> > > 
> > > On 2016/11/21 8:13, Paul E. McKenney wrote:
> > >> On Sat, Nov 19, 2016 at 12:22:09AM -0800, Paul E. McKenney wrote:
> > >>> On Sat, Nov 19, 2016 at 03:50:32PM +0800, Ding Tianhong wrote:
> > >>>>
> > >>>>
> > >>>> On 2016/11/18 21:01, Paul E. McKenney wrote:
> > >>>>> On Fri, Nov 18, 2016 at 08:40:09PM +0800, Ding Tianhong wrote:
> > >>>>>> Commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
> > >>>>>> introduces a new problem: when huge abnormal IP packet traffic arrives,
> > >>>>>> it may cause OOM and break the kernel, like this:
> > >>>>>>
> > >>>>>> [   79.441538] mlx4_en: eth5: Leaving promiscuous mode steering mode:2
> > >>>>>> [  100.067032] ksoftirqd/0: page allocation failure: order:0, mode:0x120
> > >>>>>> [  100.067038] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G           OE  ----V-------   3.10.0-327.28.3.28.x86_64 #1
> > >>>>>> [  100.067039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-20161018_184732-HGH1000003483 04/01/2014
> > >>>>>> [  100.067041]  0000000000000120 00000000b080d798 ffff8802afd5b968 ffffffff81638cb9
> > >>>>>> [  100.067045]  ffff8802afd5b9f8 ffffffff81171380 0000000000000010 0000000000000000
> > >>>>>> [  100.067048]  ffff8802befd8000 00000000ffffffff 0000000000000001 00000000b080d798
> > >>>>>> [  100.067050] Call Trace:
> > >>>>>> [  100.067057]  [<ffffffff81638cb9>] dump_stack+0x19/0x1b
> > >>>>>> [  100.067062]  [<ffffffff81171380>] warn_alloc_failed+0x110/0x180
> > >>>>>> [  100.067066]  [<ffffffff81175b16>] __alloc_pages_nodemask+0x9b6/0xba0
> > >>>>>> [  100.067070]  [<ffffffff8151e400>] ? skb_add_rx_frag+0x90/0xb0
> > >>>>>> [  100.067075]  [<ffffffff811b6fba>] alloc_pages_current+0xaa/0x170
> > >>>>>> [  100.067080]  [<ffffffffa06b9be0>] mlx4_alloc_pages.isra.24+0x40/0x170 [mlx4_en]
> > >>>>>> [  100.067083]  [<ffffffffa06b9dec>] mlx4_en_alloc_frags+0xdc/0x220 [mlx4_en]
> > >>>>>> [  100.067086]  [<ffffffff8152eeb8>] ? __netif_receive_skb+0x18/0x60
> > >>>>>> [  100.067088]  [<ffffffff8152ef40>] ? netif_receive_skb+0x40/0xc0
> > >>>>>> [  100.067092]  [<ffffffffa06bb521>] mlx4_en_process_rx_cq+0x5f1/0xec0 [mlx4_en]
> > >>>>>> [  100.067095]  [<ffffffff8131027d>] ? list_del+0xd/0x30
> > >>>>>> [  100.067098]  [<ffffffff8152c90f>] ? __napi_complete+0x1f/0x30
> > >>>>>> [  100.067101]  [<ffffffffa06bbeef>] mlx4_en_poll_rx_cq+0x9f/0x170 [mlx4_en]
> > >>>>>> [  100.067103]  [<ffffffff8152f372>] net_rx_action+0x152/0x240
> > >>>>>> [  100.067107]  [<ffffffff81084d1f>] __do_softirq+0xef/0x280
> > >>>>>> [  100.067109]  [<ffffffff81084ee0>] run_ksoftirqd+0x30/0x50
> > >>>>>> [  100.067114]  [<ffffffff810ae93f>] smpboot_thread_fn+0xff/0x1a0
> > >>>>>> [  100.067117]  [<ffffffff8163e269>] ? schedule+0x29/0x70
> > >>>>>> [  100.067120]  [<ffffffff810ae840>] ? lg_double_unlock+0x90/0x90
> > >>>>>> [  100.067122]  [<ffffffff810a5d4f>] kthread+0xcf/0xe0
> > >>>>>> [  100.067124]  [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
> > >>>>>> [  100.067127]  [<ffffffff81649198>] ret_from_fork+0x58/0x90
> > >>>>>> [  100.067129]  [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
> > >>>>>>
> > >>>>>> ================================cut here=====================================
> > >>>>>>
> > >>>>>> The reason is that the huge abnormal IP packets are received into the net stack
> > >>>>>> and finally dropped by dst_release(), which uses the rcuos callback-offload
> > >>>>>> kthread to free them.  But cond_resched_rcu_qs() will call do_softirq() to
> > >>>>>> receive more and more abnormal IP packets, which are later thrown into the RCU
> > >>>>>> callbacks again.  The number of packets received is much greater than the number
> > >>>>>> of packets freed, which exhausts memory and leads to OOM.  So not processing any
> > >>>>>> pending softirqs in the rcuos callback-offload kthread is a more effective
> > >>>>>> solution.
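
The imbalance described above can be sketched as a toy userspace model
in C.  This is not kernel code: toy_backlog_after() and its parameters
are hypothetical, and the rates are made-up numbers, not measurements.

```c
/* Userspace toy model, NOT kernel code.  Names and rates are hypothetical. */
#include <assert.h>

/*
 * Invoke up to 'steps' callbacks from a queue that starts with 'backlog'
 * entries.  Each invocation frees one object.  The softirq processing
 * that follows each invocation queues 'rx_per_cb' new callbacks
 * (0 models not processing softirqs from this loop at all).
 * Returns the number of callbacks still queued afterwards.
 */
static long toy_backlog_after(long backlog, long steps, long rx_per_cb)
{
	long i;

	assert(backlog >= 0 && steps >= 0 && rx_per_cb >= 0);
	for (i = 0; i < steps && backlog > 0; i++) {
		backlog--;		/* one callback invoked, one object freed */
		backlog += rx_per_cb;	/* newly received packets queue more */
	}
	return backlog;
}
```

In this model, toy_backlog_after(10, 100, 0) drains the queue, while
toy_backlog_after(10, 100, 4) gains three entries per invocation, so the
queue only grows for as long as the traffic keeps coming.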
> > >>>>>
> > >>>>> OK, but we could still have softirqs processed by the grace-period kthread
> > >>>>> as a result of any number of other events.  So this change might reduce
> > >>>>> the probability of this problem, but it doesn't eliminate it.
> > >>>>>
> > >>>>> How huge are these huge IP packets?  Is the underlying problem that they
> > >>>>> are too large to use the memory-allocator fastpaths?
> > >>>>>
> > >>>>> 							Thanx, Paul
> > >>>>>
> > >>>>
> > >>>> I use a 40G Mellanox NIC to receive packets, and the test engine can send abnormal MAC packets and
> > >>>> abnormal IP packets at full speed.
> > >>>>
> > >>>> The abnormal MAC packets are dropped at a low level and never reach the net stack,
> > >>>> but the abnormal IP packets trigger this problem: every packet is first treated as a new dst and
> > >>>> later released by dst_release() because it is meaningless.
> > >>>>
> > >>>> dst_release->call_rcu(&dst->rcu_head, dst_destroy_rcu);
> > >>>>
> > >>>> So no packet is freed until the rcuos callback-offload kthread processes it.  This becomes an infinite
> > >>>> loop when huge packet traffic is arriving, because do_softirq() keeps loading more and more packets onto
> > >>>> the rcuos kthread.  I still could not find a better way to fix this.  BTW, it is really hard to say that
> > >>>> the driver's allocations are too large for the memory-allocator fastpaths; there is no memory leak, and
> > >>>> ixgbe may hit the same problem too.
> > >>
> > >> And following up on my fastpath point -- from what I can see, one
> > >> big effect of the large invalid packets is that they push processing
> > >> off of a number of fastpaths.  If these packets could be rejected with
> > >> less per-packet processing, I bet that things would work much better.
> > >>
> > >> 						Thanx, Paul
> > > 
> > > Yes, and I found that WARN_ON_ONCE(!irqs_disabled()) is triggered if _local_bh_enable() is used here,
> > > so I think we could ask Eric and David for help on how to reject this huge number of packets.
> > > 
> > > Thanks
> > > Ding
> > > 
> > >>
> > >>> The overall effect of these two patches is to move from enabling bh
> > >>> (and processing recent softirqs) to enabling bh without processing
> > >>> recent softirqs.  Is this really the correct way to solve this problem?
> > >>> What about this solution is avoiding re-introducing the original
> > >>> softlockups?  Have you talked to the networking guys about this issue?
> > >>>
> > >>> 							Thanx, Paul
> > >>>
> > >>>> Thanks.
> > >>>> Ding
> > >>>>
> > >>>>
> > >>>>>> Fix commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
> > >>>>>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> > >>>>>> ---
> > >>>>>>  kernel/rcu/tree_plugin.h | 3 +--
> > >>>>>>  1 file changed, 1 insertion(+), 2 deletions(-)
> > >>>>>>
> > >>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > >>>>>> index 85c5a88..760c3b5 100644
> > >>>>>> --- a/kernel/rcu/tree_plugin.h
> > >>>>>> +++ b/kernel/rcu/tree_plugin.h
> > >>>>>> @@ -2172,8 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
> > >>>>>>  			if (__rcu_reclaim(rdp->rsp->name, list))
> > >>>>>>  				cl++;
> > >>>>>>  			c++;
> > >>>>>> -			local_bh_enable();
> > >>>>>> -			cond_resched_rcu_qs();
> > >>>>>> +			_local_bh_enable();
> > >>>>>>  			list = next;
> > >>>>>>  		}
> > >>>>>>  		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
> > >>>>>> -- 
> > >>>>>> 1.9.0