From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Ding Tianhong <dingtianhong@huawei.com>
Cc: davem@davemloft.net, Eric Dumazet <eric.dumazet@gmail.com>,
josh@joshtriplett.org, rostedt@goodmis.org,
mathieu.desnoyers@efficios.com, jiangshanlai@gmail.com,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] rcu: fix the OOM problem of huge IP abnormal packet traffic
Date: Mon, 9 Jan 2017 21:51:53 -0800
Message-ID: <20170110055153.GL3800@linux.vnet.ibm.com>
In-Reply-To: <9fc43387-4a23-f89f-168e-b46e7bf94e40@huawei.com>

On Tue, Jan 10, 2017 at 11:20:40AM +0800, Ding Tianhong wrote:
>
>
> On 2017/1/4 21:48, Paul E. McKenney wrote:
> > On Wed, Jan 04, 2017 at 03:02:30PM +0800, Ding Tianhong wrote:
> >>
> >>
> >> On 2017/1/4 8:57, Paul E. McKenney wrote:
> >>> On Wed, Dec 28, 2016 at 04:13:15PM -0800, Paul E. McKenney wrote:
> >>>> On Wed, Dec 28, 2016 at 01:58:06PM +0800, Ding Tianhong wrote:
> >>>>> Hi, Paul:
> >>>>>
> >>>>> I tried to debug this problem and found that this solution works well for both problem scenarios.
> >>>>>
> >>>>>
> >>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> >>>>> index 85c5a88..dbc14a7 100644
> >>>>> --- a/kernel/rcu/tree_plugin.h
> >>>>> +++ b/kernel/rcu/tree_plugin.h
> >>>>> @@ -2172,7 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
> >>>>>  			if (__rcu_reclaim(rdp->rsp->name, list))
> >>>>>  				cl++;
> >>>>>  			c++;
> >>>>> -			local_bh_enable();
> >>>>> +			_local_bh_enable();
> >>>>>  			cond_resched_rcu_qs();
> >>>>>  			list = next;
> >>>>>  		}
> >>>>>
> >>>>>
> >>>>> cond_resched_rcu_qs() would process the softirq if one is pending, so there is no need to use
> >>>>> local_bh_enable() to process the softirq a second time here, and this avoids OOM when a huge number of packets arrive.
> >>>>> What do you think about it? Please give me some suggestions.
> >>>>
> >>>> From what I can see, there is absolutely no guarantee that
> >>>> cond_resched_rcu_qs() will do local_bh_enable(), and thus no guarantee
> >>>> that it will process any pending softirqs -- and that is not part of
> >>>> its job in any case. So I cannot recommend the above patch.
> >>>>
> >>>> On efficient handling of large invalid packets (that is still the issue,
> >>>> right?), I must defer to Dave and Eric.
> >>>
> >>> On the perhaps unlikely off-chance that there is a fix for this outside
> >>> of networking, what symptoms are you seeing without this fix in place?
> >>> Still RCU CPU stall warnings? Soft lockups? Something else?
> >>>
> >>> Thanx, Paul
> >>>
> >>
> >> Hi Paul:
> >>
> >> I am still trying to test and fix this in another way, but I can explain more about the problem.
> >>
> >> When a huge number of packets arrive, the packets are abnormal and are freed via dst_release->call_rcu(dst_destroy_rcu),
> >> so the rcuos kthread invokes dst_destroy_rcu to free them. But while rcuos was looping, I found that local_bh_enable()
> >> calls do_softirq, which receives a certain number of packets that are also abnormal and need to be freed. More packets keep coming, so when cond_resched_rcu_qs() runs,
> >> it lets ksoftirqd run and softirq processing happens again, so the rcuos kthread needs to free even more. It gets worse and worse and leads to OOM because more and more packets need to
> >> be freed.
> >> So I think the do_softirq in local_bh_enable() is not needed here; cond_resched_rcu_qs() will handle the do_softirq once, and that is enough.
> >>
> >> Recently I found that Eric has upstreamed a new patch named "softirq: Let ksoftirqd do its job" that may fix this. I am still testing it and do not have any results yet.
> >
> > OK, I don't see any reasonable way that the RCU callback-offload tasks
> > (rcuos) can figure out whether or not they should let softirqs happen --
> > unconditionally suppressing them might help your workload, but would
> > break workloads needing low networking latency, of which there are many.
> >
> > So please let me know how things go with Eric's patch.
> >
> Hi Paul:
>
> Good news: Eric's patch fixes this problem. It means that if the ksoftirqd kthread is running, we do not spend too much
> time in softirq processing. This behavior is equivalent to removing the do_softirq from local_bh_enable(), but this solution looks
> better. We need to ask the LTS kernel maintainers to apply this patch, even though it does not look like a bugfix.
Here is hoping! ;-)
Thanx, Paul