Subject: Re: [PATCH] rcu: Fix soft lockup for rcu_nocb_kthread
From: Ding Tianhong
Date: Sat, 19 Nov 2016 15:17:37 +0800
Message-ID: <8a271871-8707-9fcb-2fcf-c39469be3584@huawei.com>
In-Reply-To: <20161118125627.GN3612@linux.vnet.ibm.com>
References: <57610368.7080905@huawei.com>
 <20160615154913.GC3923@linux.vnet.ibm.com>
 <576242AB.5010204@huawei.com>
 <20160616141920.GO3923@linux.vnet.ibm.com>
 <57AA7FAA.1030801@huawei.com>
 <20160810015900.GB3482@linux.vnet.ibm.com>
 <3dedae95-d939-bdf5-ea1e-3932c44f0874@huawei.com>
 <20161118125627.GN3612@linux.vnet.ibm.com>
CC: "linux-kernel@vger.kernel.org"
X-Mailing-List: linux-kernel@vger.kernel.org

On 2016/11/18 20:56, Paul E. McKenney wrote:
> On Fri, Nov 18, 2016 at 08:37:28PM +0800, Ding Tianhong wrote:
>>
>>
>> On 2016/8/10 9:59, Paul E. McKenney wrote:
>>> On Wed, Aug 10, 2016 at 09:13:14AM +0800, Ding Tianhong wrote:
>>>> On 2016/6/16 22:19, Paul E. McKenney wrote:
>>>>> On Thu, Jun 16, 2016 at 02:09:47PM +0800, Ding Tianhong wrote:
>>>>>> On 2016/6/15 23:49, Paul E. McKenney wrote:
>>>>>>> On Wed, Jun 15, 2016 at 03:27:36PM +0800, Ding Tianhong wrote:
>>>>>>>> I hit this problem when using the Testgine to send packets to an ixgbevf NIC
>>>>>>>> with these steps:
>>>>>>>> 1. Connect to ixgbevf and set the speed to 10Gb/s; it works fine.
>>>>>>>> 2. Then use ifconfig to bring the NIC down and up again, looping several times.
>>>>>>>> 3. The system panics with a soft lockup.
>>>>>>>
>>>>>>> Good catch, queued for review and testing. But what .config was your
>>>>>>> kernel built with?
>>>>>>>
>>>>>>
>>>>>> I used the redhat7.1 defconfig to build my kernel, and the RCU config is:
>>>>>> #
>>>>>> # RCU Subsystem
>>>>>> #
>>>>>> CONFIG_TREE_RCU=y
>>>>>> # CONFIG_PREEMPT_RCU is not set
>>>>>> CONFIG_RCU_STALL_COMMON=y
>>>>>> CONFIG_CONTEXT_TRACKING=y
>>>>>> CONFIG_RCU_USER_QS=y
>>>>>> # CONFIG_CONTEXT_TRACKING_FORCE is not set
>>>>>> CONFIG_RCU_FANOUT=64
>>>>>> CONFIG_RCU_FANOUT_LEAF=16
>>>>>> # CONFIG_RCU_FANOUT_EXACT is not set
>>>>>> # CONFIG_RCU_FAST_NO_HZ is not set
>>>>>> # CONFIG_TREE_RCU_TRACE is not set
>>>>>> CONFIG_RCU_NOCB_CPU=y
>>>>>> CONFIG_RCU_NOCB_CPU_ALL=y
>>>>>> CONFIG_BUILD_BIN2C=y
>>>>>
>>>>> Thank you! You were running with preemption disabled, so your system
>>>>> would indeed be very susceptible to this problem.
>>>>>
>>>>>>> Also, I did tweak both the commit log and the patch. Your cond_resched()
>>>>>>> would prevent soft lockups, but not RCU stalls, so I substituted
>>>>>>> cond_resched_rcu_qs(). Please let me know if either of those changes
>>>>>>> causes problems at your end.
>>>>>>
>>>>>> Looks fine to me; I will apply this to my branch and test it, thanks.
>>>>>
>>>>> Please let me know how it goes!
>>>>>
>>>>>                                                         Thanx, Paul
>>>>>
>>>>
>>>> Hi Paul:
>>>>
>>>> It has been a long time since I applied this patch, and I haven't found any
>>>> problems, so I believe this patch is fine, thanks.
>>>
>>> Very good! I will push this one upstream during the next merge window.
>>>
>>>                                                         Thanx, Paul
>>>
>>
>> Hi Paul:
>>
>> Sorry to say that I have found this patch introduces an OOM problem. It is
>> triggered when a huge number of abnormal IP packets arrive. It looks like
>> avoiding processing any pending softirqs in the rcuos kthread is the best way
>> to fix this problem, so I will send a new patch to revert this one and fix
>> the problem.
>
> Interesting...
>
> Could you please let me know exactly how the added cond_resched_rcu_qs()
> leads to an OOM? Is it that the softirqs prevent the grace-period kthread
> from making progress?
>

OK, I will reply and discuss this on the other patch, thanks.

Ding

>                                                         Thanx, Paul
>