From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1757101AbaCEDzp (ORCPT ); Tue, 4 Mar 2014 22:55:45 -0500
Received: from mail-pd0-f175.google.com ([209.85.192.175]:41718 "EHLO
        mail-pd0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1756055AbaCEDzn (ORCPT ); Tue, 4 Mar 2014 22:55:43 -0500
MIME-Version: 1.0
In-Reply-To: <20140305014310.GC3334@linux.vnet.ibm.com>
References: <1393980534.26794.147.camel@edumazet-glaptop2.roam.corp.google.com>
        <20140305014310.GC3334@linux.vnet.ibm.com>
From: Florian Fainelli
Date: Tue, 4 Mar 2014 19:55:03 -0800
Message-ID:
Subject: Re: RCU stalls when running out of memory on 3.14-rc4 w/ NFS and
        kernel threads priorities changed
To: Paul McKenney
Cc: Eric Dumazet, "linux-kernel@vger.kernel.org", linux-mm, linux-nfs,
        "trond.myklebust", netdev
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

2014-03-04 17:43 GMT-08:00 Paul E. McKenney:
> On Tue, Mar 04, 2014 at 05:16:27PM -0800, Florian Fainelli wrote:
>> 2014-03-04 17:03 GMT-08:00 Florian Fainelli:
>> > 2014-03-04 16:48 GMT-08:00 Eric Dumazet:
>> >> On Tue, 2014-03-04 at 15:55 -0800, Florian Fainelli wrote:
>> >>> Hi all,
>> >>>
>> >>> I am seeing the following RCU stalls messages appearing on an ARMv7
>> >>> 4xCPUs system running 3.14-rc4:
>> >>>
>> >>> [ 42.974327] INFO: rcu_sched detected stalls on CPUs/tasks:
>> >>> [ 42.979839] (detected by 0, t=2102 jiffies, g=4294967082,
>> >>> c=4294967081, q=516)
>> >>> [ 42.987169] INFO: Stall ended before state dump start
>> >>>
>> >>> this is happening under the following conditions:
>> >>>
>> >>> - the attached bumper.c binary alters various kernel thread priorities
>> >>> based on the contents of bumpup.cfg and
>> >>> - malloc_crazy is running from a NFS share
>> >>> - malloc_crazy.c is running in a loop allocating chunks of memory but
>> >>> never freeing it
>> >>>
>> >>> when the priorities are altered, instead of getting the OOM killer to
>> >>> be invoked, the RCU stalls are happening. Taking NFS out of the
>> >>> equation does not allow me to reproduce the problem even with the
>> >>> priorities altered.
>> >>>
>> >>> This "problem" seems to have been there for quite a while now since I
>> >>> was able to get 3.8.13 to trigger that bug as well, with a slightly
>> >>> more detailed RCU debugging trace which points the finger at kswapd0.
>> >>>
>> >>> You should be able to get that reproduced under QEMU with the
>> >>> Versatile Express platform emulating a Cortex A15 CPU and the attached
>> >>> files.
>> >>>
>> >>> Any help or suggestions would be greatly appreciated. Thanks!
>> >>
>> >> Do you have a more complete trace, including stack traces ?
>> >
>> > Attached is what I get out of SysRq-t, which is the only thing I have
>> > (note that the kernel is built with CONFIG_RCU_CPU_STALL_INFO=y):
>>
>> QEMU for Versatile Express w/ 2 CPUs yields something slightly
>> different than the real HW platform this is happening with, but it
>> does produce the RCU stall anyway:
>>
>> [ 125.762946] BUG: soft lockup - CPU#1 stuck for 53s! [malloc_crazy:91]
>
> This soft-lockup condition can result in RCU CPU stall warnings. Fix
> the problem causing the soft lockup, and I bet that your RCU CPU stall
> warnings go away.

I definitely agree, which is why I was asking for help, as I think the
kernel thread priority change is what is causing the soft lockup to
appear, but nothing obvious jumps to mind when looking at the trace.

Thanks!
--
Florian
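
The malloc_crazy.c attachment is not reproduced in this archive. As a rough
illustration of the kind of reproducer described in the report (a loop that
allocates chunks of memory and never frees them), a minimal sketch might look
like the following. The function name leak_chunks, the max_chunks cap, and the
4096-byte page stride are assumptions made for this example and are not taken
from the original file. The cap is there so the sketch can be run safely; the
original is described as looping without bound.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed page size, used only to decide how often to touch a chunk. */
#define PAGE_STRIDE 4096

/*
 * Hypothetical stand-in for malloc_crazy.c (the real attachment is not
 * shown in this thread). Allocates up to max_chunks blocks of chunk_size
 * bytes, writes one byte per assumed page so the memory is really backed,
 * and deliberately never frees anything.
 *
 * Returns the number of chunks successfully allocated.
 */
size_t leak_chunks(size_t chunk_size, size_t max_chunks)
{
    size_t allocated = 0;

    while (allocated < max_chunks) {
        /* volatile keeps the compiler from optimizing the writes away */
        volatile char *chunk = malloc(chunk_size);

        if (chunk == NULL)
            break;  /* malloc failed before anything killed the process */

        for (size_t off = 0; off < chunk_size; off += PAGE_STRIDE)
            chunk[off] = 1;  /* fault the page in */

        allocated++;  /* pointer is dropped on purpose: this is the leak */
    }

    return allocated;
}
```

Calling this with a large chunk size and a very large cap, from a binary
stored on an NFS share, would approximate the behaviour described in the
report. Whether that alone reproduces the stall has not been verified here.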