From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752051AbdK1OQl (ORCPT ); Tue, 28 Nov 2017 09:16:41 -0500
Received: from Galois.linutronix.de ([146.0.238.70]:33370
	"EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751715AbdK1OQk (ORCPT );
	Tue, 28 Nov 2017 09:16:40 -0500
Date: Tue, 28 Nov 2017 15:16:30 +0100 (CET)
From: Thomas Gleixner
To: "Paul E. McKenney"
cc: kernel test robot, LKML, lkp@01.org
Subject: Re: [lkp-robot] [torture] b151f93a71: INFO:rcu_preempt_detected_stalls_on_CPUs/tasks
In-Reply-To: <20171127215735.GP3624@linux.vnet.ibm.com>
Message-ID:
References: <20171126084203.GE21779@yexl-desktop> <20171127215735.GP3624@linux.vnet.ibm.com>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 27 Nov 2017, Paul E. McKenney wrote:

> On Sun, Nov 26, 2017 at 04:42:03PM +0800, kernel test robot wrote:
> > [ 116.353432] rcu_preempt kthread starved for 9974 jiffies! g4294967208 c4294967207 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
>
> So the immediate reason for the stall warning is that the RCU grace-period
> kthread isn't being allowed to run.
>
> > [ 116.355517] rcu_preempt I 7464 8 2 0x80000000
> > [ 116.356543] Call Trace:
> > [ 116.357008] __schedule+0x493/0x620
> > [ 116.357682] schedule+0x24/0x40
> > [ 116.358291] schedule_timeout+0x330/0x3b0
>
> And the reason that it isn't being allowed to run is that its few-jiffy
> schedule_timeout has extended for more than nine thousand jiffies.
>
> There was an odd combination of kernel parameters that Thomas Gleixner
> came across that could cause writer-thread stalls (since fixed in -rcu
> by the exact patch you call out here), but I don't see how this could
> cause an RCU CPU stall warning.
The only reasonable explanation is that either a wakeup was missed or the
timer did not expire. That is hard to tell from the back trace, but it would
be interesting to figure out. Let me think about how that can be done.

Thanks,

	tglx
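
For readers less familiar with timed sleeps, the two hypotheses above can be illustrated with a small userspace analogy. This is only a sketch and not kernel code: it uses Python's threading.Event in place of the kernel's wait/schedule_timeout machinery, and the helper name timed_wait is hypothetical. A timed sleeper resumes through exactly two paths, an explicit wakeup or the expiry of its timer, so a sleep that overshoots its timeout by thousands of jiffies suggests that neither path fired.

```python
import threading


def timed_wait(event, timeout_s):
    """Sleep until `event` is set or `timeout_s` elapses.

    Returns which of the two paths ended the sleep. This is a userspace
    stand-in for a timed kernel sleep, not the kernel implementation.
    """
    woken = event.wait(timeout_s)  # True if set, False if the timeout elapsed
    return "wakeup" if woken else "timer expired"


# Path 1: another thread delivers the wakeup before the timeout runs out.
ev = threading.Event()
waker = threading.Timer(0.05, ev.set)
waker.start()
print(timed_wait(ev, 2.0))   # the wakeup arrives first
waker.join()

# Path 2: nobody wakes the sleeper, so only the timer can end the sleep.
print(timed_wait(threading.Event(), 0.05))
```

In the analogy, a sleeper that never returns would mean both the event.set() call and the timeout failed to take effect, which corresponds to the "missed wakeup or timer did not expire" alternatives for the grace-period kthread.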