From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753686AbaLVAwb (ORCPT ); Sun, 21 Dec 2014 19:52:31 -0500
Received: from mail-qg0-f47.google.com ([209.85.192.47]:39955 "EHLO
        mail-qg0-f47.google.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1753380AbaLVAw3 (ORCPT );
        Sun, 21 Dec 2014 19:52:29 -0500
MIME-Version: 1.0
In-Reply-To:
References: <20141219145528.GC13404@redhat.com>
        <20141221223204.GA9618@codemonkey.org.uk>
Date: Sun, 21 Dec 2014 16:52:28 -0800
X-Google-Sender-Auth: dH7KKHmZ2Lj6QMfwC28_JKafrzw
Message-ID:
Subject: Re: frequent lockups in 3.18rc4
From: Linus Torvalds
To: Dave Jones, Linus Torvalds, Thomas Gleixner, Chris Mason,
        Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga,
        Sasha Levin, "Paul E. McKenney", Linux Kernel Mailing List,
        Suresh Siddha, Oleg Nesterov, Peter Anvin
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Dec 21, 2014 at 4:41 PM, Linus Torvalds wrote:
>
> The second time (or third, or fourth - it might not take immediately)
> you get a lockup or similar. Bad things happen.

I've only tested it twice now, but the first time I got a weird
lockup-like thing (things *kind* of worked, but I could imagine that
one CPU was stuck with a lock held, because things eventually ground
to a screeching halt).

The second time I got

  INFO: rcu_sched self-detected stall on CPU { 5}  (t=84533 jiffies g=11971 c=11970 q=17)

and then

  INFO: rcu_sched detected stalls on CPUs/tasks: { 1 2 3 4 5 6 7} (detected by 0, t=291309 jiffies, g=12031, c=12030, q=57)

with backtraces that made no sense (because obviously no actual stall
had taken place), and that showed the CPUs mostly being idle.

I could easily see it resulting in your softlockup scenario too.

                 Linus