From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756332Ab3BQPL1 (ORCPT );
	Sun, 17 Feb 2013 10:11:27 -0500
Received: from mail-lb0-f175.google.com ([209.85.217.175]:47479 "EHLO
	mail-lb0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755760Ab3BQPL0 (ORCPT );
	Sun, 17 Feb 2013 10:11:26 -0500
MIME-Version: 1.0
In-Reply-To:
References: <20130212193901.GA18906@redhat.com>
	<20130213004059.GA14451@redhat.com>
	<20130213041629.GA28622@redhat.com>
	<20130213193411.GA15928@redhat.com>
	<20130215011503.GA11914@redhat.com>
	<20130215174435.GA2792@linux.vnet.ibm.com>
Date: Sun, 17 Feb 2013 16:11:24 +0100
Message-ID:
Subject: Re: Debugging Thinkpad T430s occasional suspend failure.
From: Frederic Weisbecker
To: Linus Torvalds
Cc: Paul McKenney, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Dave Jones, Hugh Dickins, Linux Kernel Mailing List, Paul McKenney
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

2013/2/15 Linus Torvalds :
> On Fri, Feb 15, 2013 at 9:44 AM, Paul E. McKenney
> wrote:
>>
>> This commit was designed to increase the probability of hitting the
>> races described in http://lwn.net/Articles/453002/. These races result
>> in deadlocks involving the runqueue lock (and perhaps also the priority
>> inheritance locks). And yes, I most certainly should have described
>> this in the commit message. :-(
>
> Ugh. That particular race seems to be because the softirq handling is
> just crazy, and does the "wakeup_softirqd()" from interrupt context,
> BUT HAS SPECIFICALLY BROKEN THE IRQ COUNTING!
>
> Because it claims to do it from softirq context, which is pure
> garbage. It's not actually in softirq context.
>
> The whole hardirq -> softirq transition seems stupid.
> I'm sure I made
> some serious mistake in cleaning it up, and there's probably some
> missed tracepoint (or perhaps screwed-up lockdep annotation), but I
> think the hardirq -> softirq preempt thing should be done as an atomic
> preempt downgrade, so that we never have a window of "uhhuh, another
> interrupt can come in between and see us as being in neither". And the
> wakeup_softirqd should be done without playing with preempt count at
> all.
>
> Something like this ENTIRELY UNTESTED patch.
>
> Note: I doubt this patch affects Dave's issue at all, I just started
> looking at that do_softirq code when I read your bug explanation.
>
> Adding random people for kernel/softirq.c to the participants list.
> Comments about the patch? Do note that it's entirely untested, so
> consider it more a RFD than a real patch.. It looks like it adds a lot
> of lines, but most of it is for comments and simplification of the
> logic.

preempt_value_in_interrupt() looks buggy in your patch: it makes
invoke_softirq() return if (val & HARDIRQ_MASK). But that's always
true, since you have moved the sub_preempt_count(IRQ_EXIT) further down.
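
To make the ordering problem concrete, here is a rough userspace model,
not the patch itself. The bit layout and every function name below are
assumptions made up for illustration; only the (val & HARDIRQ_MASK)
test and the idea of dropping the hardirq count before or after that
test come from the discussion above.

```c
#include <stdbool.h>

/* Assumed layout for this sketch only: hardirq count sits above bit 16. */
#define HARDIRQ_SHIFT   16
#define HARDIRQ_MASK    (0x3ffu << HARDIRQ_SHIFT)
#define HARDIRQ_OFFSET  (1u << HARDIRQ_SHIFT)

/* Hypothetical stand-in for the check: true if any hardirq bit is set. */
bool model_value_in_hardirq(unsigned int val)
{
	return (val & HARDIRQ_MASK) != 0;
}

/*
 * Ordering 1: drop our own hardirq level first, then test.
 * Returns true when softirq processing would be skipped.
 */
bool model_skip_when_count_dropped_first(unsigned int count)
{
	count -= HARDIRQ_OFFSET;
	return model_value_in_hardirq(count);
}

/*
 * Ordering 2: test while our own hardirq level is still counted,
 * and only drop it afterwards.
 */
bool model_skip_when_tested_first(unsigned int count)
{
	bool skip = model_value_in_hardirq(count);

	count -= HARDIRQ_OFFSET;
	(void)count;	/* the later decrement cannot change the decision */
	return skip;
}
```

With a single, non-nested hardirq level (count == HARDIRQ_OFFSET), the
first ordering does not skip, while the second always skips because the
test still sees the interrupt we are about to leave.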