From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753848AbZBWJp0 (ORCPT ); Mon, 23 Feb 2009 04:45:26 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751419AbZBWJpM (ORCPT ); Mon, 23 Feb 2009 04:45:12 -0500 Received: from mx2.mail.elte.hu ([157.181.151.9]:51197 "EHLO mx2.mail.elte.hu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751265AbZBWJpK (ORCPT ); Mon, 23 Feb 2009 04:45:10 -0500 Date: Mon, 23 Feb 2009 10:44:41 +0100 From: Ingo Molnar To: "Eric W. Biederman" Cc: "Rafael J. Wysocki" , Linus Torvalds , LKML , Benjamin Herrenschmidt , Jeremy Fitzhardinge , pm list , Len Brown , Jesse Barnes , Thomas Gleixner Subject: Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume Message-ID: <20090223094441.GM9582@elte.hu> References: <200902221837.49396.rjw@sisk.pl> <200902222342.08285.rjw@sisk.pl> <200902230048.33635.rjw@sisk.pl> <20090223084400.GB9582@elte.hu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.18 (2008-05-17) X-ELTE-VirusStatus: clean X-ELTE-SpamScore: -1.5 X-ELTE-SpamLevel: X-ELTE-SpamCheck: no X-ELTE-SpamVersion: ELTE 2.0 X-ELTE-SpamCheck-Details: score=-1.5 required=5.9 tests=BAYES_00 autolearn=no SpamAssassin version=3.2.3 -1.5 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score: 0.0000] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Eric W. Biederman wrote: > Ingo Molnar writes: > > > I think this aspect has been well-understood during the > > discussion of this topic and it's just a slightly misleading > > changelog. > > As I was a member of that discussion I did not see that. > > It took me several passes through the patches to realize the > goal is to allow drivers to be able to sleep while they are in > their late pm shutdown routines. > > Why we want this I don't know. But it seems simple enough to > implement, and it makes it harder to get the late pm suspend > routines wrong, which is always good. That's not the only goal. The other goal is to further shrink a particular window of suspend fragility: the irqs-disabled stage of the suspend/resume sequence. Since suspend/resume is a mini-reboot sequence, there's a large amount of code executed - and the variety of code is large as well. We had repeat cases of random drivers re-enabling interrupts and thus breaking other drivers - and these are nasty to debug. So this patchset disables device IRQs centrally and serializes with pending work - so there's no races with pending IRQs anymore. The fact that we keep the timer irq running is two-fold: firstly the timer code is special and not really part of the regular suspend/resume sequence. Drivers want to take timestamps, sometimes they even want to do a small usleep(), etc. Ideally the suspend/resume code is pretty much _the same_ as a regular bootup (and shutdown) code - so we want to provide a similar environment to how drivers initialize and deinitialize, and we want to enable them to share code between bootup/shutdown and suspend/resume agressively. So the more generic kernel environment we give these fragile handlers, the better we are off in the end. Since we already had IRQS_TIMER, that was just the natural thing to do. > > The new suspend code does not rely on truly disabling IRQs > > on the low level. The purpose is to not get IRQs to drivers > > - which might crash/hang/race/misbehave. > > Reasonable. I expect one of the problems with drivers getting > it wrong is that the interface is too complex for mortal > humans to understand. The suspend/resume state machine certainly used to be a piece of code that makes a seasoned kernel developer weep in fear. That has changed drastically in the past few months. The suspend+hibernation logic got unified (at least as far as driver methods go), and all the flow and ordering has been cleaned up and has been made more robust. What makes s2ram fragile is not human failure but the combination of a handful of physical property: 1) Psychology: shutting the lid or pushing the suspend button is a deceivingly 'simple' action to the user. But under the hood, a ton of stuff happens: we deinitialize a lot of things, we go through _all hardware state_, and we do so in a serial fashion. If just one piece fails to do the right thing, the box might not resume. Still, the user expects this 'simple' thing to just work, all the time. No excuses accepted. 2) Length of code: To get a successful s2ram sequence the kernel runs through tens of thousands of lines of code. Code which never gets executed on a normal box - only if we s2ram. If just one step fails, we get a hung box. 3) Debuggability: a lot of s2ram code runs with the console off, making any bugs hard to debug. Furthermore we have no meaningful persistent storage either for kernel bug messages. The RTC trick of PM_DEBUG works but is a very narrow channel of information and it takes a lot of time to debug a bug via that method. The combination of these factors really makes up for a perfect storm in terms of kernel technology: we have this very-deceivingly-simple-looking but complex-and-rarely-executed piece of code, which is very hard to debug. Even just one of these factors would be enough to make an otherwise healthy subsystem fragile - no wonder s2ram has been a problem ever since it existed in the upstream kernel. So now we need just one thing: patience and more of the same good stuff that happened lately. > > Still, it might make sense to not just use the ->disable > > sequence but primarily the ->shutdown irqchip method (when > > it's available in the irqchip). > > Disable seems fine to me. This is interesting in the context > of all of the irqs that will when masked show up somewhere > else (think boot interrupts). > > > While we obviously cannot turn off the PIC that delivers > > timer IRQs at this stage - there's no theoretical reason why > > the suspend sequence couldnt power down some secondary PICs > > as well - in some arch code, or maybe even in the generic > > driver suspend sequence if the device tree is structured > > carefully enough so that the PIC gets turned off last. > > If the point is simply to prevent deliver of irqs to the > drivers I don't see the point of anything more than what the > patch does now. ... except for the usecase i described above. Say some PIC sits on a piece of silicon which gets turned off. I'm not talking about x86 but some custom device. We really dont want that IRQ line to send half of an IRQ message (un-ACK-ed) when it gets turned off. So physically 'suspending' all IRQ lines does make a certain level of long-term sense. Especially if it's just 3 extra lines of code to the existing patch. There _might_ be one downside: overhead of ->shutdown() methods. With a typical IRQ count on the typical netbook i doubt it's more than ~50 usecs combined. Ingo