From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Rafael J. Wysocki" Subject: Re: Cpuidle drivers,Suspend : Fix suspend/resume hang with intel_idle driver Date: Mon, 2 Jul 2012 21:43:51 +0200 Message-ID: <201207022143.51489.rjw@sisk.pl> References: <4FEC20D2.2070501@linux.vnet.ibm.com> <4FF1336D.10808@linux.vnet.ibm.com> <4FF1329A.7000502@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Return-path: Received: from ogre.sisk.pl ([193.178.161.156]:49616 "EHLO ogre.sisk.pl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753153Ab2GBTiU (ORCPT ); Mon, 2 Jul 2012 15:38:20 -0400 In-Reply-To: <4FF1329A.7000502@linux.vnet.ibm.com> Sender: linux-acpi-owner@vger.kernel.org List-Id: linux-acpi@vger.kernel.org To: "Srivatsa S. Bhat" Cc: preeti , Dave Hansen , Linux PM mailing list , linux-acpi@vger.kernel.org, Deepthi Dharwar On Monday, July 02, 2012, Srivatsa S. Bhat wrote: > On 07/02/2012 11:06 AM, preeti wrote: > > On 06/30/2012 03:37 AM, Rafael J. Wysocki wrote: > >> On Friday, June 29, 2012, preeti wrote: > >>> On 06/29/2012 12:41 AM, Rafael J. Wysocki wrote: > >>>> On Thursday, June 28, 2012, preeti wrote: > >>>>> On 06/28/2012 03:23 PM, Rafael J. Wysocki wrote: > >>>>>> On Thursday, June 28, 2012, preeti wrote: > >>>>>>> > >>>>>>> From: Preeti U Murthy > >>>> [...] > >>>>> cpuidle is an architecture independent part of the kernel code.Since > >>>>> this patch aims at x86 architecture in specific,I considered it > >>>>> inappropriate. > >>>>> > >>>>> In addition to this,suspend happens on x86 only if ACPI is configured. > >>>> > >>>> But that is not required for intel_idle, so if it hangs with intel_idle, > >>>> then it is not dependent on ACPI after all. > >>> True intel_idle does not need ACPI to be configured,but that also means > >>> that the kernel will not provide you the means to suspend. > >> > >> Yes, it will. You can hibernate without ACPI in theory. > > > > True.You can suspend to disk without ACPI,but can suspend to RAM only > > with ACPI. > > This also highlights the fact that my patch has not taken care of > > hibernate notifiers,which I should have done.This goes to say that the > > callbacks during suspend better not be in ACPI specific code.I will look > > into correcting the placement of the callbacks. > > > >> > >>> There is no question of resume hang here at all as you cannot suspend in > >>> the first place. > >>> > >>> The issue is when ACPI is configured,and intel_idle is chosen to be the > >>> cpuidle driver.In this situation when the user suspends,cpus can enter > >>> deep sleep states as intel_idle driver does not prevent then from doing so. > >>> This is when resume hangs. > >> > >> I know that. > >> > >>>>> Therefore it seemed right to put the callback in ACPI specific code > >>>>> which deals with ACPI sleep support. > >>>> > >>>> I wonder if we can address this issue correctly. That is, in a non-racy > >>>> way and in a better place. > >>>> > >>>> First, I really don't think it is necessary to "suspend" cpuidle (be it > >>>> ACPI or any other) when device drivers' suspend routines are being > >>>> executed (which also is racy, because the cpuidle "suspend" may be running > >>>> concurrently with cpuidle on another CPU) or earlier. We really may want > >>>> to disable the deeper C-states when we're about to execute > >>>> suspend_ops->prepare_late(), or hibernation_ops->prepare(), i.e. after > >>>> we've run dpm_suspend_end() successfully. > >>> > >>> The commit "ACPI:disable lower idle C-states across suspend/resume" > >>> states that device_suspend() calls ACPI suspend functions which cause > >>> side effects on the lower idle C-states. > >> > >> I'd like to know the details, then. Which ACPI suspend functions those are, > >> in particular, because I'm not aware of any in device_suspend(). > >> > >> Also, please note that this commit is 5 years old and some things have changed > >> in the suspend code paths since that time. > > > > I agree.My view on this,as I have mentioned in one of my previous mails > > is, in the patch with the acpi_idle_suspend workaround,we have not taken > > precautions to check if there are cpus that have already entered deep > > C-states before the suspend routine has begun. > > > > We take care of disabling entry into deep C-states only during suspend. > > so if deep C-states are posing problems during suspend,then why dont > > these cpus that have entered idle states before suspend cause a resume hang? > > > > Because cpu hotplug kicks the cpu out of any idle state it was in, in order > to execute the CPU_DYING_FROZEN callbacks. (See my other mail about this). > > >> > >>> This means we need to disable entry into deeper C-states even before > >>> dpm_suspend_start(), > >> > >> This most likely is wrong. > >> > >> We may need to "suspend" cpuidle before calling suspend_device_irqs(), > >> but then again that shouldn't be made via a notifier I think. Why don't > >> we simply make suspend_device_irqs() disable lower C-states to start with? > >> > >>> but how much before? > >>> > >>> The commit answers this too.It says removing the functionality of > >>> entering deep C-states before suspend removed the side effects.Besides, > >>> the commit title says'across suspend/resume'.So I think to address the > >>> resume hang effectively,it is desirable to disable entry into deeper > >>> C-states during suspend_prepare operations. > >>>> > >>>> So, it seems, it might be a good idea to place a cpuidle driver suspend > >>>> (and/or hibernation) hook at the end of dpm_suspend_end() and make ACPI > >>>> and intel_idle drivers implement that hook. > >>>> > >>> Dont you think this patch is meant to fix a very ACPI specific problem? > >> > >> No, I don't. > >> > >>> Which is why I think the call backs or any hook meant to fix this should > >>> go into ACPI specific code. > >>> Else it will seem irrelevant to all other architectures that implement > >>> suspend. > >> > >> I don't honestly think it is irrelevant. The fact is that similar problems > >> have not been reported on other architectures _yet_, which by no means can > >> be taken as a proof that those architectures are not affected. > >> > >> Anyway, I think that the right way to address this is through a cpuidle driver > >> callback executed from suspend_device_irqs() (and analogously for the resume > >> code path) and not through some hacky-ugly ACPI changes. > > > > I agree with having a cpuidle driver callback as even hibernate > > callbacks need to be taken care of which have nothing to do with ACPI. > > > > But on what basis is the placement of the call back suggested at > > suspend_device_irqs()? What is the assurance that this placement will > > not cause a resume hang considering we dont know what precisely is > > causing this problem? > > > > Any place before CPU hotplug should do, I think. There may be ugly interactions with suspend_device_irqs(), though, that I'd prefer to avoid. Hence my suggestion to place the cpuidle "suspend" before suspend_device_irqs(). Thanks, Rafael