From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752831Ab2ARTWp (ORCPT ); Wed, 18 Jan 2012 14:22:45 -0500 Received: from e28smtp03.in.ibm.com ([122.248.162.3]:58149 "EHLO e28smtp03.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751240Ab2ARTWn (ORCPT ); Wed, 18 Jan 2012 14:22:43 -0500 Message-ID: <4F171BF8.50803@linux.vnet.ibm.com> Date: Thu, 19 Jan 2012 00:52:32 +0530 From: "Srivatsa S. Bhat" User-Agent: Mozilla/5.0 (X11; Linux i686; rv:7.0) Gecko/20110927 Thunderbird/7.0 MIME-Version: 1.0 To: Tejun Heo CC: "Rafael J. Wysocki" , Linux PM list , LKML , horms@verge.net.au, "pavel@ucw.cz" , Len Brown Subject: Re: [Update][PATCH] PM / Hibernate: Fix s2disk regression related to unlock_system_sleep() References: <201201172345.15010.rjw@sisk.pl> <201201180015.56510.rjw@sisk.pl> <4F16C24A.4050007@linux.vnet.ibm.com> <4F16F94C.4020000@linux.vnet.ibm.com> <4F16FF0D.1030606@linux.vnet.ibm.com> <20120118173037.GE30664@google.com> In-Reply-To: <20120118173037.GE30664@google.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit x-cbid: 12011819-3864-0000-0000-000001054FB7 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 01/18/2012 11:00 PM, Tejun Heo wrote: > Hello, Srivatsa. > > On Wed, Jan 18, 2012 at 10:49:09PM +0530, Srivatsa S. Bhat wrote: >> I agree, but I was trying to keep the comment from growing too long ;) > > It doesn't have to be long. It just has to give some meaning to the > decision. AFAICS, it is correct to call try_to_freeze() on > unlock_system_sleep() regardless of 20sec window. There's no > guarantee the unlocking task is gonna hit try_to_freeze() elsewhere > and not calling it actually makes the interface buggy. > Not really - In short, there is a *key* difference between: freezer_do_not_count() - freezer_count() vs lock_system_sleep() - unlock_system_sleep() And that is, the latter pair acquire and release pm_mutex - and the freezer doesn't do *anything* without first acquiring the same pm_mutex. Which is convincing enough to start believing that try_to_freeze() might be unnecessary in the latter pair. Now, more explanation follows: Consider the 2 cases below: Some random code: Case 1: /* Was doing something */ freezer_do_not_count(); /* Important work */ freezer_count(); Case 2: /* Was doing something */ lock_system_sleep(); /* Important work */ unlock_system_sleep(); In case 1, we asked the freezer to ignore us until we are done with some "important" work. And precisely because we interfered with the freezer trying to freeze us, it is our duty to cooperate with the freezer when our "important" work is finished. Which is why we call try_to_freeze() inside freezer_count(). However, in case 2, the same story as above applies, but with a catch: After we asked the freezer to ignore us, we could not even *start* doing our "important" work! Because, we were waiting for pm_mutex, since the freezer had got hold of it! So we get to do our "important work" only when the freezer and all other PM operations after that is done. And here is another catch: Once we start doing our "important work", no PM operation (freezing in particular) can begin, because now *we* have acquired the pm_mutex :-) So to summarize, calling try_to_freeze() inside some generic freezer_count() is justified and _needed_. But in case of unlock_system_sleep(), as I depicted in my previous post, it wouldn't buy us anything at all! (because freezing wouldn't have even begun by then). [ And in the case of freezer_count(), your point is right that calling try_to_freeze() is not to help the freezer meet its 20sec deadline - the freezer could have finished its work and gone away long before we completed our "important" work. So the real reason why we call try_to_freeze in freezer_count is that when some PM operation is running which requires tasks to be frozen, we want to respect that condition and get frozen - but we probably ended up totally bypassing the freezer by calling freezer_do_not_count, but now that we are done with our important work, we want to manually enter the refrigerator, to satisfy the freezing condition requested by the PM operation in progress and hence avoid any havoc by running any further.] > That said, it causes a problem because unlock_system_sleep() is called > in a special context during later stage of hibernation where the usual > expectation - that a freezable task which sees a freezing condition > should freeze - doesn't hold. > > The correct solution would be somehow marking that condition so that > either try_to_freeze() doesn't get invoked or gets nullified - Even I thought this would be the right solution at first, but then realized that having try_to_freeze() inside unlock_system_sleep() is not at all essential, to begin with. And not having try_to_freeze() within unlock_system_sleep() automatically solves the hibernation problem. In fact, it does even better: Later stages of hibernation would no longer be a "special context" to watch out for! > e.g. making the SKIP thing a counter and ensure the hibernating task > has it elevated throughout the whole process. Alternatively, if the > code path is limited enough, using a different version of the unlock > function, unlock_system_sleep_nofreeze() or whatever, would work too - > this is a popular approach for synchronization functions which > interacts with scheduler and preemption. > > For now, as a quick fix, maybe not calling try_to_freeze() > unconditionally is okay, I don't know, but it's a hack. > Somehow I don't think its a hack, based on my perception as described above. But feel free to prove me wrong :-) Regards, Srivatsa S. Bhat