From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932082AbcELIH3 (ORCPT ); Thu, 12 May 2016 04:07:29 -0400 Received: from mail-wm0-f68.google.com ([74.125.82.68]:33784 "EHLO mail-wm0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751497AbcELIHV (ORCPT ); Thu, 12 May 2016 04:07:21 -0400 MIME-Version: 1.0 In-Reply-To: References: <1461214567-3356-1-git-send-email-lianwei.wang@gmail.com> <20160421105042.GI3408@twins.programming.kicks-ass.net> From: Lianwei Wang Date: Thu, 12 May 2016 01:06:59 -0700 Message-ID: Subject: Re: [PATCH] cpu/hotplug: handle unbalanced hotplug enable/disable To: Thomas Gleixner Cc: Peter Zijlstra , oleg@redhat.com, Ingo Molnar , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org I have come up a patch to make the pm notifier called symmetrically and currently being tested. I will send it out after pass the test. On Fri, May 6, 2016 at 12:18 AM, Thomas Gleixner wrote: > On Fri, 6 May 2016, Lianwei Wang wrote: >> On Thu, May 5, 2016 at 5:13 AM, Thomas Gleixner wrote: >> > Can you eventually come up with a coherent explanation of the problem down to >> > the root cause or are we going to play this "move the workaround one step >> > down" game for another 10 rounds? >> > >> Do you agree that any driver can abort the suspend process by >> returning an error or NOTIFY_BAD if it is not ready to suspend? >> I have explain it and I also copied the example code that abort >> suspend by returning an error or NOTIFY_BAD in the pm notifier >> callback function. > > I don't need copied example code which does not tell me what the real problem > is. > >> The cpu_hotplug_disable and cpu_hotplug_enable are called in one of >> the PM notifier callback. And they are called from two difference >> place. >> Below is how it happened: >> pm_suspend >> |--enter_state >> |--suspend_prepare >> |--pm_notifier_call_chain(PM_SUSPEND_PREPARE) >> | |--call_back_1 >> | |--call_back_.. >> | |--call_back_n ===> return NOTIFY_BAD to abort call chain and >> | | suspend process here >> | |--cpu_hotplug_pm_callback() >> | | |--cpu_hotplug_disable =====> remember it is not >> called yet >> | |--call_back_.. >> | >> |--pm_notifier_call_chain(PM_POST_SUSPEND) >> | |--call_back_1 >> | |--call_back_.. >> | |--call_back_n >> | |--cpu_hotplug_pm_callback() >> | | |--cpu_hotplug_enable =====> Here it is unbalanced called >> | |--call_back_.. >> | >> So, keep in mind that for pm notifier call chain, the >> PM_SUSPEND_PREPARE notifier and PM_POST_SUSPEND notifier is not always >> paired called. Sometimes for a driver's pm notifier callback, the >> PM_POST_SUSPEND is called without PM_SUSPEND_PREPARE. > > So that is the real problem: cpu_hotplug_pm_callback(PM_POST_SUSPEND) can be > called w/o a previous call to cpu_hotplug_pm_callback(PM_SUSPEND_PREPARE). > >> > It cannot prevent any unbalanced calls. It mitigates the issue, but that's a >> > different problem. >> It did not migrate the issue. It give a warning message to log the >> unbalanced issue and it also make sure the cpu hotplug continue to >> work well even someone do an unbalanced call. It is a good checking as >> the enable_irq/disable_irq do. There are some other unbalanced >> checking in kernel too. All make sure the kernel has a better >> stability. > > I'm not opposed to do that and I said so several times. But I said as well, > that we do not add this without fixing the problem which made you write that > patch in the first place. > > So we have a proper explanation for the real problem now, but we have no > fix. > > And again: Your patch is NOT a fix. Simply because it will emit a warning > everytime the above happens. And that's wrong because the abort is a > legitimate scenario. > > So please come up with a sensible fix for the suspend abort issue and then we > can add the balance check/fixup to the hotplug_disable/enable() code. > > Thanks, > > tglx > >