From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S948200AbcKAFWZ (ORCPT ); Tue, 1 Nov 2016 01:22:25 -0400 Received: from mail-pf0-f180.google.com ([209.85.192.180]:35357 "EHLO mail-pf0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754852AbcKAFWX (ORCPT ); Tue, 1 Nov 2016 01:22:23 -0400 Date: Mon, 31 Oct 2016 22:22:20 -0700 From: Brian Norris To: "Rafael J. Wysocki" Cc: Pavel Machek , Len Brown , Greg Kroah-Hartman , linux-kernel@vger.kernel.org, Doug Anderson , Brian Norris , Jeffy Chen , linux-pm@vger.kernel.org, Chuansheng Liu , Dmitry Torokhov Subject: Re: [PATCH v2 2/2] PM / sleep: don't suspend parent when async child suspend_{noirq,late} fails Message-ID: <20161101052217.GA132958@google.com> References: <1476923170-111986-2-git-send-email-briannorris@chromium.org> <1477584334-39956-1-git-send-email-briannorris@chromium.org> <8017823.VJuZzSqtaY@vostro.rjw.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8017823.VJuZzSqtaY@vostro.rjw.lan> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Rafael, On Tue, Nov 01, 2016 at 05:25:39AM +0100, Rafael J. Wysocki wrote: > On Thursday, October 27, 2016 09:05:34 AM Brian Norris wrote: > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c > > index c58563581345..eaf6b53463a5 100644 > > --- a/drivers/base/power/main.c > > +++ b/drivers/base/power/main.c > > @@ -1040,6 +1040,9 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a > > > > dpm_wait_for_children(dev, async); > > > > + if (async_error) > > + goto Complete; > > + > > This is a second chech for async_error in this routine and is the first one > really needed after adding this? Maybe not? I confess I'm not 100% sure on all the reasons for the code structure as-is, but it looks like we're trying to catch pending wakeups early, and because that procedure utilizes 'async_error' to stash the -EBUSY, it seemingly makes sense to check if it's non-zero before overwriting it. But then, that's all kind of racy, since there can be multiple writers to that variable, no? So it can't matter *that* much if we clobber the error, as long as we abort somewhere. Anyway, maybe it's best if dpm_wait_for_children() just moves to be first thing in this function (after the tracepoints). That seems just as correct to me, and shouldn't waste any additional time suspending devices for a failed system suspend attempt -- as long as we're still catching wakeups before we suspend the current device. (That also incidentally matches the structure of __device_suspend() more closely. Why did this all get out of sync (pun unintended) when copied from the suspend() to the suspend_{late,noirq}() phase?) All in all, the short response is that I wrote the smallest patch that fixes the bug, AFAICT. But actually I think the above would be both shorter and better. I'll give that a go. > > if (dev->pm_domain) { > > info = "noirq power domain "; > > callback = pm_noirq_op(&dev->pm_domain->ops, state); > > @@ -1187,6 +1190,9 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as > > > > dpm_wait_for_children(dev, async); > > > > + if (async_error) > > + goto Complete; > > + > > Same question. Same answer :) Brian > > if (dev->pm_domain) { > > info = "late power domain "; > > callback = pm_late_early_op(&dev->pm_domain->ops, state); > > > > Thanks, > Rafael >