Date: Wed, 2 Jun 2021 13:54:53 +0100
From: Will Deacon
To: Peter Zijlstra
Cc: linux-arm-kernel@lists.infradead.org, linux-arch@vger.kernel.org,
	linux-kernel@vger.kernel.org, Catalin Marinas, Marc Zyngier,
	Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, "Rafael J. Wysocki",
	Dietmar Eggemann, Daniel Bristot de Oliveira,
	kernel-team@android.com
Subject: Re: [RFC][PATCH] freezer,sched: Rewrite core freezer logic
Message-ID: <20210602125452.GG30593@willie-the-truck>
References: <20210525151432.16875-1-will@kernel.org>
	<20210525151432.16875-17-will@kernel.org>

Hi Peter,

On Tue, Jun 01, 2021 at 01:27:59PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 01, 2021 at 10:21:15AM +0200, Peter Zijlstra wrote:
> >
> > Hi,
> >
> > This here rewrites the core freezer to behave better wrt thawing. By
> > replacing PF_FROZEN with TASK_FROZEN, a special block state, it is
> > ensured frozen tasks stay frozen until woken and don't randomly wake up
> > early, as is currently possible.
> >
> > As such, it does away with PF_FROZEN and PF_FREEZER_SKIP (yay).
> >
> > It does however completely wreck kernel/cgroup/legacy_freezer.c and I've
> > not yet spend any time on trying to figure out that code, will do so
> > shortly.
> >
> > Other than that, the freezer seems to work fine, I've tested it with:
> >
> >   echo freezer > /sys/power/pm_test
> >   echo mem > /sys/power/state
> >
> > Even while having a GDB session active, and that all works.
> >
> > Another notable bit is in init/do_mounts_initrd.c; afaict that has been
> > 'broken' for quite a while and is simply removed.
> >
> > Please have a look.
> >
> > Somewhat-Signed-off-by: Peter Zijlstra (Intel)
>
> cgroup crud now compiles, also fixed some allmodconfig fails.

There's a lot here, but generally I really like the look of it, especially
making the "freezable" waits explicit. I've left a few comments below.

>  drivers/android/binder.c       |   4 +-
>  drivers/media/pci/pt3/pt3.c    |   4 +-
>  fs/cifs/inode.c                |   4 +-
>  fs/cifs/transport.c            |   5 +-
>  fs/coredump.c                  |   4 +-
>  fs/nfs/file.c                  |   3 +-
>  fs/nfs/inode.c                 |  12 +-
>  fs/nfs/nfs3proc.c              |   3 +-
>  fs/nfs/nfs4proc.c              |  14 +--
>  fs/nfs/nfs4state.c             |   3 +-
>  fs/nfs/pnfs.c                  |   4 +-
>  fs/xfs/xfs_trans_ail.c         |   8 +-
>  include/linux/completion.h     |   2 +
>  include/linux/freezer.h        | 244 ++---------------------------------------
>  include/linux/sched.h          |   9 +-
>  include/linux/sunrpc/sched.h   |   7 +-
>  include/linux/wait.h           |  40 ++++++-
>  init/do_mounts_initrd.c        |   7 +-
>  kernel/cgroup/legacy_freezer.c |  23 ++--
>  kernel/exit.c                  |   4 +-
>  kernel/fork.c                  |   4 +-
>  kernel/freezer.c               | 115 +++++++++++++------
>  kernel/futex.c                 |   4 +-
>  kernel/hung_task.c             |   4 +-
>  kernel/power/main.c            |   5 +-
>  kernel/power/process.c         |  10 +-
>  kernel/sched/completion.c      |  16 +++
>  kernel/sched/core.c            |   2 +-
>  kernel/signal.c                |  14 +--
>  kernel/time/hrtimer.c          |   4 +-
>  mm/khugepaged.c                |   4 +-
>  net/sunrpc/sched.c             |  12 +-
>  net/unix/af_unix.c             |   8 +-
>  33 files changed, 225 insertions(+), 381 deletions(-)

There's also Documentation/power/freezing-of-tasks.rst to update. I'm not
sure if fs/proc/array.c should be updated to display frozen tasks; I
couldn't see how that was useful, but thought I'd mention it anyway.

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2982cfab1ae9..bfadc1dbcf24 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -95,7 +95,12 @@ struct task_group;
>  #define TASK_WAKING			0x0200
>  #define TASK_NOLOAD			0x0400
>  #define TASK_NEW			0x0800
> -#define TASK_STATE_MAX			0x1000
> +#define TASK_FREEZABLE			0x1000
> +#define __TASK_FREEZABLE_UNSAFE		0x2000

Given that this is only needed to avoid lockdep checks, maybe we should
avoid allocating the bit if lockdep is not enabled? Otherwise, people might
start to use it for other things.

> +#define TASK_FROZEN			0x4000
> +#define TASK_STATE_MAX			0x8000
> +
> +#define TASK_FREEZABLE_UNSAFE		(TASK_FREEZABLE | __TASK_FREEZABLE_UNSAFE)

We probably want to preserve the "DO NOT ADD ANY NEW CALLERS OF THIS STATE"
comment for the unsafe stuff.

> diff --git a/kernel/freezer.c b/kernel/freezer.c
> index dc520f01f99d..df235fba6989 100644
> --- a/kernel/freezer.c
> +++ b/kernel/freezer.c
> @@ -13,8 +13,8 @@
>  #include
>  
>  /* total number of freezing conditions in effect */
> -atomic_t system_freezing_cnt = ATOMIC_INIT(0);
> -EXPORT_SYMBOL(system_freezing_cnt);
> +DEFINE_STATIC_KEY_FALSE(freezer_active);
> +EXPORT_SYMBOL(freezer_active);
>  
>  /* indicate whether PM freezing is in effect, protected by
>   * system_transition_mutex
> @@ -29,7 +29,7 @@ static DEFINE_SPINLOCK(freezer_lock);
>   * freezing_slow_path - slow path for testing whether a task needs to be frozen
>   * @p: task to be tested
>   *
> - * This function is called by freezing() if system_freezing_cnt isn't zero
> + * This function is called by freezing() if freezer_active isn't zero
>   * and tests whether @p needs to enter and stay in frozen state.  Can be
>   * called under any context.  The freezers are responsible for ensuring the
>   * target tasks see the updated state.
> @@ -52,41 +52,67 @@ bool freezing_slow_path(struct task_struct *p)
>  }
>  EXPORT_SYMBOL(freezing_slow_path);
>  
> +/* Recursion relies on tail-call optimization to not blow away the stack */
> +static bool __frozen(struct task_struct *p)
> +{
> +	if (p->state == TASK_FROZEN)
> +		return true;

READ_ONCE()?

> +
> +	/*
> +	 * If stuck in TRACED, and the ptracer is FROZEN, we're frozen too.
> +	 */
> +	if (task_is_traced(p))
> +		return frozen(rcu_dereference(p->parent));
> +
> +	/*
> +	 * If stuck in STOPPED and the parent is FROZEN, we're frozen too.
> +	 */
> +	if (task_is_stopped(p))
> +		return frozen(rcu_dereference(p->real_parent));

This looks convincing, but I really can't tell if we're missing anything.

> +static bool __freeze_task(struct task_struct *p)
> +{
> +	unsigned long flags;
> +	unsigned int state;
> +	bool frozen = false;
> +
> +	raw_spin_lock_irqsave(&p->pi_lock, flags);
> +	state = READ_ONCE(p->state);
> +	if (state & TASK_FREEZABLE) {
> +		/*
> +		 * Only TASK_NORMAL can be augmented with TASK_FREEZABLE,
> +		 * since they can suffer spurious wakeups.
> +		 */
> +		WARN_ON_ONCE(!(state & TASK_NORMAL));
> +
> +#ifdef CONFIG_LOCKDEP
> +		/*
> +		 * It's dangerous to freeze with locks held; there be dragons there.
> +		 */
> +		if (!(state & __TASK_FREEZABLE_UNSAFE))
> +			WARN_ON_ONCE(debug_locks && p->lockdep_depth);
> +#endif
> +
> +		p->state = TASK_FROZEN;
> +		frozen = true;
> +	}
> +	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> +
> +	return frozen;
> +}
> +
>  /**
>   * freeze_task - send a freeze request to given task
>   * @p: task to send the request to
> @@ -116,20 +173,8 @@ bool freeze_task(struct task_struct *p)
>  {
>  	unsigned long flags;
>  
> -	/*
> -	 * This check can race with freezer_do_not_count, but worst case that
> -	 * will result in an extra wakeup being sent to the task.  It does not
> -	 * race with freezer_count(), the barriers in freezer_count() and
> -	 * freezer_should_skip() ensure that either freezer_count() sees
> -	 * freezing == true in try_to_freeze() and freezes, or
> -	 * freezer_should_skip() sees !PF_FREEZE_SKIP and freezes the task
> -	 * normally.
> -	 */
> -	if (freezer_should_skip(p))
> -		return false;
> -
>  	spin_lock_irqsave(&freezer_lock, flags);
> -	if (!freezing(p) || frozen(p)) {
> +	if (!freezing(p) || frozen(p) || __freeze_task(p)) {
>  		spin_unlock_irqrestore(&freezer_lock, flags);
>  		return false;
>  	}

I've been trying to figure out how this serialises with ttwu(), given that
frozen(p) will go and read p->state. I suppose it works out because only the
freezer can wake up tasks from the FROZEN state, but it feels a bit brittle.

> @@ -137,7 +182,7 @@ bool freeze_task(struct task_struct *p)
>  	if (!(p->flags & PF_KTHREAD))
>  		fake_signal_wake_up(p);
>  	else
> -		wake_up_state(p, TASK_INTERRUPTIBLE);
> +		wake_up_state(p, TASK_INTERRUPTIBLE); // TASK_NORMAL ?!?
>  
>  	spin_unlock_irqrestore(&freezer_lock, flags);
>  	return true;
> @@ -148,8 +193,8 @@ void __thaw_task(struct task_struct *p)
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&freezer_lock, flags);
> -	if (frozen(p))
> -		wake_up_process(p);
> +	WARN_ON_ONCE(freezing(p));
> +	wake_up_state(p, TASK_FROZEN | TASK_NORMAL);

Why do we need TASK_NORMAL here?

Will
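
For reference, the state-mask questions above (the TASK_NORMAL mask in
__thaw_task() and the lockdep-only bit) can be modelled in user space. The
following is only a rough sketch, not kernel code: struct model_task, the
model_*() helpers and MODEL_LOCKDEP are hypothetical names made up for
illustration, locking and lockdep checks are left out entirely, and the
wakeup helper assumes that try_to_wake_up() bails out unless the task's
state intersects the mask it is given. The #ifdef is just one possible
reading of "don't allocate the bit without lockdep".

```c
#include <assert.h>
#include <stdbool.h>

/* Sleep states as defined in include/linux/sched.h. */
#define TASK_INTERRUPTIBLE	0x0001
#define TASK_UNINTERRUPTIBLE	0x0002
#define TASK_NORMAL		(TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)

/* New bits, values copied from the quoted sched.h hunk. */
#define TASK_FREEZABLE		0x1000
#define TASK_FROZEN		0x4000

/*
 * Assumption: one way to avoid allocating the bit without lockdep.
 * MODEL_LOCKDEP is a hypothetical stand-in for CONFIG_LOCKDEP.
 */
#ifdef MODEL_LOCKDEP
#define __TASK_FREEZABLE_UNSAFE	0x2000
#else
#define __TASK_FREEZABLE_UNSAFE	0
#endif
#define TASK_FREEZABLE_UNSAFE	(TASK_FREEZABLE | __TASK_FREEZABLE_UNSAFE)

/* Hypothetical stand-in for struct task_struct; only the state word is modelled. */
struct model_task {
	unsigned int state;
};

/* State transition from the quoted __freeze_task(), minus pi_lock and lockdep. */
static bool model_freeze_task(struct model_task *p)
{
	if (!(p->state & TASK_FREEZABLE))
		return false;

	p->state = TASK_FROZEN;
	return true;
}

/* Assumed wakeup rule: nothing happens unless the state intersects the mask. */
static bool model_wake_up_state(struct model_task *p, unsigned int mask)
{
	if (!(p->state & mask))
		return false;

	p->state = 0;	/* TASK_RUNNING */
	return true;
}

/* Usage example; call from main() to exercise the model. */
void model_selfcheck(void)
{
	struct model_task a = { TASK_INTERRUPTIBLE | TASK_FREEZABLE };
	struct model_task b = { TASK_UNINTERRUPTIBLE };

	assert(model_freeze_task(&a) && a.state == TASK_FROZEN);
	assert(!model_freeze_task(&b));		/* not marked freezable */

	/* A frozen task does not match an ordinary wakeup mask ... */
	assert(!model_wake_up_state(&a, TASK_NORMAL));
	/* ... and TASK_FROZEN alone is enough to match it. */
	assert(model_wake_up_state(&a, TASK_FROZEN));

	/* The wider mask additionally matches a sleeper that was never frozen. */
	assert(!model_wake_up_state(&b, TASK_FROZEN));
	assert(model_wake_up_state(&b, TASK_FROZEN | TASK_NORMAL));
}
```

In this model a task frozen via model_freeze_task() has exactly TASK_FROZEN
as its state, so the extra TASK_NORMAL in the mask only changes the outcome
for tasks that never made it into the frozen state. Whether the real code
relies on that is the open question above. Calling model_selfcheck() from a
main() should run silently if the model behaves as described.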