Date: Sun, 15 Sep 2019 07:09:11 -0700
From: "Paul E. McKenney"
To: "Eric W. Biederman"
Cc: Peter Zijlstra, Oleg Nesterov, Russell King - ARM Linux admin,
	Chris Metcalf, Christoph Lameter, Kirill Tkhai, Mike Galbraith,
	Thomas Gleixner, Ingo Molnar, Linux List Kernel Mailing,
	Davidlohr Bueso, Linus Torvalds
Subject: Re: [PATCH v2 2/4] task: Ensure tasks are available for a grace period after leaving the runqueue
Message-ID: <20190915140911.GA17248@paulmck-ThinkPad-P72>
Reply-To: paulmck@kernel.org
References: <20190830160957.GC2634@redhat.com>
	<87o906wimo.fsf@x220.int.ebiederm.org>
	<20190902134003.GA14770@redhat.com>
	<87tv9uiq9r.fsf@x220.int.ebiederm.org>
	<87k1aqt23r.fsf_-_@x220.int.ebiederm.org>
	<87muf7f4bf.fsf_-_@x220.int.ebiederm.org>
	<87r24jdpl5.fsf_-_@x220.int.ebiederm.org>
	<20190915140752.GJ30224@paulmck-ThinkPad-P72>
In-Reply-To: <20190915140752.GJ30224@paulmck-ThinkPad-P72>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Sep 15, 2019 at 07:07:52AM -0700, Paul E. McKenney wrote:
> On Sat, Sep 14, 2019 at 07:33:58AM -0500, Eric W. Biederman wrote:
> >
> > In the ordinary case today the rcu grace period for a task_struct is
> > triggered when another process wait's for it's zombine and causes the

Oh, and "waits for its", just to hit the grammar en passant...  ;-)

							Thanx, Paul

> > kernel to call release_task().  As the waiting task has to receive a
> > signal and then act upon it before this happens, typically this will
> > occur after the original task has been removed from the runqueue.
> >
> > Unfortunately, in some cases, such as self-reaping tasks, it can be
> > shown that release_task() will be called, starting the grace period
> > for the task_struct, long before the task leaves the runqueue.
> >
> > Therefore, use put_task_struct_rcu_user() in finish_task_switch() to
> > guarantee that there is an RCU lifetime after the task leaves the
> > runqueue.
> >
> > Besides the change in the start of the rcu grace period for the
> > task_struct, this change may delay perf_event_delayed_put() and
> > trace_sched_process_free().  The function perf_event_delayed_put()
> > boils down to just a WARN_ON for cases that I assume never happen.
> > So I don't see any problem with delaying it.
> >
> > The function trace_sched_process_free() is a trace point and thus
> > visible to user space.  Occasionally userspace has the strangest
> > dependencies, so this has a minuscule chance of causing a regression.
> > This change only changes the timing of when the tracepoint is called.
> > The change in timing arguably gives userspace a more accurate picture
> > of what is going on.  So I don't expect there to be a regression.
> >
> > In the case where a task self-reaps, we are pretty much guaranteed
> > that the rcu grace period is delayed.  So we should get quite a bit
> > of coverage of this worst case for the change in a normal threaded
> > workload.  So I expect any issues to turn up quickly or not at all.
> >
> > I have lightly tested this change and everything appears to work
> > fine.
> >
> > Inspired-by: Linus Torvalds
> > Inspired-by: Oleg Nesterov
> > Signed-off-by: "Eric W. Biederman"
> > ---
> >  kernel/fork.c       | 11 +++++++----
> >  kernel/sched/core.c |  2 +-
> >  2 files changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 9f04741d5c70..7a74ade4e7d6 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -900,10 +900,13 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
> >  	if (orig->cpus_ptr == &orig->cpus_mask)
> >  		tsk->cpus_ptr = &tsk->cpus_mask;
> >
> > -	/* One for the user space visible state that goes away when reaped. */
> > -	refcount_set(&tsk->rcu_users, 1);
> > -	/* One for the rcu users, and one for the scheduler */
> > -	refcount_set(&tsk->usage, 2);
> > +	/*
> > +	 * One for the user space visible state that goes away when reaped.
> > +	 * One for the scheduler.
> > +	 */
> > +	refcount_set(&tsk->rcu_users, 2);
>
> OK, this would allow us to add a later decrement-and-test of
> ->rcu_users ...
>
> > +	/* One for the rcu users */
> > +	refcount_set(&tsk->usage, 1);
> >  #ifdef CONFIG_BLK_DEV_IO_TRACE
> >  	tsk->btrace_seq = 0;
> >  #endif
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 2b037f195473..69015b7c28da 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3135,7 +3135,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> >  	/* Task is done with its stack. */
> >  	put_task_stack(prev);
> >
> > -	put_task_struct(prev);
> > +	put_task_struct_rcu_user(prev);
>
> ... which is here.  And this looks to be invoked from the __schedule()
> called from do_task_dead() at the very end of do_exit().
>
> This looks plausible, but still requires that it no longer be possible
> to enter an RCU read-side critical section that might increment
> ->rcu_users after this point in time.  This might be enforced by a
> grace period between the time that the task was removed from its lists
> and the current time (seems unlikely, though, in that case why bother
> with call_rcu()?)
> or by some other synchronization.
>
> On to the next patch!
>
> 							Thanx, Paul
>
> > 	}
> >
> > 	tick_nohz_task_switch();
> > --
> > 2.21.0.dirty
> >