Date: Mon, 2 Sep 2019 15:40:03 +0200
From: Oleg Nesterov
To: "Eric W. Biederman"
Cc: Linus Torvalds, Russell King - ARM Linux admin, Peter Zijlstra,
	Chris Metcalf, Christoph Lameter, Kirill Tkhai, Mike Galbraith,
	Thomas Gleixner, Ingo Molnar, Linux List Kernel Mailing
Subject: Re: [BUG] Use of probe_kernel_address() in task_rcu_dereference() without checking return value
Message-ID: <20190902134003.GA14770@redhat.com>
References: <20190830140805.GD13294@shell.armlinux.org.uk>
	<20190830160957.GC2634@redhat.com>
	<87o906wimo.fsf@x220.int.ebiederm.org>
In-Reply-To: <87o906wimo.fsf@x220.int.ebiederm.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 08/30, Eric W. Biederman wrote:
>
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -182,6 +182,24 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
>  	put_task_struct(tsk);
>  }
>
> +void put_dead_task_struct(struct task_struct *task)
> +{
> +	bool delay = false;
> +	unsigned long flags;
> +
> +	/* Is the task both reaped and no longer being scheduled? */
> +	raw_spin_lock_irqsave(&task->pi_lock, flags);
> +	if ((task->state == TASK_DEAD) &&
> +	    (cmpxchg(&task->exit_state, EXIT_DEAD, EXIT_RCU) == EXIT_DEAD))
> +		delay = true;
> +	raw_spin_lock_irqrestore(&task->pi_lock, flags);
> +
> +	/* If both are true use rcu delay the put_task_struct */
> +	if (delay)
> +		call_rcu(&task->rcu, delayed_put_task_struct);
> +	else
> +		put_task_struct(task);
> +}
>
>  void release_task(struct task_struct *p)
>  {
> @@ -222,76 +240,13 @@ void release_task(struct task_struct *p)
>
>  	write_unlock_irq(&tasklist_lock);
>  	release_thread(p);
> -	call_rcu(&p->rcu, delayed_put_task_struct);
> +	put_dead_task_struct(p);

I had a similar change in mind, see below. This is subjective, but to me
it looks more simple and clean.

Oleg.

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8dc1811..1f9b021 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1134,7 +1134,10 @@ struct task_struct {
 
 	struct tlbflush_unmap_batch	tlb_ubc;
 
-	struct rcu_head			rcu;
+	union {
+		bool			xxx;
+		struct rcu_head		rcu;
+	};
 
 	/* Cache last used pipe for splice(): */
 	struct pipe_inode_info		*splice_pipe;
diff --git a/kernel/exit.c b/kernel/exit.c
index a75b6a7..baacfce 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -182,6 +182,11 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
 	put_task_struct(tsk);
 }
 
+void call_delayed_put_task_struct(struct task_struct *p)
+{
+	if (xchg(&p->xxx, 1))
+		call_rcu(&p->rcu, delayed_put_task_struct);
+}
 
 void release_task(struct task_struct *p)
 {
@@ -222,7 +227,7 @@ void release_task(struct task_struct *p)
 
 	write_unlock_irq(&tasklist_lock);
 	release_thread(p);
-	call_rcu(&p->rcu, delayed_put_task_struct);
+	call_delayed_put_task_struct(p);
 
 	p = leader;
 	if (unlikely(zap_leader))
diff --git a/kernel/fork.c b/kernel/fork.c
index d8ae0f1..e90f6de 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -900,11 +900,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	if (orig->cpus_ptr == &orig->cpus_mask)
 		tsk->cpus_ptr = &tsk->cpus_mask;
 
-	/*
-	 * One for us, one for whoever does the "release_task()" (usually
-	 * parent)
-	 */
-	refcount_set(&tsk->usage, 2);
+	refcount_set(&tsk->usage, 1);
 #ifdef CONFIG_BLK_DEV_IO_TRACE
 	tsk->btrace_seq = 0;
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2b037f1..e77389c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3135,7 +3135,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 		/* Task is done with its stack. */
 		put_task_stack(prev);
 
-		put_task_struct(prev);
+		call_delayed_put_task_struct(prev);
 	}
 
 	tick_nohz_task_switch();
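
For readers following the xchg() in call_delayed_put_task_struct(): the idea, as far as the patch shows, is that release_task() and finish_task_switch() both call the helper, and whichever arrives second is the one that queues the RCU callback. Below is a rough userspace sketch of that "second caller wins" pattern, not kernel code. struct fake_task, fake_call_rcu(), fake_call_delayed_put() and demo_two_callers() are hypothetical names invented for the illustration, and C11 atomic_exchange() stands in for the kernel's xchg(). Unlike the patch, the sketch keeps the flag outside any union with the RCU head, so it says nothing about how the shared storage behaves once call_rcu() has run.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct task_struct; only what the sketch needs. */
struct fake_task {
	atomic_bool xxx;	/* plays the role of "bool xxx" in the patch */
	int rcu_queued;		/* counts simulated call_rcu() invocations */
};

/* Simulated call_rcu(): only records that the callback was queued. */
static void fake_call_rcu(struct fake_task *p)
{
	p->rcu_queued++;
}

/*
 * Userspace analogue of call_delayed_put_task_struct().
 * atomic_exchange() returns the previous value, so the first caller
 * sees false and does nothing; the second sees true and queues.
 */
static void fake_call_delayed_put(struct fake_task *p)
{
	if (atomic_exchange(&p->xxx, true))
		fake_call_rcu(p);
}

/* Runs the two calls in sequence and returns how many times we queued. */
static int demo_two_callers(void)
{
	struct fake_task t = { .rcu_queued = 0 };

	atomic_init(&t.xxx, false);

	fake_call_delayed_put(&t);	/* e.g. the release_task() side */
	assert(t.rcu_queued == 0);	/* first caller must not queue */

	fake_call_delayed_put(&t);	/* e.g. the finish_task_switch() side */
	return t.rcu_queued;
}
```

The order of the two calls does not matter to the exchange itself, which is presumably why no extra lock is needed around the flag: exactly one of the two callers observes the old value as set.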