From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752761AbdCOQpT (ORCPT ); Wed, 15 Mar 2017 12:45:19 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53150 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751183AbdCOQpA (ORCPT ); Wed, 15 Mar 2017 12:45:00 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com D60EE80468 Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=oleg@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com D60EE80468 Date: Wed, 15 Mar 2017 17:43:02 +0100 From: Oleg Nesterov To: Peter Zijlstra Cc: Dmitry Vyukov , Ingo Molnar , Arnaldo Carvalho de Melo , Alexander Shishkin , LKML , Mathieu Desnoyers , syzkaller Subject: Re: perf: use-after-free in perf_release Message-ID: <20170315164302.GA17317@redhat.com> References: <20170307165131.GA6097@redhat.com> <20170314125508.GK3343@twins.programming.kicks-ass.net> <20170314140302.GA28146@redhat.com> <20170314140754.GG3328@twins.programming.kicks-ass.net> <20170314143010.GA30351@redhat.com> <20170314150241.GO5680@worktop> <20170314151910.GA31766@redhat.com> <20170314152625.GP5680@worktop> <20170314155949.GE32474@worktop> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170314155949.GE32474@worktop> User-Agent: Mutt/1.5.18 (2008-05-17) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Wed, 15 Mar 2017 16:45:00 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/14, Peter Zijlstra wrote: > > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -10346,6 +10346,17 @@ void perf_event_free_task(struct task_struct *task) > continue; > > mutex_lock(&ctx->mutex); > + raw_spin_lock_irq(&ctx->lock); > + /* > + * Destroy the task <-> ctx relation and mark the context dead. > + * > + * This is important because even though the task hasn't been > + * exposed yet the context has been (through child_list). > + */ > + RCU_INIT_POINTER(task->perf_event_ctxp[ctxn], NULL); > + WRITE_ONCE(ctx->task, TASK_TOMBSTONE); > + put_task_struct(task); /* cannot be last */ > + raw_spin_unlock_irq(&ctx->lock); Agreed, this is what I had in mind. Although you know, I spent 3 hours looking at your patch and I still can't convince myself I am really sure it closes all races ;) OK, I believe this is correct. And iiuc both RCU_INIT_POINTER(NULL) and put_task_struct() are not strictly necessary? At least until we add WARN_ON(tsk->usage != 2) before free_task() in copy process(). --------------------------------------------------------------------- This is off-topic, but to me list_for_each_entry(event->child_list) in perf_event_release_kernel() looks very confusing and misleading. And list_first_entry_or_null(), we do not really need NULL if list is empty, tmp == child should be F even if we use list_first_entry(). And given that we already have list_is_last(), it would be nice to add list_is_first() and cleanup perf_event_release_kernel() a bit: --- x/kernel/events/core.c +++ x/kernel/events/core.c @@ -4152,7 +4152,7 @@ static void put_event(struct perf_event int perf_event_release_kernel(struct perf_event *event) { struct perf_event_context *ctx = event->ctx; - struct perf_event *child, *tmp; + struct perf_event *child; /* * If we got here through err_file: fput(event_file); we will not have @@ -4190,8 +4190,9 @@ int perf_event_release_kernel(struct per again: mutex_lock(&event->child_mutex); - list_for_each_entry(child, &event->child_list, child_list) { - + if (!list_empty(&event->child_list)) { + child = list_first_entry(&event->child_list, + struct perf_event, child_list); /* * Cannot change, child events are not migrated, see the * comment with perf_event_ctx_lock_nested(). @@ -4221,9 +4222,7 @@ again: * state, if child is still the first entry, it didn't get freed * and we can continue doing so. */ - tmp = list_first_entry_or_null(&event->child_list, - struct perf_event, child_list); - if (tmp == child) { + if (list_is_first(child, &event->child_list)) { perf_remove_from_context(child, DETACH_GROUP); list_del(&child->child_list); free_event(child); But we can't, because static inline int list_is_first(const struct list_head *list, const struct list_head *head) { return list->prev == head; } won't work, "child" can be freed so we can't dereference it, and static inline int list_is_first(const struct list_head *list, const struct list_head *head) { return head->next == list; } won't be symmetrical with list_is_last() we already have. Oleg.