From: Peter Zijlstra <peterz@infradead.org>
To: Marco Elver <elver@google.com>
Cc: Hillf Danton <hdanton@sina.com>,
	syzbot <syzbot+9228d6098455bb209ec8@syzkaller.appspotmail.com>,
	linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] KASAN: use-after-free Read in task_work_run (2)
Date: Wed, 23 Nov 2022 17:27:15 +0100
Message-ID: <Y35J4383T5rAjSjO@hirez.programming.kicks-ass.net>
In-Reply-To: <Y340Y2iwy3YubOk1@elver.google.com>

On Wed, Nov 23, 2022 at 03:55:31PM +0100, Marco Elver wrote:

> > > --- x/kernel/events/core.c
> > > +++ c/kernel/events/core.c
> > > @@ -2291,6 +2291,7 @@ event_sched_out(struct perf_event *event
> > >                     !event->pending_work) {
> > >                         event->pending_work = 1;
> > >                         dec = false;
> > > +                       atomic_long_inc(&event->refcount);
> > >                         task_work_add(current, &event->pending_task, TWA_RESUME);
> > >                 }
> > >                 if (dec)
> > > @@ -6561,6 +6562,8 @@ static void perf_pending_task(struct cal
> > >         struct perf_event *event = container_of(head, struct perf_event, pending_task);
> > >         int rctx;
> > >
> > > +       if (event->state == PERF_EVENT_STATE_DEAD)
> > > +               goto out;
> > >         /*
> > >          * If we 'fail' here, that's OK, it means recursion is already disabled
> > >          * and we won't recurse 'further'.
> > > @@ -6577,6 +6580,8 @@ static void perf_pending_task(struct cal
> > >         if (rctx >= 0)
> > >                 perf_swevent_put_recursion_context(rctx);
> > >         preempt_enable_notrace();
> > > +out:
> > > +       put_event(event);
> > >  }
> > >
> > >  #ifdef CONFIG_GUEST_PERF_EVENTS

This is broken and will corrupt ctx->nr_pending: on a DEAD event the
early 'goto out' jumps over the pending_work handling in
perf_pending_task(), so the nr_pending count taken when the sigtrap was
armed is never dropped.
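
For reference, a paraphrased sketch of the block the early return skips
(from this era's perf_pending_task() in kernel/events/core.c, slightly
simplified):

	if (event->pending_work) {
		event->pending_work = 0;
		perf_sigtrap(event);
		/* skipped by the early return; never decremented */
		local_dec(&event->ctx->nr_pending);
	}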


> My guess is that the __fput task work is in the same task as the perf
> task work, and so if we try to cancel the task work from within
> __fput, it won't actually cancel it if task_work_run() has already
> exchanged the 'task_works' list.

That seems very likely indeed.
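
To make that failure mode concrete, here's a minimal userspace model of
the task_work machinery (emphatically not the kernel code: names mirror
kernel/task_work.c, but the task argument, locking and the cmpxchg() are
all omitted) showing why a cancel issued from inside one of the works
finds nothing:

#include <stdio.h>
#include <stddef.h>

struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *);
};

static struct callback_head *task_works;	/* models current->task_works */

static void task_work_add(struct callback_head *work)
{
	work->next = task_works;		/* LIFO push, as in the kernel */
	task_works = work;
}

static struct callback_head *task_work_cancel(void (*func)(struct callback_head *))
{
	struct callback_head **pprev = &task_works;
	struct callback_head *work;

	/* only walks the attached list; sees nothing once the runner
	 * has taken the whole list */
	for (; (work = *pprev); pprev = &work->next) {
		if (work->func == func) {
			*pprev = work->next;
			return work;
		}
	}
	return NULL;
}

static void task_work_run(void)
{
	/* the kernel detaches the whole list with cmpxchg(); from here
	 * on task_work_cancel() cannot find any of these works */
	struct callback_head *work = task_works;

	task_works = NULL;
	while (work) {
		struct callback_head *next = work->next;
		work->func(work);
		work = next;
	}
}

static void perf_pending_task(struct callback_head *head)
{
	printf("perf_pending_task() runs anyway\n");
}

static void fput_work(struct callback_head *head)
{
	/* models __fput() -> perf_release() -> task_work_cancel() */
	printf("__fput(): cancel returned %p\n",
	       (void *)task_work_cancel(perf_pending_task));
}

int main(void)
{
	struct callback_head perf_work = { .func = perf_pending_task };
	struct callback_head file_work = { .func = fput_work };

	task_work_add(&perf_work);	/* queued first, runs last */
	task_work_add(&file_work);	/* fput(), queued second, runs first */
	task_work_run();
	return 0;
}

The cancel returns NULL: by the time __fput() runs, both works are
already off the list, and perf_pending_task() fires regardless.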

> So it looks like prolonging the perf event's lifetime is the only option
> right now?

Depends a bit on how complicated all this is; at the very least
perf_event_release_kernel() will schedule out the event if it is still
running. It does this before switching the state to DEAD (it has to),
which means it can raise perf_pending_task() at this point in time, even
though we're tearing down the event.

This can be avoided by a patch like this...

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9ab0eb073bd5..e9ad1ff7a9f8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2287,6 +2287,7 @@ group_sched_out(struct perf_event *group_event, struct perf_event_context *ctx)
 
 #define DETACH_GROUP	0x01UL
 #define DETACH_CHILD	0x02UL
+#define DETACH_DEAD	0x04UL
 
 /*
  * Cross CPU call to remove a performance event
@@ -2308,12 +2309,20 @@ __perf_remove_from_context(struct perf_event *event,
 		update_cgrp_time_from_cpuctx(cpuctx, false);
 	}
 
+	/*
+	 * Ensure event_sched_out() switches to OFF, at the very least
+	 * this avoids raising perf_pending_task() at this time.
+	 */
+	if (flags & DETACH_DEAD)
+		event->pending_disable = 1;
 	event_sched_out(event, ctx);
 	if (flags & DETACH_GROUP)
 		perf_group_detach(event);
 	if (flags & DETACH_CHILD)
 		perf_child_detach(event);
 	list_del_event(event, ctx);
+	if (flags & DETACH_DEAD)
+		event->state = PERF_EVENT_STATE_DEAD;
 
 	if (!pmu_ctx->nr_events) {
 		pmu_ctx->rotate_necessary = 0;
@@ -5299,9 +5308,7 @@ int perf_event_release_kernel(struct perf_event *event)
 
 	ctx = perf_event_ctx_lock(event);
 	WARN_ON_ONCE(ctx->parent_ctx);
-	perf_remove_from_context(event, DETACH_GROUP);
 
-	raw_spin_lock_irq(&ctx->lock);
 	/*
 	 * Mark this event as STATE_DEAD, there is no external reference to it
 	 * anymore.
@@ -5313,8 +5320,7 @@ int perf_event_release_kernel(struct perf_event *event)
 	 * Thus this guarantees that we will in fact observe and kill _ALL_
 	 * child events.
 	 */
-	event->state = PERF_EVENT_STATE_DEAD;
-	raw_spin_unlock_irq(&ctx->lock);
+	perf_remove_from_context(event, DETACH_GROUP|DETACH_DEAD);
 
 	perf_event_ctx_unlock(event, ctx);
 
---

However, I don't think that actually helps, because in this case the new
task_work would actually still be on the ->task_works list and
task_work_cancel() should've worked.

The other possibility seems to be that the sample happens and we
schedule before close() can terminate the event, which means we've
already got perf_pending_task() queued by the time we get to
perf_remove_from_context().

This means the perf_pending_task() queue happened before the fput()
queue, and it is thus run later (due to FILO ordering -- also see commit
c82199061009 ("task_work: remove fifo ordering guarantee")).

And I can't really see a way out of that other than doing refcount games
indeed.
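
Schematically (with &file_work standing in for the file's real callback
head), the queue order on the task is:

	task_work_add(current, &event->pending_task, TWA_RESUME);  /* sample */
	task_work_add(current, &file_work, TWA_RESUME);            /* fput() */

and task_work_run() pops from the head of the list, so __fput() drops
the last reference on the event before perf_pending_task() gets to run.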

There is the straightforward way, similar to what Hillf attempted, and a
really nasty one that avoids the atomics in the common case and is
really only targeted at this case -- given the overhead of signals, I'm
thinking simple is better.

---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9ab0eb073bd5..0228ea090b98 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2248,6 +2248,7 @@ event_sched_out(struct perf_event *event, struct perf_event_context *ctx)
 		    !event->pending_work) {
 			event->pending_work = 1;
 			dec = false;
+			WARN_ON_ONCE(!atomic_long_inc_not_zero(&event->refcount));
 			task_work_add(current, &event->pending_task, TWA_RESUME);
 		}
 		if (dec)
@@ -6755,6 +6762,8 @@ static void perf_pending_task(struct callback_head *head)
 	if (rctx >= 0)
 		perf_swevent_put_recursion_context(rctx);
 	preempt_enable_notrace();
+
+	put_event(event);
 }
 
 #ifdef CONFIG_GUEST_PERF_EVENTS


So perhaps both of the above..

Does that actually work?
