All of lore.kernel.org
 help / color / mirror / Atom feed
* perf: use-after-free in perf_release
@ 2017-03-06  9:57 Dmitry Vyukov
  2017-03-06 12:13 ` Peter Zijlstra
  2017-03-06 13:14 ` Peter Zijlstra
  0 siblings, 2 replies; 46+ messages in thread
From: Dmitry Vyukov @ 2017-03-06  9:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers
  Cc: syzkaller

Hello,

I've got the following use-after-free report while running syzkaller
fuzzer on 86292b33d4b79ee03e2f43ea0381ef85f077c760. Note that the task
is freed right in copy_process due to some error, but it's referenced
by another thread in perf subsystem.

==================================================================
BUG: KASAN: use-after-free in atomic_dec_and_test
arch/x86/include/asm/atomic.h:123 [inline] at addr ffff880079c30158
BUG: KASAN: use-after-free in put_task_struct
include/linux/sched/task.h:93 [inline] at addr ffff880079c30158
BUG: KASAN: use-after-free in put_ctx+0xcf/0x110
kernel/events/core.c:1131 at addr ffff880079c30158
Write of size 4 by task syz-executor6/25698
CPU: 2 PID: 25698 Comm: syz-executor6 Not tainted 4.10.0+ #302
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x2fb/0x3fd lib/dump_stack.c:52
 kasan_object_err+0x1c/0x90 mm/kasan/report.c:166
 print_address_description mm/kasan/report.c:208 [inline]
 kasan_report_error mm/kasan/report.c:292 [inline]
 kasan_report.part.2+0x1b0/0x460 mm/kasan/report.c:314
 kasan_report+0x21/0x30 mm/kasan/report.c:301
 check_memory_region_inline mm/kasan/kasan.c:326 [inline]
 check_memory_region+0x139/0x190 mm/kasan/kasan.c:333
 kasan_check_write+0x14/0x20 mm/kasan/kasan.c:344
 atomic_dec_and_test arch/x86/include/asm/atomic.h:123 [inline]
 put_task_struct include/linux/sched/task.h:93 [inline]
 put_ctx+0xcf/0x110 kernel/events/core.c:1131
 perf_event_release_kernel+0x3ad/0xc90 kernel/events/core.c:4322
 perf_release+0x37/0x50 kernel/events/core.c:4338
 __fput+0x332/0x800 fs/file_table.c:209
 ____fput+0x15/0x20 fs/file_table.c:245
 task_work_run+0x197/0x260 kernel/task_work.c:116
 exit_task_work include/linux/task_work.h:21 [inline]
 do_exit+0xb38/0x29c0 kernel/exit.c:880
 do_group_exit+0x149/0x420 kernel/exit.c:984
 get_signal+0x7e0/0x1820 kernel/signal.c:2318
 do_signal+0xd2/0x2190 arch/x86/kernel/signal.c:808
 exit_to_usermode_loop+0x200/0x2a0 arch/x86/entry/common.c:157
 syscall_return_slowpath arch/x86/entry/common.c:191 [inline]
 do_syscall_64+0x6fc/0x930 arch/x86/entry/common.c:286
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x4458d9
RSP: 002b:00007f3f07187cf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00000000007080c8 RCX: 00000000004458d9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000007080c8
RBP: 00000000007080a8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f3f071889c0 R15: 00007f3f07188700
Object at ffff880079c30140, in cache task_struct size: 5376
Allocated:
PID = 25681
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:513
 set_track mm/kasan/kasan.c:525 [inline]
 kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:616
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:555
 kmem_cache_alloc_node+0x122/0x6f0 mm/slab.c:3662
 alloc_task_struct_node kernel/fork.c:153 [inline]
 dup_task_struct kernel/fork.c:495 [inline]
 copy_process.part.38+0x19c8/0x4aa0 kernel/fork.c:1560
 copy_process kernel/fork.c:1531 [inline]
 _do_fork+0x200/0x1010 kernel/fork.c:1994
 SYSC_clone kernel/fork.c:2104 [inline]
 SyS_clone+0x37/0x50 kernel/fork.c:2098
 do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
 return_from_SYSCALL_64+0x0/0x7a
Freed:
PID = 25681
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:513
 set_track mm/kasan/kasan.c:525 [inline]
 kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:589
 __cache_free mm/slab.c:3514 [inline]
 kmem_cache_free+0x71/0x240 mm/slab.c:3774
 free_task_struct kernel/fork.c:158 [inline]
 free_task+0x151/0x1d0 kernel/fork.c:370
 copy_process.part.38+0x18e5/0x4aa0 kernel/fork.c:1931
 copy_process kernel/fork.c:1531 [inline]
 _do_fork+0x200/0x1010 kernel/fork.c:1994
 SYSC_clone kernel/fork.c:2104 [inline]
 SyS_clone+0x37/0x50 kernel/fork.c:2098
 do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
 return_from_SYSCALL_64+0x0/0x7a

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-06  9:57 perf: use-after-free in perf_release Dmitry Vyukov
@ 2017-03-06 12:13 ` Peter Zijlstra
  2017-03-06 12:17   ` Dmitry Vyukov
  2017-03-06 13:14 ` Peter Zijlstra
  1 sibling, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-06 12:13 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, LKML,
	Mathieu Desnoyers, syzkaller

On Mon, Mar 06, 2017 at 10:57:07AM +0100, Dmitry Vyukov wrote:
> Hello,
> 
> I've got the following use-after-free report while running syzkaller
> fuzzer on 86292b33d4b79ee03e2f43ea0381ef85f077c760. Note that the task
> is freed right in copy_process due to some error, but it's referenced
> by another thread in perf subsystem.

Weird... you don't happen to have a reproduction case available?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-06 12:13 ` Peter Zijlstra
@ 2017-03-06 12:17   ` Dmitry Vyukov
  2017-03-06 12:23     ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2017-03-06 12:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, LKML,
	Mathieu Desnoyers, syzkaller

On Mon, Mar 6, 2017 at 1:13 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Mar 06, 2017 at 10:57:07AM +0100, Dmitry Vyukov wrote:
>> Hello,
>>
>> I've got the following use-after-free report while running syzkaller
>> fuzzer on 86292b33d4b79ee03e2f43ea0381ef85f077c760. Note that the task
>> is freed right in copy_process due to some error, but it's referenced
>> by another thread in perf subsystem.
>
> Weird... you don't happen to have a reproduction case available?


Unfortunately no. I've looked at both logs that I have and there are
no memory allocation failures preceding the crash (however maybe
somebody used NOWARN?). But probably if you inject an error into
copy_process somewhere after perf_event_init_task, it should reproduce
the bug with KASAN I think.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-06 12:17   ` Dmitry Vyukov
@ 2017-03-06 12:23     ` Peter Zijlstra
  2017-03-06 12:27       ` Dmitry Vyukov
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-06 12:23 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, LKML,
	Mathieu Desnoyers, syzkaller

On Mon, Mar 06, 2017 at 01:17:42PM +0100, Dmitry Vyukov wrote:
> On Mon, Mar 6, 2017 at 1:13 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Mar 06, 2017 at 10:57:07AM +0100, Dmitry Vyukov wrote:
> >> Hello,
> >>
> >> I've got the following use-after-free report while running syzkaller
> >> fuzzer on 86292b33d4b79ee03e2f43ea0381ef85f077c760. Note that the task
> >> is freed right in copy_process due to some error, but it's referenced
> >> by another thread in perf subsystem.
> >
> > Weird... you don't happen to have a reproduction case available?
> 
> 
> Unfortunately no. I've looked at both logs that I have and there are
> no memory allocation failures preceding the crash (however maybe
> somebody used NOWARN?). But probably if you inject an error into
> copy_process somewhere after perf_event_init_task, it should reproduce
> the bug with KASAN I think.

I'll try. Thanks!

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-06 12:23     ` Peter Zijlstra
@ 2017-03-06 12:27       ` Dmitry Vyukov
  2017-03-06 12:47         ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2017-03-06 12:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, LKML,
	Mathieu Desnoyers, syzkaller

[-- Attachment #1: Type: text/plain, Size: 1110 bytes --]

On Mon, Mar 6, 2017 at 1:23 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Mar 06, 2017 at 01:17:42PM +0100, Dmitry Vyukov wrote:
>> On Mon, Mar 6, 2017 at 1:13 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Mon, Mar 06, 2017 at 10:57:07AM +0100, Dmitry Vyukov wrote:
>> >> Hello,
>> >>
>> >> I've got the following use-after-free report while running syzkaller
>> >> fuzzer on 86292b33d4b79ee03e2f43ea0381ef85f077c760. Note that the task
>> >> is freed right in copy_process due to some error, but it's referenced
>> >> by another thread in perf subsystem.
>> >
>> > Weird... you don't happen to have a reproduction case available?
>>
>>
>> Unfortunately no. I've looked at both logs that I have and there are
>> no memory allocation failures preceding the crash (however maybe
>> somebody used NOWARN?). But probably if you inject an error into
>> copy_process somewhere after perf_event_init_task, it should reproduce
>> the bug with KASAN I think.
>
> I'll try. Thanks!


I think you will also need the attached patch. It seems that it was
found due to it. Going to send it out soon.

[-- Attachment #2: atomic.patch --]
[-- Type: text/x-patch, Size: 7088 bytes --]

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index 14635c5ea025..64f0a7fb9b2f 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -2,6 +2,7 @@
 #define _ASM_X86_ATOMIC_H
 
 #include <linux/compiler.h>
+#include <linux/kasan-checks.h>
 #include <linux/types.h>
 #include <asm/alternative.h>
 #include <asm/cmpxchg.h>
@@ -47,6 +48,7 @@ static __always_inline void atomic_set(atomic_t *v, int i)
  */
 static __always_inline void atomic_add(int i, atomic_t *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	asm volatile(LOCK_PREFIX "addl %1,%0"
 		     : "+m" (v->counter)
 		     : "ir" (i));
@@ -61,6 +63,7 @@ static __always_inline void atomic_add(int i, atomic_t *v)
  */
 static __always_inline void atomic_sub(int i, atomic_t *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	asm volatile(LOCK_PREFIX "subl %1,%0"
 		     : "+m" (v->counter)
 		     : "ir" (i));
@@ -77,6 +80,7 @@ static __always_inline void atomic_sub(int i, atomic_t *v)
  */
 static __always_inline bool atomic_sub_and_test(int i, atomic_t *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	GEN_BINARY_RMWcc(LOCK_PREFIX "subl", v->counter, "er", i, "%0", e);
 }
 
@@ -88,6 +92,7 @@ static __always_inline bool atomic_sub_and_test(int i, atomic_t *v)
  */
 static __always_inline void atomic_inc(atomic_t *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	asm volatile(LOCK_PREFIX "incl %0"
 		     : "+m" (v->counter));
 }
@@ -100,6 +105,7 @@ static __always_inline void atomic_inc(atomic_t *v)
  */
 static __always_inline void atomic_dec(atomic_t *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	asm volatile(LOCK_PREFIX "decl %0"
 		     : "+m" (v->counter));
 }
@@ -114,6 +120,7 @@ static __always_inline void atomic_dec(atomic_t *v)
  */
 static __always_inline bool atomic_dec_and_test(atomic_t *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	GEN_UNARY_RMWcc(LOCK_PREFIX "decl", v->counter, "%0", e);
 }
 
@@ -127,6 +134,7 @@ static __always_inline bool atomic_dec_and_test(atomic_t *v)
  */
 static __always_inline bool atomic_inc_and_test(atomic_t *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	GEN_UNARY_RMWcc(LOCK_PREFIX "incl", v->counter, "%0", e);
 }
 
@@ -141,6 +149,7 @@ static __always_inline bool atomic_inc_and_test(atomic_t *v)
  */
 static __always_inline bool atomic_add_negative(int i, atomic_t *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	GEN_BINARY_RMWcc(LOCK_PREFIX "addl", v->counter, "er", i, "%0", s);
 }
 
@@ -194,6 +203,7 @@ static inline int atomic_xchg(atomic_t *v, int new)
 #define ATOMIC_OP(op)							\
 static inline void atomic_##op(int i, atomic_t *v)			\
 {									\
+	kasan_check_write(v, sizeof(*v));				\
 	asm volatile(LOCK_PREFIX #op"l %1,%0"				\
 			: "+m" (v->counter)				\
 			: "ir" (i)					\
@@ -258,6 +268,7 @@ static __always_inline int __atomic_add_unless(atomic_t *v, int a, int u)
  */
 static __always_inline short int atomic_inc_short(short int *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	asm(LOCK_PREFIX "addw $1, %0" : "+m" (*v));
 	return *v;
 }
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index 89ed2f6ae2f7..a75cb76c7b9b 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -42,6 +42,7 @@ static inline void atomic64_set(atomic64_t *v, long i)
  */
 static __always_inline void atomic64_add(long i, atomic64_t *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	asm volatile(LOCK_PREFIX "addq %1,%0"
 		     : "=m" (v->counter)
 		     : "er" (i), "m" (v->counter));
@@ -56,6 +57,7 @@ static __always_inline void atomic64_add(long i, atomic64_t *v)
  */
 static inline void atomic64_sub(long i, atomic64_t *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	asm volatile(LOCK_PREFIX "subq %1,%0"
 		     : "=m" (v->counter)
 		     : "er" (i), "m" (v->counter));
@@ -72,6 +74,7 @@ static inline void atomic64_sub(long i, atomic64_t *v)
  */
 static inline bool atomic64_sub_and_test(long i, atomic64_t *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	GEN_BINARY_RMWcc(LOCK_PREFIX "subq", v->counter, "er", i, "%0", e);
 }
 
@@ -83,6 +86,7 @@ static inline bool atomic64_sub_and_test(long i, atomic64_t *v)
  */
 static __always_inline void atomic64_inc(atomic64_t *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	asm volatile(LOCK_PREFIX "incq %0"
 		     : "=m" (v->counter)
 		     : "m" (v->counter));
@@ -96,6 +100,7 @@ static __always_inline void atomic64_inc(atomic64_t *v)
  */
 static __always_inline void atomic64_dec(atomic64_t *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	asm volatile(LOCK_PREFIX "decq %0"
 		     : "=m" (v->counter)
 		     : "m" (v->counter));
@@ -111,6 +116,7 @@ static __always_inline void atomic64_dec(atomic64_t *v)
  */
 static inline bool atomic64_dec_and_test(atomic64_t *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	GEN_UNARY_RMWcc(LOCK_PREFIX "decq", v->counter, "%0", e);
 }
 
@@ -124,6 +130,7 @@ static inline bool atomic64_dec_and_test(atomic64_t *v)
  */
 static inline bool atomic64_inc_and_test(atomic64_t *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	GEN_UNARY_RMWcc(LOCK_PREFIX "incq", v->counter, "%0", e);
 }
 
@@ -138,6 +145,7 @@ static inline bool atomic64_inc_and_test(atomic64_t *v)
  */
 static inline bool atomic64_add_negative(long i, atomic64_t *v)
 {
+	kasan_check_write(v, sizeof(*v));
 	GEN_BINARY_RMWcc(LOCK_PREFIX "addq", v->counter, "er", i, "%0", s);
 }
 
@@ -233,6 +241,7 @@ static inline long atomic64_dec_if_positive(atomic64_t *v)
 #define ATOMIC64_OP(op)							\
 static inline void atomic64_##op(long i, atomic64_t *v)			\
 {									\
+	kasan_check_write(v, sizeof(*v));				\
 	asm volatile(LOCK_PREFIX #op"q %1,%0"				\
 			: "+m" (v->counter)				\
 			: "er" (i)					\
diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 97848cdfcb1a..1632918cf9b9 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -2,6 +2,7 @@
 #define ASM_X86_CMPXCHG_H
 
 #include <linux/compiler.h>
+#include <linux/kasan-checks.h>
 #include <asm/cpufeatures.h>
 #include <asm/alternative.h> /* Provides LOCK_PREFIX */
 
@@ -41,6 +42,7 @@ extern void __add_wrong_size(void)
 #define __xchg_op(ptr, arg, op, lock)					\
 	({								\
 	        __typeof__ (*(ptr)) __ret = (arg);			\
+	        kasan_check_write((void*)(ptr), sizeof(*(ptr)));	\
 		switch (sizeof(*(ptr))) {				\
 		case __X86_CASE_B:					\
 			asm volatile (lock #op "b %b0, %1\n"		\
@@ -86,6 +88,7 @@ extern void __add_wrong_size(void)
 	__typeof__(*(ptr)) __ret;					\
 	__typeof__(*(ptr)) __old = (old);				\
 	__typeof__(*(ptr)) __new = (new);				\
+	kasan_check_write((void*)(ptr), sizeof(*(ptr)));		\
 	switch (size) {							\
 	case __X86_CASE_B:						\
 	{								\
@@ -171,6 +174,7 @@ extern void __add_wrong_size(void)
 	BUILD_BUG_ON(sizeof(*(p2)) != sizeof(long));			\
 	VM_BUG_ON((unsigned long)(p1) % (2 * sizeof(long)));		\
 	VM_BUG_ON((unsigned long)((p1) + 1) != (unsigned long)(p2));	\
+	kasan_check_write((void*)(p1), 2 * sizeof(*(p1)));		\
 	asm volatile(pfx "cmpxchg%c4b %2; sete %0"			\
 		     : "=a" (__ret), "+d" (__old2),			\
 		       "+m" (*(p1)), "+m" (*(p2))			\

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-06 12:27       ` Dmitry Vyukov
@ 2017-03-06 12:47         ` Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-06 12:47 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, LKML,
	Mathieu Desnoyers, syzkaller

On Mon, Mar 06, 2017 at 01:27:41PM +0100, Dmitry Vyukov wrote:

> I think you will also need the attached patch. It seems that it was
> found due to it. Going to send it out soon.

Yuck, that's nasty. Although I don't see an alternative there.

You might also want to do the bitops, they suffer the same problem.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-06  9:57 perf: use-after-free in perf_release Dmitry Vyukov
  2017-03-06 12:13 ` Peter Zijlstra
@ 2017-03-06 13:14 ` Peter Zijlstra
  2017-03-06 13:34   ` Dmitry Vyukov
                     ` (2 more replies)
  1 sibling, 3 replies; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-06 13:14 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, LKML,
	Mathieu Desnoyers, syzkaller, Oleg Nesterov

On Mon, Mar 06, 2017 at 10:57:07AM +0100, Dmitry Vyukov wrote:

> ==================================================================
> BUG: KASAN: use-after-free in atomic_dec_and_test
> arch/x86/include/asm/atomic.h:123 [inline] at addr ffff880079c30158
> BUG: KASAN: use-after-free in put_task_struct
> include/linux/sched/task.h:93 [inline] at addr ffff880079c30158
> BUG: KASAN: use-after-free in put_ctx+0xcf/0x110

FWIW, this output is very confusing, is this a result of your
post-processing replicating the line for every 'inlined' part?

> kernel/events/core.c:1131 at addr ffff880079c30158
> Write of size 4 by task syz-executor6/25698

>  atomic_dec_and_test arch/x86/include/asm/atomic.h:123 [inline]
>  put_task_struct include/linux/sched/task.h:93 [inline]
>  put_ctx+0xcf/0x110 kernel/events/core.c:1131
>  perf_event_release_kernel+0x3ad/0xc90 kernel/events/core.c:4322
>  perf_release+0x37/0x50 kernel/events/core.c:4338
>  __fput+0x332/0x800 fs/file_table.c:209
>  ____fput+0x15/0x20 fs/file_table.c:245
>  task_work_run+0x197/0x260 kernel/task_work.c:116
>  exit_task_work include/linux/task_work.h:21 [inline]
>  do_exit+0xb38/0x29c0 kernel/exit.c:880
>  do_group_exit+0x149/0x420 kernel/exit.c:984
>  get_signal+0x7e0/0x1820 kernel/signal.c:2318
>  do_signal+0xd2/0x2190 arch/x86/kernel/signal.c:808
>  exit_to_usermode_loop+0x200/0x2a0 arch/x86/entry/common.c:157
>  syscall_return_slowpath arch/x86/entry/common.c:191 [inline]
>  do_syscall_64+0x6fc/0x930 arch/x86/entry/common.c:286
>  entry_SYSCALL64_slow_path+0x25/0x25

So this is fput()..


> Freed:
> PID = 25681
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:513
>  set_track mm/kasan/kasan.c:525 [inline]
>  kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:589
>  __cache_free mm/slab.c:3514 [inline]
>  kmem_cache_free+0x71/0x240 mm/slab.c:3774
>  free_task_struct kernel/fork.c:158 [inline]
>  free_task+0x151/0x1d0 kernel/fork.c:370
>  copy_process.part.38+0x18e5/0x4aa0 kernel/fork.c:1931
>  copy_process kernel/fork.c:1531 [inline]
>  _do_fork+0x200/0x1010 kernel/fork.c:1994
>  SYSC_clone kernel/fork.c:2104 [inline]
>  SyS_clone+0x37/0x50 kernel/fork.c:2098
>  do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
>  return_from_SYSCALL_64+0x0/0x7a

and this is a failed fork().


However, inherited events don't have a filedesc to fput(), and
similarly, a task that fails for has never been visible to attach a perf
event to because it never hits the pid-hash.

Or so it is assumed.

I'm forever getting lost in the PID code. Oleg, is there any way
find_task_by_vpid() can return a task that can still fail fork() ?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-06 13:14 ` Peter Zijlstra
@ 2017-03-06 13:34   ` Dmitry Vyukov
  2017-03-07  9:08     ` Peter Zijlstra
  2017-03-07 13:16   ` Peter Zijlstra
  2017-03-07 14:04   ` Oleg Nesterov
  2 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2017-03-06 13:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, LKML,
	Mathieu Desnoyers, syzkaller, Oleg Nesterov

On Mon, Mar 6, 2017 at 2:14 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Mar 06, 2017 at 10:57:07AM +0100, Dmitry Vyukov wrote:
>
>> ==================================================================
>> BUG: KASAN: use-after-free in atomic_dec_and_test
>> arch/x86/include/asm/atomic.h:123 [inline] at addr ffff880079c30158
>> BUG: KASAN: use-after-free in put_task_struct
>> include/linux/sched/task.h:93 [inline] at addr ffff880079c30158
>> BUG: KASAN: use-after-free in put_ctx+0xcf/0x110
>
> FWIW, this output is very confusing, is this a result of your
> post-processing replicating the line for every 'inlined' part?


Yes.
We probably should not do this inlining in the header line. But the
problem is that it is very difficult to understand that it is a header
line in general.


>> kernel/events/core.c:1131 at addr ffff880079c30158
>> Write of size 4 by task syz-executor6/25698
>
>>  atomic_dec_and_test arch/x86/include/asm/atomic.h:123 [inline]
>>  put_task_struct include/linux/sched/task.h:93 [inline]
>>  put_ctx+0xcf/0x110 kernel/events/core.c:1131
>>  perf_event_release_kernel+0x3ad/0xc90 kernel/events/core.c:4322
>>  perf_release+0x37/0x50 kernel/events/core.c:4338
>>  __fput+0x332/0x800 fs/file_table.c:209
>>  ____fput+0x15/0x20 fs/file_table.c:245
>>  task_work_run+0x197/0x260 kernel/task_work.c:116
>>  exit_task_work include/linux/task_work.h:21 [inline]
>>  do_exit+0xb38/0x29c0 kernel/exit.c:880
>>  do_group_exit+0x149/0x420 kernel/exit.c:984
>>  get_signal+0x7e0/0x1820 kernel/signal.c:2318
>>  do_signal+0xd2/0x2190 arch/x86/kernel/signal.c:808
>>  exit_to_usermode_loop+0x200/0x2a0 arch/x86/entry/common.c:157
>>  syscall_return_slowpath arch/x86/entry/common.c:191 [inline]
>>  do_syscall_64+0x6fc/0x930 arch/x86/entry/common.c:286
>>  entry_SYSCALL64_slow_path+0x25/0x25
>
> So this is fput()..
>
>
>> Freed:
>> PID = 25681
>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:513
>>  set_track mm/kasan/kasan.c:525 [inline]
>>  kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:589
>>  __cache_free mm/slab.c:3514 [inline]
>>  kmem_cache_free+0x71/0x240 mm/slab.c:3774
>>  free_task_struct kernel/fork.c:158 [inline]
>>  free_task+0x151/0x1d0 kernel/fork.c:370
>>  copy_process.part.38+0x18e5/0x4aa0 kernel/fork.c:1931
>>  copy_process kernel/fork.c:1531 [inline]
>>  _do_fork+0x200/0x1010 kernel/fork.c:1994
>>  SYSC_clone kernel/fork.c:2104 [inline]
>>  SyS_clone+0x37/0x50 kernel/fork.c:2098
>>  do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
>>  return_from_SYSCALL_64+0x0/0x7a
>
> and this is a failed fork().
>
>
> However, inherited events don't have a filedesc to fput(), and
> similarly, a task that fails for has never been visible to attach a perf
> event to because it never hits the pid-hash.
>
> Or so it is assumed.
>
> I'm forever getting lost in the PID code. Oleg, is there any way
> find_task_by_vpid() can return a task that can still fail fork() ?


FWIW here are 2 syzkaller programs that triggered the bug:
https://gist.githubusercontent.com/dvyukov/d67f980050589775237a7fbdff226bec/raw/4bca72861cb2ede64059b6dad403e19f425a361f/gistfile1.txt
They look very similar, so most likely they are a mutation of the same
program. Which may suggest that there is something in that program
that provokes the bug. Note that the calls in these programs are
executed potentially in multiple threads. But at least it can give
some idea wrt e.g. flags passed to perf_event_open.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-06 13:34   ` Dmitry Vyukov
@ 2017-03-07  9:08     ` Peter Zijlstra
  2017-03-07  9:26       ` Dmitry Vyukov
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-07  9:08 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, LKML,
	Mathieu Desnoyers, syzkaller, Oleg Nesterov

On Mon, Mar 06, 2017 at 02:34:50PM +0100, Dmitry Vyukov wrote:
> FWIW here are 2 syzkaller programs that triggered the bug:
> https://gist.githubusercontent.com/dvyukov/d67f980050589775237a7fbdff226bec/raw/4bca72861cb2ede64059b6dad403e19f425a361f/gistfile1.txt

Hurm, previously your gistfile thingies were actual C, but this thing is
gibberish. How do I run it?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-07  9:08     ` Peter Zijlstra
@ 2017-03-07  9:26       ` Dmitry Vyukov
  2017-03-07  9:37         ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2017-03-07  9:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, LKML,
	Mathieu Desnoyers, syzkaller, Oleg Nesterov

On Tue, Mar 7, 2017 at 10:08 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Mar 06, 2017 at 02:34:50PM +0100, Dmitry Vyukov wrote:
>> FWIW here are 2 syzkaller programs that triggered the bug:
>> https://gist.githubusercontent.com/dvyukov/d67f980050589775237a7fbdff226bec/raw/4bca72861cb2ede64059b6dad403e19f425a361f/gistfile1.txt
>
> Hurm, previously your gistfile thingies were actual C, but this thing is
> gibberish. How do I run it?

The same way we did it here:
https://groups.google.com/d/msg/syzkaller/MHXa-o8foyc/yrGfDOrwAQAJ
This will run it in infinite loop in 10 parallel processes:
./syz-execprog -repeat=0 -procs=10 -sandbox=namespace gistfile1.txt
-sandbox=namespace will require CONFIG_USER_NS=y, I am not sure if it
is actually required, but that's how bots triggered it. You can do
-sandbox=none as well.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-07  9:26       ` Dmitry Vyukov
@ 2017-03-07  9:37         ` Peter Zijlstra
  2017-03-07  9:43           ` Dmitry Vyukov
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-07  9:37 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, LKML,
	Mathieu Desnoyers, syzkaller, Oleg Nesterov

On Tue, Mar 07, 2017 at 10:26:20AM +0100, Dmitry Vyukov wrote:
> On Tue, Mar 7, 2017 at 10:08 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Mar 06, 2017 at 02:34:50PM +0100, Dmitry Vyukov wrote:
> >> FWIW here are 2 syzkaller programs that triggered the bug:
> >> https://gist.githubusercontent.com/dvyukov/d67f980050589775237a7fbdff226bec/raw/4bca72861cb2ede64059b6dad403e19f425a361f/gistfile1.txt
> >
> > Hurm, previously your gistfile thingies were actual C, but this thing is
> > gibberish. How do I run it?
> 
> The same way we did it here:
> https://groups.google.com/d/msg/syzkaller/MHXa-o8foyc/yrGfDOrwAQAJ

Oh right, completely forgot about that. The last gistfile stuff I found
in my history were actual C files.

> This will run it in infinite loop in 10 parallel processes:
> ./syz-execprog -repeat=0 -procs=10 -sandbox=namespace gistfile1.txt
> -sandbox=namespace will require CONFIG_USER_NS=y, I am not sure if it
> is actually required, but that's how bots triggered it. You can do
> -sandbox=none as well.

I still have an ancient syzcaller; -sandbox doesn't exist and I needed
to add -executor=bin/syz-executor but now it appears to run.

I'll go up procs, 10 is somewhat low I feel.


---

root@ivb-ep:~/gopath/src/github.com/google/syzkaller# ./bin/syz-execprog -repeat=0 -procs=10 -executor=bin/syz-executor gistfile2.txt
2017/03/07 10:35:14 parsed 2 programs
2017/03/07 10:35:14 executed 0 programs
result: failed=false hanged=false err=executor is not serving

2017/03/07 10:36:14 executed 10 programs
result: failed=false hanged=false err=executor is not serving

result: failed=false hanged=false err=executor is not serving

result: failed=false hanged=false err=executor is not serving

result: failed=false hanged=false err=executor is not serving

result: failed=false hanged=false err=executor is not serving

result: failed=false hanged=false err=executor is not serving

result: failed=false hanged=false err=executor is not serving

result: failed=false hanged=false err=executor is not serving

result: failed=false hanged=false err=executor is not serving

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-07  9:37         ` Peter Zijlstra
@ 2017-03-07  9:43           ` Dmitry Vyukov
  2017-03-07 10:00             ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2017-03-07  9:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, LKML,
	Mathieu Desnoyers, syzkaller, Oleg Nesterov

On Tue, Mar 7, 2017 at 10:37 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Mar 07, 2017 at 10:26:20AM +0100, Dmitry Vyukov wrote:
>> On Tue, Mar 7, 2017 at 10:08 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Mon, Mar 06, 2017 at 02:34:50PM +0100, Dmitry Vyukov wrote:
>> >> FWIW here are 2 syzkaller programs that triggered the bug:
>> >> https://gist.githubusercontent.com/dvyukov/d67f980050589775237a7fbdff226bec/raw/4bca72861cb2ede64059b6dad403e19f425a361f/gistfile1.txt
>> >
>> > Hurm, previously your gistfile thingies were actual C, but this thing is
>> > gibberish. How do I run it?
>>
>> The same way we did it here:
>> https://groups.google.com/d/msg/syzkaller/MHXa-o8foyc/yrGfDOrwAQAJ
>
> Oh right, completely forgot about that. The last gistfile stuff I found
> in my history were actual C files.
>
>> This will run it in infinite loop in 10 parallel processes:
>> ./syz-execprog -repeat=0 -procs=10 -sandbox=namespace gistfile1.txt
>> -sandbox=namespace will require CONFIG_USER_NS=y, I am not sure if it
>> is actually required, but that's how bots triggered it. You can do
>> -sandbox=none as well.
>
> I still have an ancient syzcaller; -sandbox doesn't exist and I needed
> to add -executor=bin/syz-executor but now it appears to run.
>
> I'll go up procs, 10 is somewhat low I feel.
>
>
> ---
>
> root@ivb-ep:~/gopath/src/github.com/google/syzkaller# ./bin/syz-execprog -repeat=0 -procs=10 -executor=bin/syz-executor gistfile2.txt
> 2017/03/07 10:35:14 parsed 2 programs
> 2017/03/07 10:35:14 executed 0 programs
> result: failed=false hanged=false err=executor is not serving
>
> 2017/03/07 10:36:14 executed 10 programs
> result: failed=false hanged=false err=executor is not serving
>
> result: failed=false hanged=false err=executor is not serving
>
> result: failed=false hanged=false err=executor is not serving
>
> result: failed=false hanged=false err=executor is not serving
>
> result: failed=false hanged=false err=executor is not serving
>
> result: failed=false hanged=false err=executor is not serving
>
> result: failed=false hanged=false err=executor is not serving
>
> result: failed=false hanged=false err=executor is not serving
>
> result: failed=false hanged=false err=executor is not serving


An old syzkaller may not understand part of syscalls in the program
and silently drop them, you need a new one.

Here is a straightforward conversion of the syzkaller program to C
(with/without namespace sandbox):

https://gist.githubusercontent.com/dvyukov/b6540bed50b7da1dff3d7373ba570c77/raw/fd5f2f3aaa52b70b2bb9f114cf8a3226d8a30960/gistfile1.txt
https://gist.githubusercontent.com/dvyukov/dbd8ec38bcb50df4bdc95210d4247b09/raw/f9cbb5e17cd4ff4a7a7881c97dea7d6cd6dd8bf1/gistfile1.txt

That's also with -procs=10, you can change number of procs in main funciton.

But I wasn't able to reproduce the crash using these programs (neither
the syzkaller program), that's why I did not provide it all initially.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-07  9:43           ` Dmitry Vyukov
@ 2017-03-07 10:00             ` Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-07 10:00 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, LKML,
	Mathieu Desnoyers, syzkaller, Oleg Nesterov

On Tue, Mar 07, 2017 at 10:43:32AM +0100, Dmitry Vyukov wrote:

> An old syzkaller may not understand part of syscalls in the program
> and silently drop them, you need a new one.

That's yucky semantics, better to at least warn on that occasion.

> Here is a straightforward conversion of the syzkaller program to C
> (with/without namespace sandbox):
> 
> https://gist.githubusercontent.com/dvyukov/b6540bed50b7da1dff3d7373ba570c77/raw/fd5f2f3aaa52b70b2bb9f114cf8a3226d8a30960/gistfile1.txt
> https://gist.githubusercontent.com/dvyukov/dbd8ec38bcb50df4bdc95210d4247b09/raw/f9cbb5e17cd4ff4a7a7881c97dea7d6cd6dd8bf1/gistfile1.txt

Thanks!

> That's also with -procs=10, you can change number of procs in main funciton.
> 
> But I wasn't able to reproduce the crash using these programs (neither
> the syzkaller program), that's why I did not provide it all initially.

Right, I'll run them while at the same time trying to see what it is
they're doing to find clues.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-06 13:14 ` Peter Zijlstra
  2017-03-06 13:34   ` Dmitry Vyukov
@ 2017-03-07 13:16   ` Peter Zijlstra
  2017-03-07 13:27     ` Peter Zijlstra
  2017-03-07 14:04   ` Oleg Nesterov
  2 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-07 13:16 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, LKML,
	Mathieu Desnoyers, syzkaller, Oleg Nesterov

On Mon, Mar 06, 2017 at 02:14:59PM +0100, Peter Zijlstra wrote:
> On Mon, Mar 06, 2017 at 10:57:07AM +0100, Dmitry Vyukov wrote:
> 
> > ==================================================================
> > BUG: KASAN: use-after-free in atomic_dec_and_test
> > arch/x86/include/asm/atomic.h:123 [inline] at addr ffff880079c30158
> > BUG: KASAN: use-after-free in put_task_struct
> > include/linux/sched/task.h:93 [inline] at addr ffff880079c30158
> > BUG: KASAN: use-after-free in put_ctx+0xcf/0x110
> 
> FWIW, this output is very confusing, is this a result of your
> post-processing replicating the line for every 'inlined' part?
> 
> > kernel/events/core.c:1131 at addr ffff880079c30158
> > Write of size 4 by task syz-executor6/25698
> 
> >  atomic_dec_and_test arch/x86/include/asm/atomic.h:123 [inline]
> >  put_task_struct include/linux/sched/task.h:93 [inline]
> >  put_ctx+0xcf/0x110 kernel/events/core.c:1131
> >  perf_event_release_kernel+0x3ad/0xc90 kernel/events/core.c:4322
> >  perf_release+0x37/0x50 kernel/events/core.c:4338
> >  __fput+0x332/0x800 fs/file_table.c:209
> >  ____fput+0x15/0x20 fs/file_table.c:245
> >  task_work_run+0x197/0x260 kernel/task_work.c:116
> >  exit_task_work include/linux/task_work.h:21 [inline]
> >  do_exit+0xb38/0x29c0 kernel/exit.c:880
> >  do_group_exit+0x149/0x420 kernel/exit.c:984
> >  get_signal+0x7e0/0x1820 kernel/signal.c:2318
> >  do_signal+0xd2/0x2190 arch/x86/kernel/signal.c:808
> >  exit_to_usermode_loop+0x200/0x2a0 arch/x86/entry/common.c:157
> >  syscall_return_slowpath arch/x86/entry/common.c:191 [inline]
> >  do_syscall_64+0x6fc/0x930 arch/x86/entry/common.c:286
> >  entry_SYSCALL64_slow_path+0x25/0x25
> 
> So this is fput()..
> 
> 
> > Freed:
> > PID = 25681
> >  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
> >  save_stack+0x43/0xd0 mm/kasan/kasan.c:513
> >  set_track mm/kasan/kasan.c:525 [inline]
> >  kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:589
> >  __cache_free mm/slab.c:3514 [inline]
> >  kmem_cache_free+0x71/0x240 mm/slab.c:3774
> >  free_task_struct kernel/fork.c:158 [inline]
> >  free_task+0x151/0x1d0 kernel/fork.c:370
> >  copy_process.part.38+0x18e5/0x4aa0 kernel/fork.c:1931
> >  copy_process kernel/fork.c:1531 [inline]
> >  _do_fork+0x200/0x1010 kernel/fork.c:1994
> >  SYSC_clone kernel/fork.c:2104 [inline]
> >  SyS_clone+0x37/0x50 kernel/fork.c:2098
> >  do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
> >  return_from_SYSCALL_64+0x0/0x7a
> 
> and this is a failed fork().
> 
> 
> However, inherited events don't have a filedesc to fput(), and
> similarly, a task that fails for has never been visible to attach a perf
> event to because it never hits the pid-hash.
> 
> Or so it is assumed.
> 
> I'm forever getting lost in the PID code. Oleg, is there any way
> find_task_by_vpid() can return a task that can still fail fork() ?

So I _think_ find_task_by_vpid() can return an already dead task; and
we'll happily increase task->usage.

Dmitry; I have no idea how easy it is for you to reproduce the thing;
but so far I've not had much success. Could you perhaps stick the below
in?

Once we convert task_struct to refcount_t that should generate a WARN of
its own I suppose.

---

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 000fdb2..612d652 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -763,6 +763,7 @@ struct perf_event_context {
 #ifdef CONFIG_CGROUP_PERF
 	int				nr_cgroups;	 /* cgroup evts */
 #endif
+	int				switches;
 	void				*task_ctx_data; /* pmu specific data */
 	struct rcu_head			rcu_head;
 };
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6f41548f..6455b7a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2902,6 +2902,8 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
 	if (!parent && !next_parent)
 		goto unlock;
 
+	ctx->switches++;
+
 	if (next_parent == ctx || next_ctx == parent || next_parent == parent) {
 		/*
 		 * Looks like the two contexts are clones, so we might be
@@ -3780,6 +3782,12 @@ find_lively_task_by_vpid(pid_t vpid)
 		task = current;
 	else
 		task = find_task_by_vpid(vpid);
+
+	if (task) {
+		if (WARN_ON_ONCE(task->flags & PF_EXITING))
+			task = NULL;
+	}
+
 	if (task)
 		get_task_struct(task);
 	rcu_read_unlock();
@@ -10432,6 +10440,10 @@ void perf_event_free_task(struct task_struct *task)
 
 		mutex_unlock(&ctx->mutex);
 
+		WARN_ON_ONCE(ctx->switches);
+		WARN_ON_ONCE(atomic_read(&ctx->refcount) != 1);
+		WARN_ON_ONCE(ctx->task != task);
+
 		put_ctx(ctx);
 	}
 }

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-07 13:16   ` Peter Zijlstra
@ 2017-03-07 13:27     ` Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-07 13:27 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, LKML,
	Mathieu Desnoyers, syzkaller, Oleg Nesterov

On Tue, Mar 07, 2017 at 02:16:49PM +0100, Peter Zijlstra wrote:
> So I _think_ find_task_by_vpid() can return an already dead task; and
> we'll happily increase task->usage.

Hurm, so find_get_context() already does the PF_EXITING test. And then
the put_ctx would've been from find_get_context(), not fput().

So still puzzled.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-06 13:14 ` Peter Zijlstra
  2017-03-06 13:34   ` Dmitry Vyukov
  2017-03-07 13:16   ` Peter Zijlstra
@ 2017-03-07 14:04   ` Oleg Nesterov
  2017-03-07 14:17     ` Dmitry Vyukov
  2 siblings, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2017-03-07 14:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On 03/06, Peter Zijlstra wrote:
>
> and this is a failed fork().
>
>
> However, inherited events don't have a filedesc to fput(), and
> similarly, a task that fails for has never been visible to attach a perf
> event to because it never hits the pid-hash.

Yes, it is not visible to find_task_by_vpid() until copy_process() does
attach_pid(PIDTYPE_PID), and copy_process() can't fail after that.

Oleg.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-07 14:04   ` Oleg Nesterov
@ 2017-03-07 14:17     ` Dmitry Vyukov
  2017-03-07 16:51       ` Oleg Nesterov
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2017-03-07 14:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On Tue, Mar 7, 2017 at 3:04 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 03/06, Peter Zijlstra wrote:
>>
>> and this is a failed fork().
>>
>>
>> However, inherited events don't have a filedesc to fput(), and
>> similarly, a task that fails for has never been visible to attach a perf
>> event to because it never hits the pid-hash.
>
> Yes, it is not visible to find_task_by_vpid() until copy_process() does
> attach_pid(PIDTYPE_PID), and copy_process() can't fail after that.


I would what is that that is failed in copy_process. Could it be
perf_event_init_task itself? Maybe it leaves a pointer to p in some
shared state on some error conditions?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-07 14:17     ` Dmitry Vyukov
@ 2017-03-07 16:51       ` Oleg Nesterov
  2017-03-07 17:29         ` Peter Zijlstra
  2017-03-14 12:55         ` Peter Zijlstra
  0 siblings, 2 replies; 46+ messages in thread
From: Oleg Nesterov @ 2017-03-07 16:51 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On 03/07, Dmitry Vyukov wrote:
>
> On Tue, Mar 7, 2017 at 3:04 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> > On 03/06, Peter Zijlstra wrote:
> >>
> >> and this is a failed fork().
> >>
> >>
> >> However, inherited events don't have a filedesc to fput(), and
> >> similarly, a task that fails for has never been visible to attach a perf
> >> event to because it never hits the pid-hash.
> >
> > Yes, it is not visible to find_task_by_vpid() until copy_process() does
> > attach_pid(PIDTYPE_PID), and copy_process() can't fail after that.
>
>
> I would what is that that is failed in copy_process. Could it be
> perf_event_init_task itself? Maybe it leaves a pointer to p in some
> shared state on some error conditions?

I am looking at perf_event_init_task() too and I can't understand the
error handling...

perf_event_init_context() can return success even if inherit_task_group() in
the first list_for_each_entry(pinned_groups) fails, "ret" will be overwritten
by the 2nd list_for_each_entry(flexible_groups) loop. "inherited_all" should
be cleared, still this looks confusing at least.

inherit_event() returns NULL under is_orphaned_event() check, not ERR_PTR().
Is it correct?

Oleg.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-07 16:51       ` Oleg Nesterov
@ 2017-03-07 17:29         ` Peter Zijlstra
  2017-03-14 12:55         ` Peter Zijlstra
  1 sibling, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-07 17:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On Tue, Mar 07, 2017 at 05:51:32PM +0100, Oleg Nesterov wrote:
> On 03/07, Dmitry Vyukov wrote:

> > I would what is that that is failed in copy_process. Could it be
> > perf_event_init_task itself? Maybe it leaves a pointer to p in some
> > shared state on some error conditions?
> 
> I am looking at perf_event_init_task() too and I can't understand the
> error handling...
> 
> perf_event_init_context() can return success even if inherit_task_group() in
> the first list_for_each_entry(pinned_groups) fails, "ret" will be overwritten
> by the 2nd list_for_each_entry(flexible_groups) loop. "inherited_all" should
> be cleared, still this looks confusing at least.
> 
> inherit_event() returns NULL under is_orphaned_event() check, not ERR_PTR().
> Is it correct?

Urgh, there was something tricky there, but I cannot remember, and it
seems we didn't put a comment in either :/

Alexander, can you remember?

But yes, this all looks a tad dodgy, I'll try and have a look, but I
feel like I'm coming down with something :-(

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-07 16:51       ` Oleg Nesterov
  2017-03-07 17:29         ` Peter Zijlstra
@ 2017-03-14 12:55         ` Peter Zijlstra
  2017-03-14 13:24           ` Oleg Nesterov
  2017-03-14 14:03           ` Oleg Nesterov
  1 sibling, 2 replies; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-14 12:55 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On Tue, Mar 07, 2017 at 05:51:32PM +0100, Oleg Nesterov wrote:

> I am looking at perf_event_init_task() too and I can't understand the
> error handling...
> 
> perf_event_init_context() can return success even if inherit_task_group() in
> the first list_for_each_entry(pinned_groups) fails, "ret" will be overwritten
> by the 2nd list_for_each_entry(flexible_groups) loop. "inherited_all" should
> be cleared, still this looks confusing at least.

Yes, this looks buggy. But I cannot explain how that would result in the
observed use-after-free. If we 'loose' an error this way,
perf_event_init_context() will not fail, and hence we'll not hit that
perf_event_free_task() path.

The task would continue living with an incorrectly copied context, and
counter behaviour would be affected.

> inherit_event() returns NULL under is_orphaned_event() check, not ERR_PTR().
> Is it correct?

Yes. This is all a tad tricky, but it seems to be correct.

By returning NULL, not an error, we affect the silent discard of
orphaned events. This is correct, because otherwise
perf_event_release_kernel() would have come by and explicitly discarded
those events for us anyway.


My current patch is below; won't do much good I suspect..

---
 kernel/events/core.c | 55 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 37 insertions(+), 18 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index a17ed56c8ce1..3cd907e00377 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4256,7 +4256,7 @@ int perf_event_release_kernel(struct perf_event *event)
 
 	raw_spin_lock_irq(&ctx->lock);
 	/*
-	 * Mark this even as STATE_DEAD, there is no external reference to it
+	 * Mark this event as STATE_DEAD, there is no external reference to it
 	 * anymore.
 	 *
 	 * Anybody acquiring event->child_mutex after the below loop _must_
@@ -10417,21 +10417,9 @@ void perf_event_free_task(struct task_struct *task)
 			continue;
 
 		mutex_lock(&ctx->mutex);
-again:
-		list_for_each_entry_safe(event, tmp, &ctx->pinned_groups,
-				group_entry)
-			perf_free_event(event, ctx);
-
-		list_for_each_entry_safe(event, tmp, &ctx->flexible_groups,
-				group_entry)
+		list_for_each_entry_safe(event, tmp, &ctx->event_list, event_entry)
 			perf_free_event(event, ctx);
-
-		if (!list_empty(&ctx->pinned_groups) ||
-				!list_empty(&ctx->flexible_groups))
-			goto again;
-
 		mutex_unlock(&ctx->mutex);
-
 		put_ctx(ctx);
 	}
 }
@@ -10469,7 +10457,12 @@ const struct perf_event_attr *perf_event_attrs(struct perf_event *event)
 }
 
 /*
- * inherit a event from parent task to child task:
+ * Inherit a event from parent task to child task.
+ *
+ * Returns:
+ *  - valid pointer on success
+ *  - NULL for orphaned events
+ *  - IS_ERR() on error
  */
 static struct perf_event *
 inherit_event(struct perf_event *parent_event,
@@ -10563,6 +10556,16 @@ inherit_event(struct perf_event *parent_event,
 	return child_event;
 }
 
+/*
+ * Inherits an event group.
+ *
+ * This will quietly suppress orphaned events; !inherit_event() is not an error.
+ * This matches with perf_event_release_kernel() removing all child events.
+ *
+ * Returns:
+ *  - 0 on success
+ *  - <0 on error
+ */
 static int inherit_group(struct perf_event *parent_event,
 	      struct task_struct *parent,
 	      struct perf_event_context *parent_ctx,
@@ -10577,6 +10580,11 @@ static int inherit_group(struct perf_event *parent_event,
 				 child, NULL, child_ctx);
 	if (IS_ERR(leader))
 		return PTR_ERR(leader);
+	/*
+	 * @leader can be NULL here because of is_orphaned_event(). In this
+	 * case inherit_event() will create individual events, similar to what
+	 * perf_group_detach() would do anyway.
+	 */
 	list_for_each_entry(sub, &parent_event->sibling_list, group_entry) {
 		child_ctr = inherit_event(sub, parent, parent_ctx,
 					    child, leader, child_ctx);
@@ -10586,6 +10594,17 @@ static int inherit_group(struct perf_event *parent_event,
 	return 0;
 }
 
+/*
+ * Creates the child task context and tries to inherit the event-group.
+ *
+ * Clears @inherited_all on !attr.inherited or error. Note that we'll leave
+ * inherited_all set when we 'fail' to inherit an orphaned event; this is
+ * consistent with perf_event_release_kernel() removing all child events.
+ *
+ * Returns:
+ *  - 0 on success
+ *  - <0 on error
+ */
 static int
 inherit_task_group(struct perf_event *event, struct task_struct *parent,
 		   struct perf_event_context *parent_ctx,
@@ -10608,7 +10627,6 @@ inherit_task_group(struct perf_event *event, struct task_struct *parent,
 		 * First allocate and initialize a context for the
 		 * child.
 		 */
-
 		child_ctx = alloc_perf_context(parent_ctx->pmu, child);
 		if (!child_ctx)
 			return -ENOMEM;
@@ -10670,7 +10688,7 @@ static int perf_event_init_context(struct task_struct *child, int ctxn)
 		ret = inherit_task_group(event, parent, parent_ctx,
 					 child, ctxn, &inherited_all);
 		if (ret)
-			break;
+			goto out_unlock;
 	}
 
 	/*
@@ -10686,7 +10704,7 @@ static int perf_event_init_context(struct task_struct *child, int ctxn)
 		ret = inherit_task_group(event, parent, parent_ctx,
 					 child, ctxn, &inherited_all);
 		if (ret)
-			break;
+			goto out_unlock;
 	}
 
 	raw_spin_lock_irqsave(&parent_ctx->lock, flags);
@@ -10714,6 +10732,7 @@ static int perf_event_init_context(struct task_struct *child, int ctxn)
 	}
 
 	raw_spin_unlock_irqrestore(&parent_ctx->lock, flags);
+out_unlock:
 	mutex_unlock(&parent_ctx->mutex);
 
 	perf_unpin_context(parent_ctx);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-14 12:55         ` Peter Zijlstra
@ 2017-03-14 13:24           ` Oleg Nesterov
  2017-03-14 13:47             ` Peter Zijlstra
  2017-03-14 14:03           ` Oleg Nesterov
  1 sibling, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2017-03-14 13:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On 03/14, Peter Zijlstra wrote:
>
> On Tue, Mar 07, 2017 at 05:51:32PM +0100, Oleg Nesterov wrote:
>
> > inherit_event() returns NULL under is_orphaned_event() check, not ERR_PTR().
> > Is it correct?
>
> Yes. This is all a tad tricky, but it seems to be correct.
>
> By returning NULL, not an error, we affect the silent discard of
> orphaned events. This is correct, because otherwise
> perf_event_release_kernel() would have come by and explicitly discarded
> those events for us anyway.

Thanks... I'll try to understand this later.

> @@ -10608,7 +10627,6 @@ inherit_task_group(struct perf_event *event, struct task_struct *parent,
>  		 * First allocate and initialize a context for the
>  		 * child.
>  		 */
> -
>  		child_ctx = alloc_perf_context(parent_ctx->pmu, child);
>  		if (!child_ctx)
>  			return -ENOMEM;
> @@ -10670,7 +10688,7 @@ static int perf_event_init_context(struct task_struct *child, int ctxn)
>  		ret = inherit_task_group(event, parent, parent_ctx,
>  					 child, ctxn, &inherited_all);
>  		if (ret)
> -			break;
> +			goto out_unlock;
>  	}
>  
>  	/*
> @@ -10686,7 +10704,7 @@ static int perf_event_init_context(struct task_struct *child, int ctxn)
>  		ret = inherit_task_group(event, parent, parent_ctx,
>  					 child, ctxn, &inherited_all);
>  		if (ret)
> -			break;
> +			goto out_unlock;

With this change you can also simplify inherit_task_group() a little bit,
it no longer needs to nullify *inherited_all if inherit_group() fails.

Oleg.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-14 13:24           ` Oleg Nesterov
@ 2017-03-14 13:47             ` Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-14 13:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On Tue, Mar 14, 2017 at 02:24:27PM +0100, Oleg Nesterov wrote:
> On 03/14, Peter Zijlstra wrote:
> >
> > On Tue, Mar 07, 2017 at 05:51:32PM +0100, Oleg Nesterov wrote:
> >
> > > inherit_event() returns NULL under is_orphaned_event() check, not ERR_PTR().
> > > Is it correct?
> >
> > Yes. This is all a tad tricky, but it seems to be correct.
> >
> > By returning NULL, not an error, we affect the silent discard of
> > orphaned events. This is correct, because otherwise
> > perf_event_release_kernel() would have come by and explicitly discarded
> > those events for us anyway.
> 
> Thanks... I'll try to understand this later.
> 
> > @@ -10608,7 +10627,6 @@ inherit_task_group(struct perf_event *event, struct task_struct *parent,
> >  		 * First allocate and initialize a context for the
> >  		 * child.
> >  		 */
> > -
> >  		child_ctx = alloc_perf_context(parent_ctx->pmu, child);
> >  		if (!child_ctx)
> >  			return -ENOMEM;
> > @@ -10670,7 +10688,7 @@ static int perf_event_init_context(struct task_struct *child, int ctxn)
> >  		ret = inherit_task_group(event, parent, parent_ctx,
> >  					 child, ctxn, &inherited_all);
> >  		if (ret)
> > -			break;
> > +			goto out_unlock;
> >  	}
> >  
> >  	/*
> > @@ -10686,7 +10704,7 @@ static int perf_event_init_context(struct task_struct *child, int ctxn)
> >  		ret = inherit_task_group(event, parent, parent_ctx,
> >  					 child, ctxn, &inherited_all);
> >  		if (ret)
> > -			break;
> > +			goto out_unlock;
> 
> With this change you can also simplify inherit_task_group() a little bit,
> it no longer needs to nullify *inherited_all if inherit_group() fails.

Ah, that last one is broken because then we forget to re-enable
parent_ctx->rotate_disable.

So if we keep that a break, we still need that inherited_all thing as
well.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-14 12:55         ` Peter Zijlstra
  2017-03-14 13:24           ` Oleg Nesterov
@ 2017-03-14 14:03           ` Oleg Nesterov
  2017-03-14 14:07             ` Peter Zijlstra
  1 sibling, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2017-03-14 14:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On 03/14, Peter Zijlstra wrote:
>
> Yes, this looks buggy. But I cannot explain how that would result in the
> observed use-after-free.

Yes...

Suppose that copy_process() fails after perf_event_init_task(). In this
case perf_event_free_task() does put_ctx(), but if this ctx has another
reference (ctx->refcount > 1) then ctx->task will point to the already
freed task, copy_process() does free_task() at the end of error path.
And we can't replace it with put_task_struct().

I am looking at TASK_TOMBSTONE, perhaps perf_event_free_task() should
use it too?

Oleg.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-14 14:03           ` Oleg Nesterov
@ 2017-03-14 14:07             ` Peter Zijlstra
  2017-03-14 14:30               ` Oleg Nesterov
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-14 14:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On Tue, Mar 14, 2017 at 03:03:02PM +0100, Oleg Nesterov wrote:
> On 03/14, Peter Zijlstra wrote:
> >
> > Yes, this looks buggy. But I cannot explain how that would result in the
> > observed use-after-free.
> 
> Yes...
> 
> Suppose that copy_process() fails after perf_event_init_task(). In this
> case perf_event_free_task() does put_ctx(), but if this ctx has another
> reference (ctx->refcount > 1) then ctx->task will point to the already
> freed task, copy_process() does free_task() at the end of error path.
> And we can't replace it with put_task_struct().
> 
> I am looking at TASK_TOMBSTONE, perhaps perf_event_free_task() should
> use it too?

The idea was that the task isn't visible when we use
perf_event_free_task(). But I'll have a look.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-14 14:07             ` Peter Zijlstra
@ 2017-03-14 14:30               ` Oleg Nesterov
  2017-03-14 15:02                 ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2017-03-14 14:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On 03/14, Peter Zijlstra wrote:
>
> On Tue, Mar 14, 2017 at 03:03:02PM +0100, Oleg Nesterov wrote:
> > On 03/14, Peter Zijlstra wrote:
> > >
> > > Yes, this looks buggy. But I cannot explain how that would result in the
> > > observed use-after-free.
> >
> > Yes...
> >
> > Suppose that copy_process() fails after perf_event_init_task(). In this
> > case perf_event_free_task() does put_ctx(), but if this ctx has another
> > reference (ctx->refcount > 1) then ctx->task will point to the already
> > freed task, copy_process() does free_task() at the end of error path.
> > And we can't replace it with put_task_struct().
> >
> > I am looking at TASK_TOMBSTONE, perhaps perf_event_free_task() should
> > use it too?
>
> The idea was that the task isn't visible when we use
> perf_event_free_task(). But I'll have a look.

I can be easily wrong, I do not understans this code.

But. perf_event_init_task() adds child_event to parent_event->child_list.

If perf_event_release_kernel(parent_event) is called before copy_process()
does perf_event_free_task() which (in particular) removes it from child_list,
perf_event_release_kernel() can find this child_event and do get_ctx(ctx)
(under the list_for_each_entry(child, &event->child_list, child_list) loop).

Then it does put_ctx(ctx), but ctx->task can be already freed by
copy_process()->free_task() in this case.

No?

Oleg.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-14 14:30               ` Oleg Nesterov
@ 2017-03-14 15:02                 ` Peter Zijlstra
  2017-03-14 15:07                   ` Peter Zijlstra
  2017-03-14 15:19                   ` Oleg Nesterov
  0 siblings, 2 replies; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-14 15:02 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On Tue, Mar 14, 2017 at 03:30:11PM +0100, Oleg Nesterov wrote:

> But. perf_event_init_task() adds child_event to parent_event->child_list.
> 
> If perf_event_release_kernel(parent_event) is called before copy_process()
> does perf_event_free_task() which (in particular) removes it from child_list,
> perf_event_release_kernel() can find this child_event and do get_ctx(ctx)
> (under the list_for_each_entry(child, &event->child_list, child_list) loop).

Right; the child_list is the only thing that is exposed. And yes, it
looks like that can interleave just right.

> Then it does put_ctx(ctx), but ctx->task can be already freed by
> copy_process()->free_task() in this case.


	Task1				Task2

	fork()
	  perf_event_init_task()
	  /* ... */
	  goto bad_fork_$foo;
	  /* ... */
	  perf_event_free_task()
	    mutex_lock(ctx->lock)
	    perf_free_event(B)

					perf_event_release_kernel(A)
					  mutex_lock(A->child_mutex)
					  list_for_each_entry(child, ...) {
					    /* child == B */
					    ctx = B->ctx;
					    get_ctx(ctx);
					    mutex_unlock(A->child_mutex);

	      mutex_lock(A->child_mutex)
	      list_del_init(B->child_list)
	      mutex_unlock(A->child_mutex)

	      /* ... */

	    mutex_unlock(ctx->lock);
	    put_ctx() /* >0 */
	  free_task();
					    mutex_lock(ctx->lock);
					    mutex_lock(A->child_mutex);
					    /* ... */
					    mutex_unlock(A->child_mutex);
					    mutex_unlock(ctx->lock)
					    put_ctx() /* 0 */
					      ctx->task && !TOMBSTONE
					        put_task_struct() /* UAF */


Something like that, right?


Let me see if it makes sense to retain perf_event_free_task() at all;
maybe we should always do perf_event_exit_task().

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-14 15:02                 ` Peter Zijlstra
@ 2017-03-14 15:07                   ` Peter Zijlstra
  2017-03-14 15:37                     ` Oleg Nesterov
  2017-03-14 15:19                   ` Oleg Nesterov
  1 sibling, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-14 15:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On Tue, Mar 14, 2017 at 04:02:41PM +0100, Peter Zijlstra wrote:
> On Tue, Mar 14, 2017 at 03:30:11PM +0100, Oleg Nesterov wrote:
> 
> > But. perf_event_init_task() adds child_event to parent_event->child_list.
> > 
> > If perf_event_release_kernel(parent_event) is called before copy_process()
> > does perf_event_free_task() which (in particular) removes it from child_list,
> > perf_event_release_kernel() can find this child_event and do get_ctx(ctx)
> > (under the list_for_each_entry(child, &event->child_list, child_list) loop).
> 
> Right; the child_list is the only thing that is exposed. And yes, it
> looks like that can interleave just right.
> 
> > Then it does put_ctx(ctx), but ctx->task can be already freed by
> > copy_process()->free_task() in this case.
> 
> 
> 	Task1				Task2
> 
> 	fork()
> 	  perf_event_init_task()
> 	  /* ... */
> 	  goto bad_fork_$foo;
> 	  /* ... */
> 	  perf_event_free_task()
> 	    mutex_lock(ctx->lock)
> 	    perf_free_event(B)
> 
> 					perf_event_release_kernel(A)
> 					  mutex_lock(A->child_mutex)
> 					  list_for_each_entry(child, ...) {
> 					    /* child == B */
> 					    ctx = B->ctx;
> 					    get_ctx(ctx);
> 					    mutex_unlock(A->child_mutex);
> 
> 	      mutex_lock(A->child_mutex)
> 	      list_del_init(B->child_list)
> 	      mutex_unlock(A->child_mutex)
> 
> 	      /* ... */
> 
> 	    mutex_unlock(ctx->lock);
> 	    put_ctx() /* >0 */
> 	  free_task();
> 					    mutex_lock(ctx->lock);
> 					    mutex_lock(A->child_mutex);
> 					    /* ... */
> 					    mutex_unlock(A->child_mutex);
> 					    mutex_unlock(ctx->lock)
> 					    put_ctx() /* 0 */
> 					      ctx->task && !TOMBSTONE
> 					        put_task_struct() /* UAF */
> 
> 
> Something like that, right?
> 
> 
> Let me see if it makes sense to retain perf_event_free_task() at all;
> maybe we should always do perf_event_exit_task().

Do we want a WARN_ON_ONCE(atomic_read(&tsk->usage)); in free_task()?
Because in the above scenario we're freeing it with references on.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-14 15:02                 ` Peter Zijlstra
  2017-03-14 15:07                   ` Peter Zijlstra
@ 2017-03-14 15:19                   ` Oleg Nesterov
  2017-03-14 15:26                     ` Peter Zijlstra
  1 sibling, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2017-03-14 15:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On 03/14, Peter Zijlstra wrote:
>
> 	    mutex_unlock(ctx->lock);
> 	    put_ctx() /* >0 */
> 	  free_task();
> 					    mutex_lock(ctx->lock);
> 					    mutex_lock(A->child_mutex);
> 					    /* ... */
> 					    mutex_unlock(A->child_mutex);
> 					    mutex_unlock(ctx->lock)
> 					    put_ctx() /* 0 */
> 					      ctx->task && !TOMBSTONE
> 					        put_task_struct() /* UAF */
>
>
> Something like that, right?

Yes, exactly.

> Let me see if it makes sense to retain perf_event_free_task() at all;
> maybe we should always do perf_event_exit_task().

Yes, perhaps... but this needs changes too. Say, WARN_ON_ONCE(child != current)
in perf_event_exit_task_context(). And even perf_event_task(new => F) does not
look right in this case. In fact it would be simply buggy to do this, this task
was not fully constructed yet, so even perf_event_pid(task) is not safe.

Oleg.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-14 15:19                   ` Oleg Nesterov
@ 2017-03-14 15:26                     ` Peter Zijlstra
  2017-03-14 15:59                       ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-14 15:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On Tue, Mar 14, 2017 at 04:19:10PM +0100, Oleg Nesterov wrote:
> On 03/14, Peter Zijlstra wrote:
> >
> > 	    mutex_unlock(ctx->lock);
> > 	    put_ctx() /* >0 */
> > 	  free_task();
> > 					    mutex_lock(ctx->lock);
> > 					    mutex_lock(A->child_mutex);
> > 					    /* ... */
> > 					    mutex_unlock(A->child_mutex);
> > 					    mutex_unlock(ctx->lock)
> > 					    put_ctx() /* 0 */
> > 					      ctx->task && !TOMBSTONE
> > 					        put_task_struct() /* UAF */
> >
> >
> > Something like that, right?
> 
> Yes, exactly.
> 
> > Let me see if it makes sense to retain perf_event_free_task() at all;
> > maybe we should always do perf_event_exit_task().
> 
> Yes, perhaps... but this needs changes too. Say, WARN_ON_ONCE(child != current)
> in perf_event_exit_task_context(). And even perf_event_task(new => F) does not
> look right in this case. In fact it would be simply buggy to do this, this task
> was not fully constructed yet, so even perf_event_pid(task) is not safe.

Yeah; there's a fair amount of stuff like that. I'm afraid crafting
exceptions for all that will just end up with more of a mess than we
safe by merging the two :/

A well.. I'll go do the 'trivial' patch then.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-14 15:07                   ` Peter Zijlstra
@ 2017-03-14 15:37                     ` Oleg Nesterov
  2017-03-14 15:46                       ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2017-03-14 15:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On 03/14, Peter Zijlstra wrote:
>
> Do we want a WARN_ON_ONCE(atomic_read(&tsk->usage)); in free_task()?
> Because in the above scenario we're freeing it with references on.

Not sure, in this case copy_process() should decrement tsk->usage
before free_task(), note the atomic_set(&tsk->usage, 2) in
dup_task_struct().

Perhaps we should just add WARN_ON(tsk->usage != 2) into copy_process()
right before free_task() ?

On the other hand, WARN_ON(atomic_read(&tsk->usage)) looks pointless,
the only caller is put_task_struct().

Oleg.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-14 15:37                     ` Oleg Nesterov
@ 2017-03-14 15:46                       ` Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-14 15:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On Tue, Mar 14, 2017 at 04:37:05PM +0100, Oleg Nesterov wrote:
> On 03/14, Peter Zijlstra wrote:
> >
> > Do we want a WARN_ON_ONCE(atomic_read(&tsk->usage)); in free_task()?
> > Because in the above scenario we're freeing it with references on.
> 
> Not sure, in this case copy_process() should decrement tsk->usage
> before free_task(), note the atomic_set(&tsk->usage, 2) in
> dup_task_struct().
> 
> Perhaps we should just add WARN_ON(tsk->usage != 2) into copy_process()
> right before free_task() ?

Sure; that works. I'll try that once I'm back home again, to see if
there's unexpected fail because other things increment it.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-14 15:26                     ` Peter Zijlstra
@ 2017-03-14 15:59                       ` Peter Zijlstra
  2017-03-15 16:43                         ` Oleg Nesterov
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-14 15:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On Tue, Mar 14, 2017 at 04:26:25PM +0100, Peter Zijlstra wrote:
> A well.. I'll go do the 'trivial' patch then.

A little like so; completely untested.

---
 kernel/events/core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 110b38a58493..6576449b6029 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10346,6 +10346,17 @@ void perf_event_free_task(struct task_struct *task)
 			continue;
 
 		mutex_lock(&ctx->mutex);
+		raw_spin_lock_irq(&ctx->lock);
+		/*
+		 * Destroy the task <-> ctx relation and mark the context dead.
+		 *
+		 * This is important because even though the task hasn't been
+		 * exposed yet the context has been (through child_list).
+		 */
+		RCU_INIT_POINTER(task->perf_event_ctxp[ctxn], NULL);
+		WRITE_ONCE(ctx->task, TASK_TOMBSTONE);
+		put_task_struct(task); /* cannot be last */
+		raw_spin_unlock_irq(&ctx->lock);
 again:
 		list_for_each_entry_safe(event, tmp, &ctx->pinned_groups,
 				group_entry)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-14 15:59                       ` Peter Zijlstra
@ 2017-03-15 16:43                         ` Oleg Nesterov
  2017-03-16 12:05                           ` Peter Zijlstra
  2017-03-16 13:57                           ` Peter Zijlstra
  0 siblings, 2 replies; 46+ messages in thread
From: Oleg Nesterov @ 2017-03-15 16:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On 03/14, Peter Zijlstra wrote:
>
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -10346,6 +10346,17 @@ void perf_event_free_task(struct task_struct *task)
>  			continue;
>
>  		mutex_lock(&ctx->mutex);
> +		raw_spin_lock_irq(&ctx->lock);
> +		/*
> +		 * Destroy the task <-> ctx relation and mark the context dead.
> +		 *
> +		 * This is important because even though the task hasn't been
> +		 * exposed yet the context has been (through child_list).
> +		 */
> +		RCU_INIT_POINTER(task->perf_event_ctxp[ctxn], NULL);
> +		WRITE_ONCE(ctx->task, TASK_TOMBSTONE);
> +		put_task_struct(task); /* cannot be last */
> +		raw_spin_unlock_irq(&ctx->lock);

Agreed, this is what I had in mind. Although you know, I spent 3
hours looking at your patch and I still can't convince myself I am
really sure it closes all races ;)

OK, I believe this is correct. And iiuc both RCU_INIT_POINTER(NULL)
and put_task_struct() are not strictly necessary? At least until we
add WARN_ON(tsk->usage != 2) before free_task() in copy process().


---------------------------------------------------------------------
This is off-topic, but to me list_for_each_entry(event->child_list)
in perf_event_release_kernel() looks very confusing and misleading.
And list_first_entry_or_null(), we do not really need NULL if list
is empty, tmp == child should be F even if we use list_first_entry().
And given that we already have list_is_last(), it would be nice to
add list_is_first() and cleanup perf_event_release_kernel() a bit:

	--- x/kernel/events/core.c
	+++ x/kernel/events/core.c
	@@ -4152,7 +4152,7 @@ static void put_event(struct perf_event 
	 int perf_event_release_kernel(struct perf_event *event)
	 {
		struct perf_event_context *ctx = event->ctx;
	-	struct perf_event *child, *tmp;
	+	struct perf_event *child;
	 
		/*
		 * If we got here through err_file: fput(event_file); we will not have
	@@ -4190,8 +4190,9 @@ int perf_event_release_kernel(struct per
	 
	 again:
		mutex_lock(&event->child_mutex);
	-	list_for_each_entry(child, &event->child_list, child_list) {
	-
	+	if (!list_empty(&event->child_list)) {
	+		child = list_first_entry(&event->child_list,
	+					 struct perf_event, child_list);
			/*
			 * Cannot change, child events are not migrated, see the
			 * comment with perf_event_ctx_lock_nested().
	@@ -4221,9 +4222,7 @@ again:
			 * state, if child is still the first entry, it didn't get freed
			 * and we can continue doing so.
			 */
	-		tmp = list_first_entry_or_null(&event->child_list,
	-					       struct perf_event, child_list);
	-		if (tmp == child) {
	+		if (list_is_first(child, &event->child_list)) {
				perf_remove_from_context(child, DETACH_GROUP);
				list_del(&child->child_list);
				free_event(child);

But we can't, because

	static inline int list_is_first(const struct list_head *list,
					const struct list_head *head)
	{
		return list->prev == head;
	}

won't work, "child" can be freed so we can't dereference it, and

	static inline int list_is_first(const struct list_head *list,
					const struct list_head *head)
	{
		return head->next == list;
	}

won't be symmetrical with list_is_last() we already have.

Oleg.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-15 16:43                         ` Oleg Nesterov
@ 2017-03-16 12:05                           ` Peter Zijlstra
  2017-03-16 13:57                           ` Peter Zijlstra
  1 sibling, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-16 12:05 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On Wed, Mar 15, 2017 at 05:43:02PM +0100, Oleg Nesterov wrote:
> On 03/14, Peter Zijlstra wrote:
> >
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -10346,6 +10346,17 @@ void perf_event_free_task(struct task_struct *task)
> >  			continue;
> >
> >  		mutex_lock(&ctx->mutex);
> > +		raw_spin_lock_irq(&ctx->lock);
> > +		/*
> > +		 * Destroy the task <-> ctx relation and mark the context dead.
> > +		 *
> > +		 * This is important because even though the task hasn't been
> > +		 * exposed yet the context has been (through child_list).
> > +		 */
> > +		RCU_INIT_POINTER(task->perf_event_ctxp[ctxn], NULL);
> > +		WRITE_ONCE(ctx->task, TASK_TOMBSTONE);
> > +		put_task_struct(task); /* cannot be last */
> > +		raw_spin_unlock_irq(&ctx->lock);
> 
> Agreed, this is what I had in mind. Although you know, I spent 3
> hours looking at your patch and I still can't convince myself I am
> really sure it closes all races ;)

Ha; yes I know that feeling. I used to have a few sheets of paper filled
with diagrams. Sadly I could not find them again. Must've been over
eager cleaning my desk at some point.

> 
> OK, I believe this is correct. And iiuc both RCU_INIT_POINTER(NULL)
> and put_task_struct() are not strictly necessary? At least until we
> add WARN_ON(tsk->usage != 2) before free_task() in copy process().

Right; I just kept the code similar to the other location. I even
considered making a helper function to not duplicate, but in the end
decided against it.

> ---------------------------------------------------------------------
> This is off-topic, but to me list_for_each_entry(event->child_list)
> in perf_event_release_kernel() looks very confusing and misleading.
> And list_first_entry_or_null(), we do not really need NULL if list
> is empty, tmp == child should be F even if we use list_first_entry().
> And given that we already have list_is_last(), it would be nice to
> add list_is_first() and cleanup perf_event_release_kernel() a bit:
> 

Agreed; its a bit of a weird one.

Let me go write proper patches for the things we have so far though.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 0/4] perf patches
@ 2017-03-16 12:47 Peter Zijlstra
  2017-03-16 12:47 ` [PATCH 1/4] perf: Fix use-after-free in perf_release() Peter Zijlstra
                   ` (4 more replies)
  0 siblings, 5 replies; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-16 12:47 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, acme, mathieu.desnoyers, alexander.shishkin,
	dvyukov, fweisbec, oleg, peterz

Hi,

These 4 patches are the result of Dmitry reporting a use-after-free.

The first two are /urgent material, the last two can go in /core.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 1/4] perf: Fix use-after-free in perf_release()
  2017-03-16 12:47 [PATCH 0/4] perf patches Peter Zijlstra
@ 2017-03-16 12:47 ` Peter Zijlstra
  2017-03-16 15:19   ` [tip:perf/urgent] perf/core: " tip-bot for Peter Zijlstra
  2017-03-16 12:47 ` [PATCH 2/4] perf: Fix event inheritance on fork() Peter Zijlstra
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-16 12:47 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, acme, mathieu.desnoyers, alexander.shishkin,
	dvyukov, fweisbec, oleg, peterz, stable

[-- Attachment #1: peterz-perf-UAF-perf_release.patch --]
[-- Type: text/plain, Size: 2900 bytes --]

Dmitry reported syzcaller tripped a use-after-free in perf_release().

After much puzzlement Oleg spotted the below scenario:


  Task1                           Task2

  fork()
    perf_event_init_task()
    /* ... */
    goto bad_fork_$foo;
    /* ... */
    perf_event_free_task()
      mutex_lock(ctx->lock)
      perf_free_event(B)

                                  perf_event_release_kernel(A)
                                    mutex_lock(A->child_mutex)
                                    list_for_each_entry(child, ...) {
                                      /* child == B */
                                      ctx = B->ctx;
                                      get_ctx(ctx);
                                      mutex_unlock(A->child_mutex);

        mutex_lock(A->child_mutex)
        list_del_init(B->child_list)
        mutex_unlock(A->child_mutex)

        /* ... */

      mutex_unlock(ctx->lock);
      put_ctx() /* >0 */
    free_task();
                                      mutex_lock(ctx->lock);
                                      mutex_lock(A->child_mutex);
                                      /* ... */
                                      mutex_unlock(A->child_mutex);
                                      mutex_unlock(ctx->lock)
                                      put_ctx() /* 0 */
                                        ctx->task && !TOMBSTONE
                                          put_task_struct() /* UAF */


This patch closes the hole by making perf_event_free_task() destroy the
task <-> ctx relation such that perf_event_release_kernel() will no longer
observe the now dead task.


Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: stable@vger.kernel.org
Fixes: c6e5b73242d2 ("perf: Synchronously clean up child events")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Spotted-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170314155949.GE32474@worktop
---
 kernel/events/core.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10556,6 +10556,17 @@ void perf_event_free_task(struct task_st
 			continue;
 
 		mutex_lock(&ctx->mutex);
+		raw_spin_lock_irq(&ctx->lock);
+		/*
+		 * Destroy the task <-> ctx relation and mark the context dead.
+		 *
+		 * This is important because even though the task hasn't been
+		 * exposed yet the context has been (through child_list).
+		 */
+		RCU_INIT_POINTER(task->perf_event_ctxp[ctxn], NULL);
+		WRITE_ONCE(ctx->task, TASK_TOMBSTONE);
+		put_task_struct(task); /* cannot be last */
+		raw_spin_unlock_irq(&ctx->lock);
 again:
 		list_for_each_entry_safe(event, tmp, &ctx->pinned_groups,
 				group_entry)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 2/4] perf: Fix event inheritance on fork()
  2017-03-16 12:47 [PATCH 0/4] perf patches Peter Zijlstra
  2017-03-16 12:47 ` [PATCH 1/4] perf: Fix use-after-free in perf_release() Peter Zijlstra
@ 2017-03-16 12:47 ` Peter Zijlstra
  2017-03-16 15:19   ` [tip:perf/urgent] perf/core: " tip-bot for Peter Zijlstra
  2017-03-16 12:47 ` [PATCH 3/4] perf: Simplify perf_event_free_task() Peter Zijlstra
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-16 12:47 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, acme, mathieu.desnoyers, alexander.shishkin,
	dvyukov, fweisbec, oleg, peterz, stable

[-- Attachment #1: peterz-perf-fix-perf_event_init_context.patch --]
[-- Type: text/plain, Size: 1556 bytes --]

While hunting for clues to a use-after-free, Oleg spotted that
perf_event_init_context() can loose an error value with the result
that fork() can succeed even though we did not fully inherit the perf
event context.

Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: stable@vger.kernel.org
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Spotted-by: Oleg Nesterov <oleg@redhat.com>
Fixes: 889ff0150661 ("perf/core: Split context's event group list into pinned and non-pinned lists")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10820,7 +10819,7 @@ static int perf_event_init_context(struc
 		ret = inherit_task_group(event, parent, parent_ctx,
 					 child, ctxn, &inherited_all);
 		if (ret)
-			break;
+			goto out_unlock;
 	}
 
 	/*
@@ -10836,7 +10835,7 @@ static int perf_event_init_context(struc
 		ret = inherit_task_group(event, parent, parent_ctx,
 					 child, ctxn, &inherited_all);
 		if (ret)
-			break;
+			goto out_unlock;
 	}
 
 	raw_spin_lock_irqsave(&parent_ctx->lock, flags);
@@ -10864,6 +10863,7 @@ static int perf_event_init_context(struc
 	}
 
 	raw_spin_unlock_irqrestore(&parent_ctx->lock, flags);
+out_unlock:
 	mutex_unlock(&parent_ctx->mutex);
 
 	perf_unpin_context(parent_ctx);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 3/4] perf: Simplify perf_event_free_task()
  2017-03-16 12:47 [PATCH 0/4] perf patches Peter Zijlstra
  2017-03-16 12:47 ` [PATCH 1/4] perf: Fix use-after-free in perf_release() Peter Zijlstra
  2017-03-16 12:47 ` [PATCH 2/4] perf: Fix event inheritance on fork() Peter Zijlstra
@ 2017-03-16 12:47 ` Peter Zijlstra
  2017-03-16 15:20   ` [tip:perf/urgent] perf/core: " tip-bot for Peter Zijlstra
  2017-03-16 12:47 ` [PATCH 4/4] perf: Better explain the inherit magic Peter Zijlstra
  2017-03-16 13:20 ` [PATCH 0/4] perf patches Ingo Molnar
  4 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-16 12:47 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, acme, mathieu.desnoyers, alexander.shishkin,
	dvyukov, fweisbec, oleg, peterz

[-- Attachment #1: peterz-perf-simplify-perf_event_free_task.patch --]
[-- Type: text/plain, Size: 1257 bytes --]

We have ctx->event_list that contains all events; no need to
repeatedly iterate the group lists to find them all.

Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |   12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10567,21 +10567,11 @@ void perf_event_free_task(struct task_st
 		WRITE_ONCE(ctx->task, TASK_TOMBSTONE);
 		put_task_struct(task); /* cannot be last */
 		raw_spin_unlock_irq(&ctx->lock);
-again:
-		list_for_each_entry_safe(event, tmp, &ctx->pinned_groups,
-				group_entry)
-			perf_free_event(event, ctx);
 
-		list_for_each_entry_safe(event, tmp, &ctx->flexible_groups,
-				group_entry)
+		list_for_each_entry_safe(event, tmp, &ctx->event_list, event_entry)
 			perf_free_event(event, ctx);
 
-		if (!list_empty(&ctx->pinned_groups) ||
-				!list_empty(&ctx->flexible_groups))
-			goto again;
-
 		mutex_unlock(&ctx->mutex);
-
 		put_ctx(ctx);
 	}
 }

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 4/4] perf: Better explain the inherit magic
  2017-03-16 12:47 [PATCH 0/4] perf patches Peter Zijlstra
                   ` (2 preceding siblings ...)
  2017-03-16 12:47 ` [PATCH 3/4] perf: Simplify perf_event_free_task() Peter Zijlstra
@ 2017-03-16 12:47 ` Peter Zijlstra
  2017-03-16 15:21   ` [tip:perf/urgent] perf/core: " tip-bot for Peter Zijlstra
  2017-03-16 13:20 ` [PATCH 0/4] perf patches Ingo Molnar
  4 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-16 12:47 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, acme, mathieu.desnoyers, alexander.shishkin,
	dvyukov, fweisbec, oleg, peterz

[-- Attachment #1: peterz-perf-comment-inherit.patch --]
[-- Type: text/plain, Size: 3784 bytes --]

While going through the event inheritance code Oleg got confused.

Add some comments to better explain the silent dissapearance of
orphaned events.

So what happens is that at perf_event_release_kernel() time; when an
event looses its connection to userspace (and ceases to exist from the
user's perspective) we can still have an arbitrary amount of inherited
copies of the event. We want to synchronously find and remove all
these child events.

Since that requires a bit of lock juggling, there is the possibility
that concurrent clone()s will create new child events. Therefore we
first mark the parent event as DEAD, which marks all the extant child
events as orphaned.

We then avoid copying orphaned events; in order to avoid getting more
of them.

Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |   36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4261,7 +4261,7 @@ int perf_event_release_kernel(struct per
 
 	raw_spin_lock_irq(&ctx->lock);
 	/*
-	 * Mark this even as STATE_DEAD, there is no external reference to it
+	 * Mark this event as STATE_DEAD, there is no external reference to it
 	 * anymore.
 	 *
 	 * Anybody acquiring event->child_mutex after the below loop _must_
@@ -10609,7 +10609,12 @@ const struct perf_event_attr *perf_event
 }
 
 /*
- * inherit a event from parent task to child task:
+ * Inherit a event from parent task to child task.
+ *
+ * Returns:
+ *  - valid pointer on success
+ *  - NULL for orphaned events
+ *  - IS_ERR() on error
  */
 static struct perf_event *
 inherit_event(struct perf_event *parent_event,
@@ -10703,6 +10708,16 @@ inherit_event(struct perf_event *parent_
 	return child_event;
 }
 
+/*
+ * Inherits an event group.
+ *
+ * This will quietly suppress orphaned events; !inherit_event() is not an error.
+ * This matches with perf_event_release_kernel() removing all child events.
+ *
+ * Returns:
+ *  - 0 on success
+ *  - <0 on error
+ */
 static int inherit_group(struct perf_event *parent_event,
 	      struct task_struct *parent,
 	      struct perf_event_context *parent_ctx,
@@ -10717,6 +10732,11 @@ static int inherit_group(struct perf_eve
 				 child, NULL, child_ctx);
 	if (IS_ERR(leader))
 		return PTR_ERR(leader);
+	/*
+	 * @leader can be NULL here because of is_orphaned_event(). In this
+	 * case inherit_event() will create individual events, similar to what
+	 * perf_group_detach() would do anyway.
+	 */
 	list_for_each_entry(sub, &parent_event->sibling_list, group_entry) {
 		child_ctr = inherit_event(sub, parent, parent_ctx,
 					    child, leader, child_ctx);
@@ -10726,6 +10746,17 @@ static int inherit_group(struct perf_eve
 	return 0;
 }
 
+/*
+ * Creates the child task context and tries to inherit the event-group.
+ *
+ * Clears @inherited_all on !attr.inherited or error. Note that we'll leave
+ * inherited_all set when we 'fail' to inherit an orphaned event; this is
+ * consistent with perf_event_release_kernel() removing all child events.
+ *
+ * Returns:
+ *  - 0 on success
+ *  - <0 on error
+ */
 static int
 inherit_task_group(struct perf_event *event, struct task_struct *parent,
 		   struct perf_event_context *parent_ctx,
@@ -10748,7 +10779,6 @@ inherit_task_group(struct perf_event *ev
 		 * First allocate and initialize a context for the
 		 * child.
 		 */
-
 		child_ctx = alloc_perf_context(parent_ctx->pmu, child);
 		if (!child_ctx)
 			return -ENOMEM;

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/4] perf patches
  2017-03-16 12:47 [PATCH 0/4] perf patches Peter Zijlstra
                   ` (3 preceding siblings ...)
  2017-03-16 12:47 ` [PATCH 4/4] perf: Better explain the inherit magic Peter Zijlstra
@ 2017-03-16 13:20 ` Ingo Molnar
  4 siblings, 0 replies; 46+ messages in thread
From: Ingo Molnar @ 2017-03-16 13:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, acme, mathieu.desnoyers, alexander.shishkin,
	dvyukov, fweisbec, oleg, Linus Torvalds


* Peter Zijlstra <peterz@infradead.org> wrote:

> Hi,
> 
> These 4 patches are the result of Dmitry reporting a use-after-free.
> 
> The first two are /urgent material, the last two can go in /core.

Applied to tip:perf/urgent, thanks Peter!

Note that I applied all four to perf/urgent, to keep the source code variants 
count lower.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-15 16:43                         ` Oleg Nesterov
  2017-03-16 12:05                           ` Peter Zijlstra
@ 2017-03-16 13:57                           ` Peter Zijlstra
  2017-03-16 16:41                             ` Oleg Nesterov
  1 sibling, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2017-03-16 13:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On Wed, Mar 15, 2017 at 05:43:02PM +0100, Oleg Nesterov wrote:
> 	static inline int list_is_first(const struct list_head *list,
> 					const struct list_head *head)
> 	{
> 		return head->next == list;
> 	}
> 
> won't be symmetrical with list_is_last() we already have.

This is the one that makes sense to me though; that is, the current
list_is_last() doesn't make sense to me.

I would expect:

static inline int list_is_last(const struct list_head *list,
				const struct list_head *head)
{
	return head->prev == list
}

because @head is the list argument (yes, I know, horrible naming!).

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [tip:perf/urgent] perf/core: Fix use-after-free in perf_release()
  2017-03-16 12:47 ` [PATCH 1/4] perf: Fix use-after-free in perf_release() Peter Zijlstra
@ 2017-03-16 15:19   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot for Peter Zijlstra @ 2017-03-16 15:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, dvyukov, vincent.weaver, alexander.shishkin, acme, acme,
	linux-kernel, torvalds, eranian, peterz, mathieu.desnoyers, oleg,
	hpa, mingo, tglx

Commit-ID:  e552a8389aa409e257b7dcba74f67f128f979ccc
Gitweb:     http://git.kernel.org/tip/e552a8389aa409e257b7dcba74f67f128f979ccc
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 16 Mar 2017 13:47:48 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 16 Mar 2017 14:16:52 +0100

perf/core: Fix use-after-free in perf_release()

Dmitry reported syzcaller tripped a use-after-free in perf_release().

After much puzzlement Oleg spotted the below scenario:

  Task1                           Task2

  fork()
    perf_event_init_task()
    /* ... */
    goto bad_fork_$foo;
    /* ... */
    perf_event_free_task()
      mutex_lock(ctx->lock)
      perf_free_event(B)

                                  perf_event_release_kernel(A)
                                    mutex_lock(A->child_mutex)
                                    list_for_each_entry(child, ...) {
                                      /* child == B */
                                      ctx = B->ctx;
                                      get_ctx(ctx);
                                      mutex_unlock(A->child_mutex);

        mutex_lock(A->child_mutex)
        list_del_init(B->child_list)
        mutex_unlock(A->child_mutex)

        /* ... */

      mutex_unlock(ctx->lock);
      put_ctx() /* >0 */
    free_task();
                                      mutex_lock(ctx->lock);
                                      mutex_lock(A->child_mutex);
                                      /* ... */
                                      mutex_unlock(A->child_mutex);
                                      mutex_unlock(ctx->lock)
                                      put_ctx() /* 0 */
                                        ctx->task && !TOMBSTONE
                                          put_task_struct() /* UAF */

This patch closes the hole by making perf_event_free_task() destroy the
task <-> ctx relation such that perf_event_release_kernel() will no longer
observe the now dead task.

Spotted-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fweisbec@gmail.com
Cc: oleg@redhat.com
Cc: stable@vger.kernel.org
Fixes: c6e5b73242d2 ("perf: Synchronously clean up child events")
Link: http://lkml.kernel.org/r/20170314155949.GE32474@worktop
Link: http://lkml.kernel.org/r/20170316125823.140295131@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1031bdf..4742909 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10415,6 +10415,17 @@ void perf_event_free_task(struct task_struct *task)
 			continue;
 
 		mutex_lock(&ctx->mutex);
+		raw_spin_lock_irq(&ctx->lock);
+		/*
+		 * Destroy the task <-> ctx relation and mark the context dead.
+		 *
+		 * This is important because even though the task hasn't been
+		 * exposed yet the context has been (through child_list).
+		 */
+		RCU_INIT_POINTER(task->perf_event_ctxp[ctxn], NULL);
+		WRITE_ONCE(ctx->task, TASK_TOMBSTONE);
+		put_task_struct(task); /* cannot be last */
+		raw_spin_unlock_irq(&ctx->lock);
 again:
 		list_for_each_entry_safe(event, tmp, &ctx->pinned_groups,
 				group_entry)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [tip:perf/urgent] perf/core: Fix event inheritance on fork()
  2017-03-16 12:47 ` [PATCH 2/4] perf: Fix event inheritance on fork() Peter Zijlstra
@ 2017-03-16 15:19   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot for Peter Zijlstra @ 2017-03-16 15:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, alexander.shishkin, hpa, mingo, eranian, jolsa, dvyukov,
	mathieu.desnoyers, peterz, acme, linux-kernel, oleg, fweisbec,
	vincent.weaver, torvalds, acme

Commit-ID:  e7cc4865f0f31698ef2f7aac01a50e78968985b7
Gitweb:     http://git.kernel.org/tip/e7cc4865f0f31698ef2f7aac01a50e78968985b7
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 16 Mar 2017 13:47:49 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 16 Mar 2017 14:16:52 +0100

perf/core: Fix event inheritance on fork()

While hunting for clues to a use-after-free, Oleg spotted that
perf_event_init_context() can loose an error value with the result
that fork() can succeed even though we did not fully inherit the perf
event context.

Spotted-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: oleg@redhat.com
Cc: stable@vger.kernel.org
Fixes: 889ff0150661 ("perf/core: Split context's event group list into pinned and non-pinned lists")
Link: http://lkml.kernel.org/r/20170316125823.190342547@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4742909..fc7c9a8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10679,7 +10679,7 @@ static int perf_event_init_context(struct task_struct *child, int ctxn)
 		ret = inherit_task_group(event, parent, parent_ctx,
 					 child, ctxn, &inherited_all);
 		if (ret)
-			break;
+			goto out_unlock;
 	}
 
 	/*
@@ -10695,7 +10695,7 @@ static int perf_event_init_context(struct task_struct *child, int ctxn)
 		ret = inherit_task_group(event, parent, parent_ctx,
 					 child, ctxn, &inherited_all);
 		if (ret)
-			break;
+			goto out_unlock;
 	}
 
 	raw_spin_lock_irqsave(&parent_ctx->lock, flags);
@@ -10723,6 +10723,7 @@ static int perf_event_init_context(struct task_struct *child, int ctxn)
 	}
 
 	raw_spin_unlock_irqrestore(&parent_ctx->lock, flags);
+out_unlock:
 	mutex_unlock(&parent_ctx->mutex);
 
 	perf_unpin_context(parent_ctx);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [tip:perf/urgent] perf/core: Simplify perf_event_free_task()
  2017-03-16 12:47 ` [PATCH 3/4] perf: Simplify perf_event_free_task() Peter Zijlstra
@ 2017-03-16 15:20   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot for Peter Zijlstra @ 2017-03-16 15:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, mathieu.desnoyers, oleg, acme, torvalds, peterz,
	vincent.weaver, acme, mingo, jolsa, hpa, linux-kernel, dvyukov,
	alexander.shishkin, eranian

Commit-ID:  15121c789e001168decac6483d192bdb7ea29e74
Gitweb:     http://git.kernel.org/tip/15121c789e001168decac6483d192bdb7ea29e74
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 16 Mar 2017 13:47:50 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 16 Mar 2017 14:16:53 +0100

perf/core: Simplify perf_event_free_task()

We have ctx->event_list that contains all events; no need to
repeatedly iterate the group lists to find them all.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fweisbec@gmail.com
Link: http://lkml.kernel.org/r/20170316125823.239678244@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index fc7c9a8..5f21e5e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10426,21 +10426,11 @@ void perf_event_free_task(struct task_struct *task)
 		WRITE_ONCE(ctx->task, TASK_TOMBSTONE);
 		put_task_struct(task); /* cannot be last */
 		raw_spin_unlock_irq(&ctx->lock);
-again:
-		list_for_each_entry_safe(event, tmp, &ctx->pinned_groups,
-				group_entry)
-			perf_free_event(event, ctx);
 
-		list_for_each_entry_safe(event, tmp, &ctx->flexible_groups,
-				group_entry)
+		list_for_each_entry_safe(event, tmp, &ctx->event_list, event_entry)
 			perf_free_event(event, ctx);
 
-		if (!list_empty(&ctx->pinned_groups) ||
-				!list_empty(&ctx->flexible_groups))
-			goto again;
-
 		mutex_unlock(&ctx->mutex);
-
 		put_ctx(ctx);
 	}
 }

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [tip:perf/urgent] perf/core: Better explain the inherit magic
  2017-03-16 12:47 ` [PATCH 4/4] perf: Better explain the inherit magic Peter Zijlstra
@ 2017-03-16 15:21   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot for Peter Zijlstra @ 2017-03-16 15:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, oleg, peterz, torvalds, alexander.shishkin, acme,
	mathieu.desnoyers, dvyukov, eranian, acme, hpa, vincent.weaver,
	linux-kernel, jolsa, mingo

Commit-ID:  d8a8cfc76919b6c830305266b23ba671623f37ff
Gitweb:     http://git.kernel.org/tip/d8a8cfc76919b6c830305266b23ba671623f37ff
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 16 Mar 2017 13:47:51 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 16 Mar 2017 14:16:53 +0100

perf/core: Better explain the inherit magic

While going through the event inheritance code Oleg got confused.

Add some comments to better explain the silent dissapearance of
orphaned events.

So what happens is that at perf_event_release_kernel() time; when an
event looses its connection to userspace (and ceases to exist from the
user's perspective) we can still have an arbitrary amount of inherited
copies of the event. We want to synchronously find and remove all
these child events.

Since that requires a bit of lock juggling, there is the possibility
that concurrent clone()s will create new child events. Therefore we
first mark the parent event as DEAD, which marks all the extant child
events as orphaned.

We then avoid copying orphaned events; in order to avoid getting more
of them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fweisbec@gmail.com
Link: http://lkml.kernel.org/r/20170316125823.289567442@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5f21e5e..7298e14 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4254,7 +4254,7 @@ int perf_event_release_kernel(struct perf_event *event)
 
 	raw_spin_lock_irq(&ctx->lock);
 	/*
-	 * Mark this even as STATE_DEAD, there is no external reference to it
+	 * Mark this event as STATE_DEAD, there is no external reference to it
 	 * anymore.
 	 *
 	 * Anybody acquiring event->child_mutex after the below loop _must_
@@ -10468,7 +10468,12 @@ const struct perf_event_attr *perf_event_attrs(struct perf_event *event)
 }
 
 /*
- * inherit a event from parent task to child task:
+ * Inherit a event from parent task to child task.
+ *
+ * Returns:
+ *  - valid pointer on success
+ *  - NULL for orphaned events
+ *  - IS_ERR() on error
  */
 static struct perf_event *
 inherit_event(struct perf_event *parent_event,
@@ -10562,6 +10567,16 @@ inherit_event(struct perf_event *parent_event,
 	return child_event;
 }
 
+/*
+ * Inherits an event group.
+ *
+ * This will quietly suppress orphaned events; !inherit_event() is not an error.
+ * This matches with perf_event_release_kernel() removing all child events.
+ *
+ * Returns:
+ *  - 0 on success
+ *  - <0 on error
+ */
 static int inherit_group(struct perf_event *parent_event,
 	      struct task_struct *parent,
 	      struct perf_event_context *parent_ctx,
@@ -10576,6 +10591,11 @@ static int inherit_group(struct perf_event *parent_event,
 				 child, NULL, child_ctx);
 	if (IS_ERR(leader))
 		return PTR_ERR(leader);
+	/*
+	 * @leader can be NULL here because of is_orphaned_event(). In this
+	 * case inherit_event() will create individual events, similar to what
+	 * perf_group_detach() would do anyway.
+	 */
 	list_for_each_entry(sub, &parent_event->sibling_list, group_entry) {
 		child_ctr = inherit_event(sub, parent, parent_ctx,
 					    child, leader, child_ctx);
@@ -10585,6 +10605,17 @@ static int inherit_group(struct perf_event *parent_event,
 	return 0;
 }
 
+/*
+ * Creates the child task context and tries to inherit the event-group.
+ *
+ * Clears @inherited_all on !attr.inherited or error. Note that we'll leave
+ * inherited_all set when we 'fail' to inherit an orphaned event; this is
+ * consistent with perf_event_release_kernel() removing all child events.
+ *
+ * Returns:
+ *  - 0 on success
+ *  - <0 on error
+ */
 static int
 inherit_task_group(struct perf_event *event, struct task_struct *parent,
 		   struct perf_event_context *parent_ctx,
@@ -10607,7 +10638,6 @@ inherit_task_group(struct perf_event *event, struct task_struct *parent,
 		 * First allocate and initialize a context for the
 		 * child.
 		 */
-
 		child_ctx = alloc_perf_context(parent_ctx->pmu, child);
 		if (!child_ctx)
 			return -ENOMEM;

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use-after-free in perf_release
  2017-03-16 13:57                           ` Peter Zijlstra
@ 2017-03-16 16:41                             ` Oleg Nesterov
  0 siblings, 0 replies; 46+ messages in thread
From: Oleg Nesterov @ 2017-03-16 16:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, LKML, Mathieu Desnoyers, syzkaller

On 03/16, Peter Zijlstra wrote:
>
> On Wed, Mar 15, 2017 at 05:43:02PM +0100, Oleg Nesterov wrote:
> > 	static inline int list_is_first(const struct list_head *list,
> > 					const struct list_head *head)
> > 	{
> > 		return head->next == list;
> > 	}
> >
> > won't be symmetrical with list_is_last() we already have.
>
> This is the one that makes sense to me though; that is, the current
> list_is_last() doesn't make sense to me.
>
> I would expect:
>
> static inline int list_is_last(const struct list_head *list,
> 				const struct list_head *head)
> {
> 	return head->prev == list
> }

Yes!

> because @head is the list argument (yes, I know, horrible naming!).

and perhaps it could have more users if we redefine it to dereference
"head" which is likely more "stable", iow less likely can go away.

But after the quick grep I came to conclusion it is not possible to
audit the users it already has.

Oleg.

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2017-03-16 16:43 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-16 12:47 [PATCH 0/4] perf patches Peter Zijlstra
2017-03-16 12:47 ` [PATCH 1/4] perf: Fix use-after-free in perf_release() Peter Zijlstra
2017-03-16 15:19   ` [tip:perf/urgent] perf/core: " tip-bot for Peter Zijlstra
2017-03-16 12:47 ` [PATCH 2/4] perf: Fix event inheritance on fork() Peter Zijlstra
2017-03-16 15:19   ` [tip:perf/urgent] perf/core: " tip-bot for Peter Zijlstra
2017-03-16 12:47 ` [PATCH 3/4] perf: Simplify perf_event_free_task() Peter Zijlstra
2017-03-16 15:20   ` [tip:perf/urgent] perf/core: " tip-bot for Peter Zijlstra
2017-03-16 12:47 ` [PATCH 4/4] perf: Better explain the inherit magic Peter Zijlstra
2017-03-16 15:21   ` [tip:perf/urgent] perf/core: " tip-bot for Peter Zijlstra
2017-03-16 13:20 ` [PATCH 0/4] perf patches Ingo Molnar
  -- strict thread matches above, loose matches on Subject: below --
2017-03-06  9:57 perf: use-after-free in perf_release Dmitry Vyukov
2017-03-06 12:13 ` Peter Zijlstra
2017-03-06 12:17   ` Dmitry Vyukov
2017-03-06 12:23     ` Peter Zijlstra
2017-03-06 12:27       ` Dmitry Vyukov
2017-03-06 12:47         ` Peter Zijlstra
2017-03-06 13:14 ` Peter Zijlstra
2017-03-06 13:34   ` Dmitry Vyukov
2017-03-07  9:08     ` Peter Zijlstra
2017-03-07  9:26       ` Dmitry Vyukov
2017-03-07  9:37         ` Peter Zijlstra
2017-03-07  9:43           ` Dmitry Vyukov
2017-03-07 10:00             ` Peter Zijlstra
2017-03-07 13:16   ` Peter Zijlstra
2017-03-07 13:27     ` Peter Zijlstra
2017-03-07 14:04   ` Oleg Nesterov
2017-03-07 14:17     ` Dmitry Vyukov
2017-03-07 16:51       ` Oleg Nesterov
2017-03-07 17:29         ` Peter Zijlstra
2017-03-14 12:55         ` Peter Zijlstra
2017-03-14 13:24           ` Oleg Nesterov
2017-03-14 13:47             ` Peter Zijlstra
2017-03-14 14:03           ` Oleg Nesterov
2017-03-14 14:07             ` Peter Zijlstra
2017-03-14 14:30               ` Oleg Nesterov
2017-03-14 15:02                 ` Peter Zijlstra
2017-03-14 15:07                   ` Peter Zijlstra
2017-03-14 15:37                     ` Oleg Nesterov
2017-03-14 15:46                       ` Peter Zijlstra
2017-03-14 15:19                   ` Oleg Nesterov
2017-03-14 15:26                     ` Peter Zijlstra
2017-03-14 15:59                       ` Peter Zijlstra
2017-03-15 16:43                         ` Oleg Nesterov
2017-03-16 12:05                           ` Peter Zijlstra
2017-03-16 13:57                           ` Peter Zijlstra
2017-03-16 16:41                             ` Oleg Nesterov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.