* use-after-free in __perf_install_in_context
@ 2015-12-04 20:04 Dmitry Vyukov
  2015-12-04 20:32 ` Alexei Starovoitov
                   ` (3 more replies)
  0 siblings, 4 replies; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-04 20:04 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, LKML
  Cc: syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

Hello,

While running the syzkaller fuzzer I am seeing lots of the following
use-after-free reports. Unfortunately, all my numerous attempts to
reproduce them in a controlled environment have failed. They pop up
periodically during fuzzing (once every several hours in a single VM),
but whenever I try to stress-replay what happened in the VM before the
report, the use-after-free does not reproduce. Can somebody
knowledgeable in the perf subsystem look at the report? Maybe it is
possible to figure out what happened based purely on the report. I can
pretty reliably test any proposed fixes.
All reports look like this one. The first report is usually followed by
others, and eventually the kernel hangs or dies. What happens in the
fuzzer is essentially random syscalls with random arguments, tasks
being born and dying concurrently, and so on. I was able to reproduce
it by restricting syscalls to only perf_event_open, perf ioctls and the
bpf syscall.
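
For reference, the userspace surface involved is roughly the following.
This is only an illustration of the syscalls in play (the event type,
pid/cpu arguments and ioctl sequence are arbitrary), not the actual
fuzzer program:

#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

/* Open a software event on a task; glibc has no wrapper for
 * perf_event_open, so go through syscall() directly. */
static int open_sw_event(pid_t pid, int cpu)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_SOFTWARE;
        attr.config = PERF_COUNT_SW_TASK_CLOCK;

        return syscall(__NR_perf_event_open, &attr, pid, cpu,
                       -1 /* group_fd */, 0 /* flags */);
}

int main(void)
{
        int fd = open_sw_event(0 /* current task */, -1 /* any cpu */);

        if (fd >= 0) {
                ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
                ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
                close(fd);
        }
        return 0;
}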


==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x4e99/0x5100 at addr
ffff880038706e60
Read of size 8 by task syzkaller_execu/6513
=============================================================================
BUG kmalloc-1024 (Not tainted): kasan: bad access detected
-----------------------------------------------------------------------------

Disabling lock debugging due to kernel taint
INFO: Allocated in alloc_perf_context+0x4c/0x100 age=263 cpu=1 pid=6428
[<     inline     >] kzalloc include/linux/slab.h:602
[<      none      >] alloc_perf_context+0x4c/0x100 kernel/events/core.c:3399
[<      none      >] find_get_context+0x187/0x830 kernel/events/core.c:3506
[<      none      >] SYSC_perf_event_open+0xe50/0x21a0 kernel/events/core.c:8375
[<      none      >] SyS_perf_event_open+0x39/0x50 kernel/events/core.c:8236
[<      none      >] tracesys_phase2+0x88/0x8d arch/x86/entry/entry_64.S:269

INFO: Freed in free_ctx+0x4b/0x70 age=174 cpu=2 pid=8105
[<      none      >] kfree+0x26f/0x3e0 mm/slub.c:3632
[<      none      >] free_ctx+0x4b/0x70 kernel/events/core.c:872
[<     inline     >] __rcu_reclaim kernel/rcu/rcu.h:118
[<     inline     >] rcu_do_batch kernel/rcu/tree.c:2693
[<     inline     >] invoke_rcu_callbacks kernel/rcu/tree.c:2961
[<     inline     >] __rcu_process_callbacks kernel/rcu/tree.c:2928
[<      none      >] rcu_process_callbacks+0x631/0x19e0 kernel/rcu/tree.c:2945
[<      none      >] __do_softirq+0x2e5/0xb40 kernel/softirq.c:273
[<     inline     >] invoke_softirq kernel/softirq.c:350
[<      none      >] irq_exit+0x165/0x1e0 kernel/softirq.c:391
[<     inline     >] exiting_irq ./arch/x86/include/asm/apic.h:653
[<      none      >] smp_apic_timer_interrupt+0x88/0xc0
arch/x86/kernel/apic/apic.c:926
[<      none      >] apic_timer_interrupt+0x87/0x90
arch/x86/entry/entry_64.S:678

INFO: Slab 0xffffea0000e1c000 objects=24 used=16 fp=0xffff880038706e40
flags=0x1fffc0000004080
INFO: Object 0xffff880038706e40 @offset=28224 fp=0xffff8800387078c0
CPU: 1 PID: 6513 Comm: syzkaller_execu Tainted: G    B           4.4.0-rc3+ #144
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 0000000000000001 ffff8800641ff680 ffffffff82c74978 0000000041b58ab3
 ffffffff878cbafd ffffffff82c748c6 ffff88006459c380 ffffffff878ec293
 ffff88003e806f80 0000000000000008 ffff880038706e40 ffff8800641ff680

Call Trace:
 [<ffffffff81798654>] __asan_report_load8_noabort+0x54/0x70
mm/kasan/report.c:280
 [<ffffffff814097e9>] __lock_acquire+0x4e99/0x5100 kernel/locking/lockdep.c:3092
 [<ffffffff8140c36d>] lock_acquire+0x19d/0x3f0 kernel/locking/lockdep.c:3585
 [<     inline     >] __raw_spin_lock include/linux/spinlock_api_smp.h:144
 [<ffffffff8691aab1>] _raw_spin_lock+0x31/0x40 kernel/locking/spinlock.c:151
 [<     inline     >] perf_ctx_lock kernel/events/core.c:351
 [<ffffffff81638db9>] __perf_install_in_context+0x109/0xa00
kernel/events/core.c:2074
 [<ffffffff816230da>] remote_function+0x14a/0x200 kernel/events/core.c:74
 [<ffffffff814c9db7>] generic_exec_single+0x2a7/0x490 kernel/smp.c:156
 [<ffffffff814ca980>] smp_call_function_single+0x200/0x310 kernel/smp.c:300
 [<ffffffff816214f3>] task_function_call+0x123/0x160 kernel/events/core.c:101
 [<ffffffff81629511>] perf_install_in_context+0x201/0x340
kernel/events/core.c:2155
 [<ffffffff8164dac5>] SYSC_perf_event_open+0x1465/0x21a0
kernel/events/core.c:8540
 [<ffffffff81656c29>] SyS_perf_event_open+0x39/0x50 kernel/events/core.c:8236
 [<ffffffff8691b9f8>] tracesys_phase2+0x88/0x8d arch/x86/entry/entry_64.S:269
==================================================================

On commit 31ade3b83e1821da5fbb2f11b5b3d4ab2ec39db8.

Thank you


* Re: use-after-free in __perf_install_in_context
  2015-12-04 20:04 use-after-free in __perf_install_in_context Dmitry Vyukov
@ 2015-12-04 20:32 ` Alexei Starovoitov
  2015-12-04 21:00   ` Dmitry Vyukov
  2015-12-07 15:36 ` Peter Zijlstra
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 46+ messages in thread
From: Alexei Starovoitov @ 2015-12-04 20:32 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, LKML,
	syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Fri, Dec 04, 2015 at 09:04:35PM +0100, Dmitry Vyukov wrote:
> Hello,
> 
> While running syzkaller fuzzer I am seeing lots of the following
> use-after-free reports. Unfortunately all my numerous attempts to
> reproduce them in a controlled environment failed. They pop up during
> fuzzing periodically (once in several hours in a single VM), but
> whenever I try to stress-replay what happened in the VM before the
> report, the use-after-free does not reproduce. Can somebody
> knowledgeable in perf subsystem look at the report? Maybe it is
> possible to figure out what happened based purely on the report. I can
> pretty reliably test any proposed fixes.
> All reports look like this one. Then it is usually followed by other
> reports and eventually kernel hangs or dies. What happens in the
> fuzzer is essentially random syscalls with random arguments, tasks
> born and die concurrently and so on. I was able to reproduce it by
> restricting syscalls only to perf_event_open, perf ioctls and bpf
> syscall.

For the sake of trying to narrow it down:
does the error disappear when you stop using the bpf syscall in your fuzzing?
If yes, then I could have missed some interaction between perf_event_free,
kprobe free and bpf_prog_free.
There was a race there before.
Maybe there is still something else.



* Re: use-after-free in __perf_install_in_context
  2015-12-04 20:32 ` Alexei Starovoitov
@ 2015-12-04 21:00   ` Dmitry Vyukov
  2015-12-07 11:04     ` Dmitry Vyukov
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-04 21:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, LKML,
	syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Fri, Dec 4, 2015 at 9:32 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Fri, Dec 04, 2015 at 09:04:35PM +0100, Dmitry Vyukov wrote:
>> Hello,
>>
>> While running syzkaller fuzzer I am seeing lots of the following
>> use-after-free reports. Unfortunately all my numerous attempts to
>> reproduce them in a controlled environment failed. They pop up during
>> fuzzing periodically (once in several hours in a single VM), but
>> whenever I try to stress-replay what happened in the VM before the
>> report, the use-after-free does not reproduce. Can somebody
>> knowledgeable in perf subsystem look at the report? Maybe it is
>> possible to figure out what happened based purely on the report. I can
>> pretty reliably test any proposed fixes.
>> All reports look like this one. Then it is usually followed by other
>> reports and eventually kernel hangs or dies. What happens in the
>> fuzzer is essentially random syscalls with random arguments, tasks
>> born and die concurrently and so on. I was able to reproduce it by
>> restricting syscalls only to perf_event_open, perf ioctls and bpf
>> syscall.
>
> For the sake of trying to narrow it down:
> does the error disappear when you stop using bpf syscall in your fuzzing?
> If yes, then I could have missed some interaction between perf_event_free,
> kprobe free and bpf_prog_free.
> There was a race there before.
> May be there is still something else.


It is a good question. I will test it.


* Re: use-after-free in __perf_install_in_context
  2015-12-04 21:00   ` Dmitry Vyukov
@ 2015-12-07 11:04     ` Dmitry Vyukov
  2015-12-07 11:06       ` Dmitry Vyukov
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-07 11:04 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, LKML,
	syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Fri, Dec 4, 2015 at 10:00 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Dec 4, 2015 at 9:32 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Fri, Dec 04, 2015 at 09:04:35PM +0100, Dmitry Vyukov wrote:
>>> Hello,
>>>
>>> While running syzkaller fuzzer I am seeing lots of the following
>>> use-after-free reports. Unfortunately all my numerous attempts to
>>> reproduce them in a controlled environment failed. They pop up during
>>> fuzzing periodically (once in several hours in a single VM), but
>>> whenever I try to stress-replay what happened in the VM before the
>>> report, the use-after-free does not reproduce. Can somebody
>>> knowledgeable in perf subsystem look at the report? Maybe it is
>>> possible to figure out what happened based purely on the report. I can
>>> pretty reliably test any proposed fixes.
>>> All reports look like this one. Then it is usually followed by other
>>> reports and eventually kernel hangs or dies. What happens in the
>>> fuzzer is essentially random syscalls with random arguments, tasks
>>> born and die concurrently and so on. I was able to reproduce it by
>>> restricting syscalls only to perf_event_open, perf ioctls and bpf
>>> syscall.
>>
>> For the sake of trying to narrow it down:
>> does the error disappear when you stop using bpf syscall in your fuzzing?
>> If yes, then I could have missed some interaction between perf_event_free,
>> kprobe free and bpf_prog_free.
>> There was a race there before.
>> May be there is still something else.
>
>
> It is a good question. I will test it.


Testing without bpf; so far I have hit the following warning on a non-tainted kernel:


------------[ cut here ]------------
WARNING: CPU: 0 PID: 26259 at kernel/events/core.c:2696
task_ctx_sched_out+0x12c/0x1c0()
Modules linked in:
CPU: 0 PID: 26259 Comm: syzkaller_execu Not tainted 4.4.0-rc3+ #150
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 0000000000000000 ffff88006579f9a8 ffffffff82c6f2a8 0000000041b58ab3
 ffffffff8788bf8d ffffffff82c6f1f6 ffff88003dd32d00 ffff88003469b9c0
 ffffffff81626130 ffff88006579fa10 0000000000000000 0000000000006693
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff82c6f2a8>] dump_stack+0xb2/0xfa lib/dump_stack.c:50
 [<ffffffff812ca446>] warn_slowpath_common+0xe6/0x170 kernel/panic.c:460
 [<ffffffff812ca699>] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
 [<ffffffff8163302c>] task_ctx_sched_out+0x12c/0x1c0 kernel/events/core.c:2696
 [<     inline     >] perf_event_exit_task_context kernel/events/core.c:8815
 [<ffffffff8165669a>] perf_event_exit_task+0x27a/0xae0 kernel/events/core.c:8881
 [<ffffffff812d4a12>] do_exit+0x892/0x3050 kernel/exit.c:759
 [<ffffffff812d732c>] do_group_exit+0xec/0x390 kernel/exit.c:880
 [<ffffffff81302177>] get_signal+0x677/0x1bf0 kernel/signal.c:2307
 [<ffffffff8118645e>] do_signal+0x7e/0x2170 arch/x86/kernel/signal.c:709
 [<ffffffff81003a1e>] exit_to_usermode_loop+0xfe/0x1e0
arch/x86/entry/common.c:247
 [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:282
 [<ffffffff8100733b>] syscall_return_slowpath+0x16b/0x240
arch/x86/entry/common.c:344
 [<ffffffff868dafe2>] int_ret_from_sys_call+0x25/0x9f
arch/x86/entry/entry_64.S:281
---[ end trace c5e9cee9f12b7d5f ]---



static void task_ctx_sched_out(struct perf_event_context *ctx)
{
        struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);

        if (!cpuctx->task_ctx)
                return;

        if (WARN_ON_ONCE(ctx != cpuctx->task_ctx))
                return;

        ctx_sched_out(ctx, cpuctx, EVENT_ALL);
        cpuctx->task_ctx = NULL;
}


* Re: use-after-free in __perf_install_in_context
  2015-12-07 11:04     ` Dmitry Vyukov
@ 2015-12-07 11:06       ` Dmitry Vyukov
  2015-12-07 11:24         ` Dmitry Vyukov
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-07 11:06 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, LKML,
	syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Mon, Dec 7, 2015 at 12:04 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Dec 4, 2015 at 10:00 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Fri, Dec 4, 2015 at 9:32 PM, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>>> On Fri, Dec 04, 2015 at 09:04:35PM +0100, Dmitry Vyukov wrote:
>>>> Hello,
>>>>
>>>> While running syzkaller fuzzer I am seeing lots of the following
>>>> use-after-free reports. Unfortunately all my numerous attempts to
>>>> reproduce them in a controlled environment failed. They pop up during
>>>> fuzzing periodically (once in several hours in a single VM), but
>>>> whenever I try to stress-replay what happened in the VM before the
>>>> report, the use-after-free does not reproduce. Can somebody
>>>> knowledgeable in perf subsystem look at the report? Maybe it is
>>>> possible to figure out what happened based purely on the report. I can
>>>> pretty reliably test any proposed fixes.
>>>> All reports look like this one. Then it is usually followed by other
>>>> reports and eventually kernel hangs or dies. What happens in the
>>>> fuzzer is essentially random syscalls with random arguments, tasks
>>>> born and die concurrently and so on. I was able to reproduce it by
>>>> restricting syscalls only to perf_event_open, perf ioctls and bpf
>>>> syscall.
>>>
>>> For the sake of trying to narrow it down:
>>> does the error disappear when you stop using bpf syscall in your fuzzing?
>>> If yes, then I could have missed some interaction between perf_event_free,
>>> kprobe free and bpf_prog_free.
>>> There was a race there before.
>>> May be there is still something else.
>>
>>
>> It is a good question. I will test it.
>
>
> Testing without bpf, so far hit the following warning on a non-tainted kernel:
>
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 26259 at kernel/events/core.c:2696
> task_ctx_sched_out+0x12c/0x1c0()
> Modules linked in:
> CPU: 0 PID: 26259 Comm: syzkaller_execu Not tainted 4.4.0-rc3+ #150
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  0000000000000000 ffff88006579f9a8 ffffffff82c6f2a8 0000000041b58ab3
>  ffffffff8788bf8d ffffffff82c6f1f6 ffff88003dd32d00 ffff88003469b9c0
>  ffffffff81626130 ffff88006579fa10 0000000000000000 0000000000006693
> Call Trace:
>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>  [<ffffffff82c6f2a8>] dump_stack+0xb2/0xfa lib/dump_stack.c:50
>  [<ffffffff812ca446>] warn_slowpath_common+0xe6/0x170 kernel/panic.c:460
>  [<ffffffff812ca699>] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
>  [<ffffffff8163302c>] task_ctx_sched_out+0x12c/0x1c0 kernel/events/core.c:2696
>  [<     inline     >] perf_event_exit_task_context kernel/events/core.c:8815
>  [<ffffffff8165669a>] perf_event_exit_task+0x27a/0xae0 kernel/events/core.c:8881
>  [<ffffffff812d4a12>] do_exit+0x892/0x3050 kernel/exit.c:759
>  [<ffffffff812d732c>] do_group_exit+0xec/0x390 kernel/exit.c:880
>  [<ffffffff81302177>] get_signal+0x677/0x1bf0 kernel/signal.c:2307
>  [<ffffffff8118645e>] do_signal+0x7e/0x2170 arch/x86/kernel/signal.c:709
>  [<ffffffff81003a1e>] exit_to_usermode_loop+0xfe/0x1e0
> arch/x86/entry/common.c:247
>  [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:282
>  [<ffffffff8100733b>] syscall_return_slowpath+0x16b/0x240
> arch/x86/entry/common.c:344
>  [<ffffffff868dafe2>] int_ret_from_sys_call+0x25/0x9f
> arch/x86/entry/entry_64.S:281
> ---[ end trace c5e9cee9f12b7d5f ]---
>
>
>
> static void task_ctx_sched_out(struct perf_event_context *ctx)
> {
>         struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
>
>         if (!cpuctx->task_ctx)
>                 return;
>
>         if (WARN_ON_ONCE(ctx != cpuctx->task_ctx))
>                 return;
>
>         ctx_sched_out(ctx, cpuctx, EVENT_ALL);
>         cpuctx->task_ctx = NULL;
> }



Also this one. I saw them regularly but thought they were induced by
the use-after-frees; now I see them on a non-tainted kernel:


------------[ cut here ]------------
WARNING: CPU: 1 PID: 25357 at arch/x86/kernel/hw_breakpoint.c:160
arch_uninstall_hw_breakpoint+0x1e1/0x2a0()
Can't find any breakpoint slot
Modules linked in:
CPU: 1 PID: 25357 Comm: syzkaller_execu Not tainted 4.4.0-rc3+ #150
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 0000000000000001 ffff8800333c74e8 ffffffff82c6f2a8 0000000041b58ab3
 ffffffff8788bf8d ffffffff82c6f1f6 ffffffff8696a5e0 ffff8800333c7560
 ffffffff8696a620 00000000000000a0 0000000000000009 ffff8800333c74c8
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff82c6f2a8>] dump_stack+0xb2/0xfa lib/dump_stack.c:50
 [<ffffffff812ca446>] warn_slowpath_common+0xe6/0x170 kernel/panic.c:460
 [<ffffffff812ca579>] warn_slowpath_fmt+0xa9/0xd0 kernel/panic.c:472
 [<ffffffff811a1c31>] arch_uninstall_hw_breakpoint+0x1e1/0x2a0
arch/x86/kernel/hw_breakpoint.c:160
 [<ffffffff8165e1b5>] hw_breakpoint_del+0x15/0x20
kernel/events/hw_breakpoint.c:595
 [<ffffffff81631aa4>] event_sched_out+0x704/0x1270 kernel/events/core.c:1580
 [<ffffffff81632681>] group_sched_out+0x71/0x1f0 kernel/events/core.c:1606
 [<ffffffff81632d02>] ctx_sched_out+0x502/0x700 kernel/events/core.c:2414
 [<     inline     >] perf_event_context_sched_out kernel/events/core.c:2593
 [<ffffffff81645524>] __perf_event_task_sched_out+0x4d4/0x1800
kernel/events/core.c:2678
 [<     inline     >] perf_event_task_sched_out include/linux/perf_event.h:946
 [<     inline     >] prepare_task_switch kernel/sched/core.c:2489
 [<     inline     >] context_switch kernel/sched/core.c:2647
 [<ffffffff868c986f>] __schedule+0xa0f/0x1f40 kernel/sched/core.c:3156
 [<ffffffff868cae0d>] schedule+0x6d/0x210 kernel/sched/core.c:3185
 [<     inline     >] freezable_schedule include/linux/freezer.h:171
 [<ffffffff814c09c6>] futex_wait_queue_me+0x306/0x680 kernel/futex.c:2188
 [<ffffffff814c238c>] futex_wait+0x22c/0x6b0 kernel/futex.c:2303
 [<ffffffff814c6c3d>] do_futex+0x20d/0x19d0 kernel/futex.c:3067
 [<     inline     >] SYSC_futex kernel/futex.c:3127
 [<ffffffff814c8533>] SyS_futex+0x133/0x2b0 kernel/futex.c:3095
 [<ffffffff868dafb8>] tracesys_phase2+0x88/0x8d arch/x86/entry/entry_64.S:269
---[ end trace 624ab9515f02dd2b ]---


* Re: use-after-free in __perf_install_in_context
  2015-12-07 11:06       ` Dmitry Vyukov
@ 2015-12-07 11:24         ` Dmitry Vyukov
  0 siblings, 0 replies; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-07 11:24 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, LKML,
	syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Mon, Dec 7, 2015 at 12:06 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Mon, Dec 7, 2015 at 12:04 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Fri, Dec 4, 2015 at 10:00 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>> On Fri, Dec 4, 2015 at 9:32 PM, Alexei Starovoitov
>>> <alexei.starovoitov@gmail.com> wrote:
>>>> On Fri, Dec 04, 2015 at 09:04:35PM +0100, Dmitry Vyukov wrote:
>>>>> Hello,
>>>>>
>>>>> While running syzkaller fuzzer I am seeing lots of the following
>>>>> use-after-free reports. Unfortunately all my numerous attempts to
>>>>> reproduce them in a controlled environment failed. They pop up during
>>>>> fuzzing periodically (once in several hours in a single VM), but
>>>>> whenever I try to stress-replay what happened in the VM before the
>>>>> report, the use-after-free does not reproduce. Can somebody
>>>>> knowledgeable in perf subsystem look at the report? Maybe it is
>>>>> possible to figure out what happened based purely on the report. I can
>>>>> pretty reliably test any proposed fixes.
>>>>> All reports look like this one. Then it is usually followed by other
>>>>> reports and eventually kernel hangs or dies. What happens in the
>>>>> fuzzer is essentially random syscalls with random arguments, tasks
>>>>> born and die concurrently and so on. I was able to reproduce it by
>>>>> restricting syscalls only to perf_event_open, perf ioctls and bpf
>>>>> syscall.
>>>>
>>>> For the sake of trying to narrow it down:
>>>> does the error disappear when you stop using bpf syscall in your fuzzing?
>>>> If yes, then I could have missed some interaction between perf_event_free,
>>>> kprobe free and bpf_prog_free.
>>>> There was a race there before.
>>>> May be there is still something else.
>>>
>>>
>>> It is a good question. I will test it.
>>
>>
>> Testing without bpf, so far hit the following warning on a non-tainted kernel:
>>
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 26259 at kernel/events/core.c:2696
>> task_ctx_sched_out+0x12c/0x1c0()
>> Modules linked in:
>> CPU: 0 PID: 26259 Comm: syzkaller_execu Not tainted 4.4.0-rc3+ #150
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>  0000000000000000 ffff88006579f9a8 ffffffff82c6f2a8 0000000041b58ab3
>>  ffffffff8788bf8d ffffffff82c6f1f6 ffff88003dd32d00 ffff88003469b9c0
>>  ffffffff81626130 ffff88006579fa10 0000000000000000 0000000000006693
>> Call Trace:
>>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>>  [<ffffffff82c6f2a8>] dump_stack+0xb2/0xfa lib/dump_stack.c:50
>>  [<ffffffff812ca446>] warn_slowpath_common+0xe6/0x170 kernel/panic.c:460
>>  [<ffffffff812ca699>] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
>>  [<ffffffff8163302c>] task_ctx_sched_out+0x12c/0x1c0 kernel/events/core.c:2696
>>  [<     inline     >] perf_event_exit_task_context kernel/events/core.c:8815
>>  [<ffffffff8165669a>] perf_event_exit_task+0x27a/0xae0 kernel/events/core.c:8881
>>  [<ffffffff812d4a12>] do_exit+0x892/0x3050 kernel/exit.c:759
>>  [<ffffffff812d732c>] do_group_exit+0xec/0x390 kernel/exit.c:880
>>  [<ffffffff81302177>] get_signal+0x677/0x1bf0 kernel/signal.c:2307
>>  [<ffffffff8118645e>] do_signal+0x7e/0x2170 arch/x86/kernel/signal.c:709
>>  [<ffffffff81003a1e>] exit_to_usermode_loop+0xfe/0x1e0
>> arch/x86/entry/common.c:247
>>  [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:282
>>  [<ffffffff8100733b>] syscall_return_slowpath+0x16b/0x240
>> arch/x86/entry/common.c:344
>>  [<ffffffff868dafe2>] int_ret_from_sys_call+0x25/0x9f
>> arch/x86/entry/entry_64.S:281
>> ---[ end trace c5e9cee9f12b7d5f ]---
>>
>>
>>
>> static void task_ctx_sched_out(struct perf_event_context *ctx)
>> {
>>         struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
>>
>>         if (!cpuctx->task_ctx)
>>                 return;
>>
>>         if (WARN_ON_ONCE(ctx != cpuctx->task_ctx))
>>                 return;
>>
>>         ctx_sched_out(ctx, cpuctx, EVENT_ALL);
>>         cpuctx->task_ctx = NULL;
>> }
>
>
>
> Also this one. I saw them regularly, but I though that they are
> induced by the use-after-frees, but now I see them on a non-tainted
> kernel:
>
>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 25357 at arch/x86/kernel/hw_breakpoint.c:160
> arch_uninstall_hw_breakpoint+0x1e1/0x2a0()
> Can't find any breakpoint slot
> Modules linked in:
> CPU: 1 PID: 25357 Comm: syzkaller_execu Not tainted 4.4.0-rc3+ #150
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  0000000000000001 ffff8800333c74e8 ffffffff82c6f2a8 0000000041b58ab3
>  ffffffff8788bf8d ffffffff82c6f1f6 ffffffff8696a5e0 ffff8800333c7560
>  ffffffff8696a620 00000000000000a0 0000000000000009 ffff8800333c74c8
> Call Trace:
>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>  [<ffffffff82c6f2a8>] dump_stack+0xb2/0xfa lib/dump_stack.c:50
>  [<ffffffff812ca446>] warn_slowpath_common+0xe6/0x170 kernel/panic.c:460
>  [<ffffffff812ca579>] warn_slowpath_fmt+0xa9/0xd0 kernel/panic.c:472
>  [<ffffffff811a1c31>] arch_uninstall_hw_breakpoint+0x1e1/0x2a0
> arch/x86/kernel/hw_breakpoint.c:160
>  [<ffffffff8165e1b5>] hw_breakpoint_del+0x15/0x20
> kernel/events/hw_breakpoint.c:595
>  [<ffffffff81631aa4>] event_sched_out+0x704/0x1270 kernel/events/core.c:1580
>  [<ffffffff81632681>] group_sched_out+0x71/0x1f0 kernel/events/core.c:1606
>  [<ffffffff81632d02>] ctx_sched_out+0x502/0x700 kernel/events/core.c:2414
>  [<     inline     >] perf_event_context_sched_out kernel/events/core.c:2593
>  [<ffffffff81645524>] __perf_event_task_sched_out+0x4d4/0x1800
> kernel/events/core.c:2678
>  [<     inline     >] perf_event_task_sched_out include/linux/perf_event.h:946
>  [<     inline     >] prepare_task_switch kernel/sched/core.c:2489
>  [<     inline     >] context_switch kernel/sched/core.c:2647
>  [<ffffffff868c986f>] __schedule+0xa0f/0x1f40 kernel/sched/core.c:3156
>  [<ffffffff868cae0d>] schedule+0x6d/0x210 kernel/sched/core.c:3185
>  [<     inline     >] freezable_schedule include/linux/freezer.h:171
>  [<ffffffff814c09c6>] futex_wait_queue_me+0x306/0x680 kernel/futex.c:2188
>  [<ffffffff814c238c>] futex_wait+0x22c/0x6b0 kernel/futex.c:2303
>  [<ffffffff814c6c3d>] do_futex+0x20d/0x19d0 kernel/futex.c:3067
>  [<     inline     >] SYSC_futex kernel/futex.c:3127
>  [<ffffffff814c8533>] SyS_futex+0x133/0x2b0 kernel/futex.c:3095
>  [<ffffffff868dafb8>] tracesys_phase2+0x88/0x8d arch/x86/entry/entry_64.S:269
> ---[ end trace 624ab9515f02dd2b ]---



I hit exactly the same use-after-free report without using bpf (just
perf_event_open and perf ioctls).


* Re: use-after-free in __perf_install_in_context
  2015-12-04 20:04 use-after-free in __perf_install_in_context Dmitry Vyukov
  2015-12-04 20:32 ` Alexei Starovoitov
@ 2015-12-07 15:36 ` Peter Zijlstra
  2015-12-07 16:09   ` Dmitry Vyukov
  2015-12-08 16:22 ` use-after-free in __perf_install_in_context Peter Zijlstra
  2015-12-08 16:27 ` Peter Zijlstra
  3 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2015-12-07 15:36 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Fri, Dec 04, 2015 at 09:04:35PM +0100, Dmitry Vyukov wrote:
> ==================================================================
> BUG: KASAN: use-after-free in __lock_acquire+0x4e99/0x5100 at addr
> ffff880038706e60
> Read of size 8 by task syzkaller_execu/6513
> =============================================================================
> BUG kmalloc-1024 (Not tainted): kasan: bad access detected
> -----------------------------------------------------------------------------
> 
> Disabling lock debugging due to kernel taint
> INFO: Allocated in alloc_perf_context+0x4c/0x100 age=263 cpu=1 pid=6428
> [<     inline     >] kzalloc include/linux/slab.h:602
> [<      none      >] alloc_perf_context+0x4c/0x100 kernel/events/core.c:3399
> [<      none      >] find_get_context+0x187/0x830 kernel/events/core.c:3506
> [<      none      >] SYSC_perf_event_open+0xe50/0x21a0 kernel/events/core.c:8375
> [<      none      >] SyS_perf_event_open+0x39/0x50 kernel/events/core.c:8236
> [<      none      >] tracesys_phase2+0x88/0x8d arch/x86/entry/entry_64.S:269
> 
> INFO: Freed in free_ctx+0x4b/0x70 age=174 cpu=2 pid=8105
> [<      none      >] kfree+0x26f/0x3e0 mm/slub.c:3632
> [<      none      >] free_ctx+0x4b/0x70 kernel/events/core.c:872
> [<     inline     >] __rcu_reclaim kernel/rcu/rcu.h:118
> [<     inline     >] rcu_do_batch kernel/rcu/tree.c:2693
> [<     inline     >] invoke_rcu_callbacks kernel/rcu/tree.c:2961
> [<     inline     >] __rcu_process_callbacks kernel/rcu/tree.c:2928
> [<      none      >] rcu_process_callbacks+0x631/0x19e0 kernel/rcu/tree.c:2945
> [<      none      >] __do_softirq+0x2e5/0xb40 kernel/softirq.c:273
> [<     inline     >] invoke_softirq kernel/softirq.c:350
> [<      none      >] irq_exit+0x165/0x1e0 kernel/softirq.c:391
> [<     inline     >] exiting_irq ./arch/x86/include/asm/apic.h:653
> [<      none      >] smp_apic_timer_interrupt+0x88/0xc0
> arch/x86/kernel/apic/apic.c:926
> [<      none      >] apic_timer_interrupt+0x87/0x90
> arch/x86/entry/entry_64.S:678

So it would be _awesome_ if we could somehow extend this callchain to
include the site that calls call_rcu().

In any case, this looks like we got the refcounting wrong some place,
I'll go audit that stuff.


* Re: use-after-free in __perf_install_in_context
  2015-12-07 15:36 ` Peter Zijlstra
@ 2015-12-07 16:09   ` Dmitry Vyukov
  2015-12-08  3:24     ` Alexei Starovoitov
  2015-12-08 16:44     ` Peter Zijlstra
  0 siblings, 2 replies; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-07 16:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Mon, Dec 7, 2015 at 4:36 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Dec 04, 2015 at 09:04:35PM +0100, Dmitry Vyukov wrote:
>> ==================================================================
>> BUG: KASAN: use-after-free in __lock_acquire+0x4e99/0x5100 at addr
>> ffff880038706e60
>> Read of size 8 by task syzkaller_execu/6513
>> =============================================================================
>> BUG kmalloc-1024 (Not tainted): kasan: bad access detected
>> -----------------------------------------------------------------------------
>>
>> Disabling lock debugging due to kernel taint
>> INFO: Allocated in alloc_perf_context+0x4c/0x100 age=263 cpu=1 pid=6428
>> [<     inline     >] kzalloc include/linux/slab.h:602
>> [<      none      >] alloc_perf_context+0x4c/0x100 kernel/events/core.c:3399
>> [<      none      >] find_get_context+0x187/0x830 kernel/events/core.c:3506
>> [<      none      >] SYSC_perf_event_open+0xe50/0x21a0 kernel/events/core.c:8375
>> [<      none      >] SyS_perf_event_open+0x39/0x50 kernel/events/core.c:8236
>> [<      none      >] tracesys_phase2+0x88/0x8d arch/x86/entry/entry_64.S:269
>>
>> INFO: Freed in free_ctx+0x4b/0x70 age=174 cpu=2 pid=8105
>> [<      none      >] kfree+0x26f/0x3e0 mm/slub.c:3632
>> [<      none      >] free_ctx+0x4b/0x70 kernel/events/core.c:872
>> [<     inline     >] __rcu_reclaim kernel/rcu/rcu.h:118
>> [<     inline     >] rcu_do_batch kernel/rcu/tree.c:2693
>> [<     inline     >] invoke_rcu_callbacks kernel/rcu/tree.c:2961
>> [<     inline     >] __rcu_process_callbacks kernel/rcu/tree.c:2928
>> [<      none      >] rcu_process_callbacks+0x631/0x19e0 kernel/rcu/tree.c:2945
>> [<      none      >] __do_softirq+0x2e5/0xb40 kernel/softirq.c:273
>> [<     inline     >] invoke_softirq kernel/softirq.c:350
>> [<      none      >] irq_exit+0x165/0x1e0 kernel/softirq.c:391
>> [<     inline     >] exiting_irq ./arch/x86/include/asm/apic.h:653
>> [<      none      >] smp_apic_timer_interrupt+0x88/0xc0
>> arch/x86/kernel/apic/apic.c:926
>> [<      none      >] apic_timer_interrupt+0x87/0x90
>> arch/x86/entry/entry_64.S:678
>
> So it would be _awesome_ if we could somehow extend this callchain to
> include the site that calls call_rcu().

We have a patch for KASAN in the works that adds a so-called stack
depot, which allows mapping a stack trace onto a uint32 id. Then we can
plumb these ids not only to memory blocks, but also to rcu callbacks,
thread pool items, etc. But it's not yet ready.
If your audit does not give any results, can you give me a patch that
prints rcu callback submission stacks in KASAN reports?


> In any case, this looks like we got the refcounting wrong some place,
> I'll go audit that stuff.


* Re: use-after-free in __perf_install_in_context
  2015-12-07 16:09   ` Dmitry Vyukov
@ 2015-12-08  3:24     ` Alexei Starovoitov
  2015-12-08 16:12       ` Dmitry Vyukov
  2015-12-08 16:44     ` Peter Zijlstra
  1 sibling, 1 reply; 46+ messages in thread
From: Alexei Starovoitov @ 2015-12-08  3:24 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, LKML,
	syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Mon, Dec 07, 2015 at 05:09:21PM +0100, Dmitry Vyukov wrote:
> > So it would be _awesome_ if we could somehow extend this callchain to
> > include the site that calls call_rcu().
> 
> We have a patch for KASAN in works that adds so-called stack depot
> which allows to map a stack trace onto uint32 id. Then we can plumb

I was hacking on something similar to categorize stack traces with a u32 id.
How are you planning to limit the number of such stack traces?
And what is the interface for user space to get a stack trace from an id?



* Re: use-after-free in __perf_install_in_context
  2015-12-08  3:24     ` Alexei Starovoitov
@ 2015-12-08 16:12       ` Dmitry Vyukov
  2015-12-08 17:54         ` Alexei Starovoitov
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-08 16:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, LKML,
	syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Tue, Dec 8, 2015 at 4:24 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Mon, Dec 07, 2015 at 05:09:21PM +0100, Dmitry Vyukov wrote:
>> > So it would be _awesome_ if we could somehow extend this callchain to
>> > include the site that calls call_rcu().
>>
>> We have a patch for KASAN in works that adds so-called stack depot
>> which allows to map a stack trace onto uint32 id. Then we can plumb
>
> I was hacking something similar to categorize stack traces with u32 id.
> How are you planning to limit the number of such stack traces ?
> and what is the interface for user space to get stack trace from an id?


We don't limit the number of stack traces. The kernel does not seem to
use data-driven recursion extensively, so there is a limited number of
stacks. Though we will probably need to strip the non-interrupt part
from interrupt stacks, otherwise they can produce an unbounded number
of different stacks.
There is no interface for user space; it is used only inside the kernel
to save stacks for memory blocks (rcu callbacks, and thread pool items
in the future).
The design is based on what we have successfully and extensively used
in user-space sanitizers for years. Current code is here:
https://github.com/ramosian-glider/kasan/commit/fb0eefd212366401ed5ad244233ef379a27bfb46
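
To give a rough idea without following the link, here is a small
self-contained sketch of the interning scheme (illustrative only: the
names, table size and hash below are made up and this is not the code
from the patch above):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEPOT_MAX_FRAMES 20
#define DEPOT_BUCKETS (1 << 14)

struct depot_stack {
        struct depot_stack *next;       /* hash bucket chain */
        uint32_t hash;
        uint32_t id;                    /* the u32 handle handed out */
        uint32_t nr_frames;
        unsigned long frames[DEPOT_MAX_FRAMES];
};

static struct depot_stack *depot_buckets[DEPOT_BUCKETS];
static uint32_t depot_next_id = 1;      /* 0 means "no stack recorded" */

/* FNV-1a over the raw program counters. */
static uint32_t depot_hash(const unsigned long *frames, uint32_t n)
{
        const unsigned char *p = (const unsigned char *)frames;
        uint32_t h = 2166136261u;
        size_t i;

        for (i = 0; i < n * sizeof(frames[0]); i++)
                h = (h ^ p[i]) * 16777619u;
        return h;
}

/* Intern a stack trace: identical traces always get the same id. */
uint32_t depot_save_stack(const unsigned long *frames, uint32_t n)
{
        struct depot_stack *s;
        uint32_t h, b;

        if (n > DEPOT_MAX_FRAMES)
                n = DEPOT_MAX_FRAMES;
        h = depot_hash(frames, n);
        b = h % DEPOT_BUCKETS;

        for (s = depot_buckets[b]; s; s = s->next)
                if (s->hash == h && s->nr_frames == n &&
                    !memcmp(s->frames, frames, n * sizeof(frames[0])))
                        return s->id;   /* seen this trace before */

        s = calloc(1, sizeof(*s));      /* entries are never freed */
        if (!s)
                return 0;
        s->hash = h;
        s->id = depot_next_id++;
        s->nr_frames = n;
        memcpy(s->frames, frames, n * sizeof(frames[0]));
        s->next = depot_buckets[b];
        depot_buckets[b] = s;
        return s->id;
}

/* Report side: map an id back to frames (a linear scan is fine for a sketch). */
const struct depot_stack *depot_fetch_stack(uint32_t id)
{
        size_t b;
        const struct depot_stack *s;

        for (b = 0; b < DEPOT_BUCKETS; b++)
                for (s = depot_buckets[b]; s; s = s->next)
                        if (s->id == id)
                                return s;
        return NULL;
}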


* Re: use-after-free in __perf_install_in_context
  2015-12-04 20:04 use-after-free in __perf_install_in_context Dmitry Vyukov
  2015-12-04 20:32 ` Alexei Starovoitov
  2015-12-07 15:36 ` Peter Zijlstra
@ 2015-12-08 16:22 ` Peter Zijlstra
  2015-12-08 18:57   ` Ingo Molnar
  2015-12-08 16:27 ` Peter Zijlstra
  3 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2015-12-08 16:22 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Fri, Dec 04, 2015 at 09:04:35PM +0100, Dmitry Vyukov wrote:
> Hello,
> 
> While running syzkaller fuzzer I am seeing lots of the following
> use-after-free reports. Unfortunately all my numerous attempts to
> reproduce them in a controlled environment failed. They pop up during
> fuzzing periodically (once in several hours in a single VM), but
> whenever I try to stress-replay what happened in the VM before the
> report, the use-after-free does not reproduce. 

Typical that :/

> Can somebody knowledgeable in perf subsystem look at the report? Maybe
> it is possible to figure out what happened based purely on the report.
> I can pretty reliably test any proposed fixes.

So I'm still going over the code; meanwhile I tried reproducing this
using the perf_fuzzer and some debug code, but no luck with that.

Since you seem to be able to reproduce, could you do a run with the
below patch applied, to see if it tickles something?


---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 36babfd..58e504c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -872,13 +872,22 @@ static void free_ctx(struct rcu_head *head)
 	kfree(ctx);
 }
 
+#define for_each_task_context_nr(ctxn)					\
+	for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
+
 static void put_ctx(struct perf_event_context *ctx)
 {
 	if (atomic_dec_and_test(&ctx->refcount)) {
 		if (ctx->parent_ctx)
 			put_ctx(ctx->parent_ctx);
-		if (ctx->task)
+		if (ctx->task) {
+			int ctxn;
+
+			for_each_task_context_nr(ctxn)
+				WARN_ON(ctx->task->perf_event_ctxp[ctxn] == ctx);
+
 			put_task_struct(ctx->task);
+		}
 		call_rcu(&ctx->rcu_head, free_ctx);
 	}
 }
@@ -2649,9 +2658,6 @@ static void perf_pmu_sched_task(struct task_struct *prev,
 static void perf_event_switch(struct task_struct *task,
 			      struct task_struct *next_prev, bool sched_in);
 
-#define for_each_task_context_nr(ctxn)					\
-	for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
-
 /*
  * Called from scheduler to remove the events of the current task,
  * with interrupts disabled.
@@ -8931,6 +8937,8 @@ again:
 
 		mutex_unlock(&ctx->mutex);
 
+		task->perf_event_ctxp[ctxn] = NULL;
+
 		put_ctx(ctx);
 	}
 }


* Re: use-after-free in __perf_install_in_context
  2015-12-04 20:04 use-after-free in __perf_install_in_context Dmitry Vyukov
                   ` (2 preceding siblings ...)
  2015-12-08 16:22 ` use-after-free in __perf_install_in_context Peter Zijlstra
@ 2015-12-08 16:27 ` Peter Zijlstra
  2015-12-08 16:50   ` Dmitry Vyukov
  3 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2015-12-08 16:27 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Fri, Dec 04, 2015 at 09:04:35PM +0100, Dmitry Vyukov wrote:

> I was able to reproduce it by
> restricting syscalls only to perf_event_open, perf ioctls and bpf
> syscall.

Does that include cpu hotplug?

Sasha just reminded me there is a swevent use-after-free issue on
hotplug.


* Re: use-after-free in __perf_install_in_context
  2015-12-07 16:09   ` Dmitry Vyukov
  2015-12-08  3:24     ` Alexei Starovoitov
@ 2015-12-08 16:44     ` Peter Zijlstra
  2015-12-08 19:14       ` Dmitry Vyukov
  1 sibling, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2015-12-08 16:44 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Mon, Dec 07, 2015 at 05:09:21PM +0100, Dmitry Vyukov wrote:
> If your audit does not give any results, can you give me a patch that
> prints rcu callback submission stacks in KASAN reports?

Just because my brain is fried for today, I figured I'd give it a go.

Completely untested..

---
 include/linux/slub_def.h |  2 ++
 kernel/rcu/tree_plugin.h |  1 +
 mm/slub.c                | 34 ++++++++++++++++++++++++++++++++--
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 33885118523c..445b586c0bfa 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -129,4 +129,6 @@ static inline void *virt_to_obj(struct kmem_cache *s,
 void object_err(struct kmem_cache *s, struct page *page,
 		u8 *object, char *reason);
 
+void object_set_indirect(const void *addr);
+
 #endif /* _LINUX_SLUB_DEF_H */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 630c19772630..4e1e79e01e34 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -659,6 +659,7 @@ static void rcu_preempt_do_callbacks(void)
  */
 void call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
+	object_set_indirect(head);
 	__call_rcu(head, func, rcu_state_p, -1, 0);
 }
 EXPORT_SYMBOL_GPL(call_rcu);
diff --git a/mm/slub.c b/mm/slub.c
index 46997517406e..6977dc7cffcd 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -196,7 +196,12 @@ struct track {
 	unsigned long when;	/* When did the operation occur */
 };
 
-enum track_item { TRACK_ALLOC, TRACK_FREE };
+enum track_item {
+	TRACK_ALLOC = 0,
+	TRACK_FREE,
+	TRACK_INDIRECT,
+	TRACK_NR,
+};
 
 #ifdef CONFIG_SYSFS
 static int sysfs_slab_add(struct kmem_cache *);
@@ -551,6 +556,7 @@ static void init_tracking(struct kmem_cache *s, void *object)
 
 	set_track(s, object, TRACK_FREE, 0UL);
 	set_track(s, object, TRACK_ALLOC, 0UL);
+	set_track(s, object, TRACK_INDIRECT, 0UL);
 }
 
 static void print_track(const char *s, struct track *t)
@@ -579,6 +585,7 @@ static void print_tracking(struct kmem_cache *s, void *object)
 
 	print_track("Allocated", get_track(s, object, TRACK_ALLOC));
 	print_track("Freed", get_track(s, object, TRACK_FREE));
+	print_track("Indirect", get_track(s, object, TRACK_INDIRECT));
 }
 
 static void print_page_info(struct page *page)
@@ -652,6 +659,29 @@ static void print_trailer(struct kmem_cache *s, struct page *page, u8 *p)
 	dump_stack();
 }
 
+void object_set_indirect(const void *addr)
+{
+	if ((addr >= (void *)PAGE_OFFSET) &&
+		(addr < high_memory)) {
+		struct page *page = virt_to_head_page(addr);
+
+		if (PageSlab(page)) {
+			void *object;
+			struct kmem_cache *cache = page->slab_cache;
+			void *last_object;
+
+			object = virt_to_obj(cache, page_address(page), addr);
+			last_object = page_address(page) +
+				page->objects * cache->size;
+
+			if (unlikely(object > last_object))
+				object = last_object; /* we hit into padding */
+
+			set_track(cache, object, TRACK_INDIRECT, (unsigned long)addr);
+		}
+	}
+}
+
 void object_err(struct kmem_cache *s, struct page *page,
 			u8 *object, char *reason)
 {
@@ -767,7 +797,7 @@ static int check_pad_bytes(struct kmem_cache *s, struct page *page, u8 *p)
 
 	if (s->flags & SLAB_STORE_USER)
 		/* We also have user information there */
-		off += 2 * sizeof(struct track);
+		off += TRACK_NR * sizeof(struct track);
 
 	if (s->size == off)
 		return 1;


* Re: use-after-free in __perf_install_in_context
  2015-12-08 16:27 ` Peter Zijlstra
@ 2015-12-08 16:50   ` Dmitry Vyukov
  0 siblings, 0 replies; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-08 16:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Tue, Dec 8, 2015 at 5:27 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Dec 04, 2015 at 09:04:35PM +0100, Dmitry Vyukov wrote:
>
>> I was able to reproduce it by
>> restricting syscalls only to perf_event_open, perf ioctls and bpf
>> syscall.
>
> Does that include cpu hotplug?

I have CONFIG_HOTPLUG_CPU enabled. But I don't do actual plug/unplug.


> Sasha just reminded me there is a swevent use-after-free issue on
> hotplug.


* Re: use-after-free in __perf_install_in_context
  2015-12-08 16:12       ` Dmitry Vyukov
@ 2015-12-08 17:54         ` Alexei Starovoitov
  2015-12-08 17:56           ` Dmitry Vyukov
  0 siblings, 1 reply; 46+ messages in thread
From: Alexei Starovoitov @ 2015-12-08 17:54 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, LKML,
	syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Tue, Dec 08, 2015 at 05:12:04PM +0100, Dmitry Vyukov wrote:
> On Tue, Dec 8, 2015 at 4:24 AM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Mon, Dec 07, 2015 at 05:09:21PM +0100, Dmitry Vyukov wrote:
> >> > So it would be _awesome_ if we could somehow extend this callchain to
> >> > include the site that calls call_rcu().
> >>
> >> We have a patch for KASAN in works that adds so-called stack depot
> >> which allows to map a stack trace onto uint32 id. Then we can plumb
> >
> > I was hacking something similar to categorize stack traces with u32 id.
> > How are you planning to limit the number of such stack traces ?
> > and what is the interface for user space to get stack trace from an id?
> 
> 
> We don't limit number of stack traces. Kernel does not seem to use
> data-driven recursion extensively, so there is limited number of
> stacks. Though, probably we will need to strip non-interrupt part for
> interrupt stacks, otherwise that can produce unbounded number of
> different stacks.
> There is no interface for user-space, it is used only inside of kernel
> to save stacks for memory blocks (rcu callbacks, thread pool items in
> the future).
> The design is based on what we successfully and extensively use in
> user-space sanitizers for years. Current code is here:
> https://github.com/ramosian-glider/kasan/commit/fb0eefd212366401ed5ad244233ef379a27bfb46

Why did you pick an approach that never frees the accumulated stacks?
That limits usability a lot, since once kasan starts using it only a
reboot will free the memory. Ouch.
What worked for user space doesn't work for the kernel.



* Re: use-after-free in __perf_install_in_context
  2015-12-08 17:54         ` Alexei Starovoitov
@ 2015-12-08 17:56           ` Dmitry Vyukov
  2015-12-08 18:05             ` Alexei Starovoitov
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-08 17:56 UTC (permalink / raw)
  To: syzkaller
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, LKML,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Tue, Dec 8, 2015 at 6:54 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Tue, Dec 08, 2015 at 05:12:04PM +0100, Dmitry Vyukov wrote:
>> On Tue, Dec 8, 2015 at 4:24 AM, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>> > On Mon, Dec 07, 2015 at 05:09:21PM +0100, Dmitry Vyukov wrote:
>> >> > So it would be _awesome_ if we could somehow extend this callchain to
>> >> > include the site that calls call_rcu().
>> >>
>> >> We have a patch for KASAN in works that adds so-called stack depot
>> >> which allows to map a stack trace onto uint32 id. Then we can plumb
>> >
>> > I was hacking something similar to categorize stack traces with u32 id.
>> > How are you planning to limit the number of such stack traces ?
>> > and what is the interface for user space to get stack trace from an id?
>>
>>
>> We don't limit number of stack traces. Kernel does not seem to use
>> data-driven recursion extensively, so there is limited number of
>> stacks. Though, probably we will need to strip non-interrupt part for
>> interrupt stacks, otherwise that can produce unbounded number of
>> different stacks.
>> There is no interface for user-space, it is used only inside of kernel
>> to save stacks for memory blocks (rcu callbacks, thread pool items in
>> the future).
>> The design is based on what we successfully and extensively use in
>> user-space sanitizers for years. Current code is here:
>> https://github.com/ramosian-glider/kasan/commit/fb0eefd212366401ed5ad244233ef379a27bfb46
>
> why did you pick approach to never free accumulated stacks?
> That limits usability a lot, since once kasan starts using it only
> reboot will free the memory. ouch.
> what worked for user space doesn't work for kernel.


Freeing and reusing would slow things down and complicate the code
significantly. And it has not yet been proven necessary.


* Re: use-after-free in __perf_install_in_context
  2015-12-08 17:56           ` Dmitry Vyukov
@ 2015-12-08 18:05             ` Alexei Starovoitov
  2015-12-08 18:35               ` Dmitry Vyukov
  0 siblings, 1 reply; 46+ messages in thread
From: Alexei Starovoitov @ 2015-12-08 18:05 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	LKML, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Tue, Dec 08, 2015 at 06:56:23PM +0100, Dmitry Vyukov wrote:
> On Tue, Dec 8, 2015 at 6:54 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Tue, Dec 08, 2015 at 05:12:04PM +0100, Dmitry Vyukov wrote:
> >> On Tue, Dec 8, 2015 at 4:24 AM, Alexei Starovoitov
> >> <alexei.starovoitov@gmail.com> wrote:
> >> > On Mon, Dec 07, 2015 at 05:09:21PM +0100, Dmitry Vyukov wrote:
> >> >> > So it would be _awesome_ if we could somehow extend this callchain to
> >> >> > include the site that calls call_rcu().
> >> >>
> >> >> We have a patch for KASAN in works that adds so-called stack depot
> >> >> which allows to map a stack trace onto uint32 id. Then we can plumb
> >> >
> >> > I was hacking something similar to categorize stack traces with u32 id.
> >> > How are you planning to limit the number of such stack traces ?
> >> > and what is the interface for user space to get stack trace from an id?
> >>
> >>
> >> We don't limit number of stack traces. Kernel does not seem to use
> >> data-driven recursion extensively, so there is limited number of
> >> stacks. Though, probably we will need to strip non-interrupt part for
> >> interrupt stacks, otherwise that can produce unbounded number of
> >> different stacks.
> >> There is no interface for user-space, it is used only inside of kernel
> >> to save stacks for memory blocks (rcu callbacks, thread pool items in
> >> the future).
> >> The design is based on what we successfully and extensively use in
> >> user-space sanitizers for years. Current code is here:
> >> https://github.com/ramosian-glider/kasan/commit/fb0eefd212366401ed5ad244233ef379a27bfb46
> >
> > why did you pick approach to never free accumulated stacks?
> > That limits usability a lot, since once kasan starts using it only
> > reboot will free the memory. ouch.
> > what worked for user space doesn't work for kernel.
> 
> 
> Freeing and reusing will slow down and complicate code significantly.
> And it is not yet proved to be necessary.

It's a joke, right? Allocating kernel pages without the ability to free them?!



* Re: use-after-free in __perf_install_in_context
  2015-12-08 18:05             ` Alexei Starovoitov
@ 2015-12-08 18:35               ` Dmitry Vyukov
  2015-12-08 19:56                 ` Alexei Starovoitov
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-08 18:35 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: syzkaller, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	LKML, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Tue, Dec 8, 2015 at 7:05 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Tue, Dec 08, 2015 at 06:56:23PM +0100, Dmitry Vyukov wrote:
>> On Tue, Dec 8, 2015 at 6:54 PM, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>> > On Tue, Dec 08, 2015 at 05:12:04PM +0100, Dmitry Vyukov wrote:
>> >> On Tue, Dec 8, 2015 at 4:24 AM, Alexei Starovoitov
>> >> <alexei.starovoitov@gmail.com> wrote:
>> >> > On Mon, Dec 07, 2015 at 05:09:21PM +0100, Dmitry Vyukov wrote:
>> >> >> > So it would be _awesome_ if we could somehow extend this callchain to
>> >> >> > include the site that calls call_rcu().
>> >> >>
>> >> >> We have a patch for KASAN in works that adds so-called stack depot
>> >> >> which allows to map a stack trace onto uint32 id. Then we can plumb
>> >> >
>> >> > I was hacking something similar to categorize stack traces with u32 id.
>> >> > How are you planning to limit the number of such stack traces ?
>> >> > and what is the interface for user space to get stack trace from an id?
>> >>
>> >>
>> >> We don't limit number of stack traces. Kernel does not seem to use
>> >> data-driven recursion extensively, so there is limited number of
>> >> stacks. Though, probably we will need to strip non-interrupt part for
>> >> interrupt stacks, otherwise that can produce unbounded number of
>> >> different stacks.
>> >> There is no interface for user-space, it is used only inside of kernel
>> >> to save stacks for memory blocks (rcu callbacks, thread pool items in
>> >> the future).
>> >> The design is based on what we successfully and extensively use in
>> >> user-space sanitizers for years. Current code is here:
>> >> https://github.com/ramosian-glider/kasan/commit/fb0eefd212366401ed5ad244233ef379a27bfb46
>> >
>> > why did you pick approach to never free accumulated stacks?
>> > That limits usability a lot, since once kasan starts using it only
>> > reboot will free the memory. ouch.
>> > what worked for user space doesn't work for kernel.
>>
>>
>> Freeing and reusing will slow down and complicate code significantly.
>> And it is not yet proved to be necessary.
>
> It's a joke, right? allocating kernel pages without ability to free?!


The plan is to smash the kernel much earlier than it runs out of memory.


I think this scheme will work pretty well.
I've counted 34403 calls to
kmalloc/kfree/kmem_cache_alloc/kmem_cache_free in the kernel.
Multiplying this by 2 to account for different stacks leading to the
same malloc/free, and assuming that we store a 16-byte header and 20
4-byte frames per stack, this gives us about 6MB. I can live with
that. I can live with 60MB as well.
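
Spelling that estimate out, under the same assumptions:

  34403 call sites x 2 stacks each       ~= 68806 distinct stacks
  16-byte header + 20 frames x 4 bytes    = 96 bytes per stack
  68806 stacks x 96 bytes                ~= 6.6 MB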


* Re: use-after-free in __perf_install_in_context
  2015-12-08 16:22 ` use-after-free in __perf_install_in_context Peter Zijlstra
@ 2015-12-08 18:57   ` Ingo Molnar
  2015-12-09  9:05     ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Ingo Molnar @ 2015-12-08 18:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo, LKML,
	syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Dec 04, 2015 at 09:04:35PM +0100, Dmitry Vyukov wrote:
> > Hello,
> > 
> > While running syzkaller fuzzer I am seeing lots of the following
> > use-after-free reports. Unfortunately all my numerous attempts to
> > reproduce them in a controlled environment failed. They pop up during
> > fuzzing periodically (once in several hours in a single VM), but
> > whenever I try to stress-replay what happened in the VM before the
> > report, the use-after-free does not reproduce. 
> 
> Typical that :/
> 
> > Can somebody knowledgeable in perf subsystem look at the report? Maybe
> > it is possible to figure out what happened based purely on the report.
> > I can pretty reliably test any proposed fixes.
> 
> So I'm still going over the code, but meanwhile I tried reproducing this
> using the perf_fuzzer and some debug code, but no luck with that.
> 
> Since you seem to be able to reproduce, could you do a run with the
> below patch in to see if it tickles something?

Btw., could we add more redundancy / debug code to the refcounting code? It seems 
to be a frequent source of very hard to find/fix races/bugs - so it should be ripe 
for some extra debug infrastructure ...
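
Something along these lines, say (just a sketch of the kind of checks I
mean, not existing code; the names are made up):

#include <linux/atomic.h>
#include <linux/bug.h>
#include <linux/types.h>

static inline void debug_ref_get(atomic_t *ref)
{
        /* Taking a new reference on an object whose count already hit
         * zero usually means we are resurrecting a freed object. */
        WARN_ONCE(atomic_inc_return(ref) <= 1, "refcount: inc from zero\n");
}

static inline bool debug_ref_put(atomic_t *ref)
{
        int v = atomic_dec_return(ref);

        /* More puts than gets: a reference was dropped twice. */
        WARN_ONCE(v < 0, "refcount: underflow\n");

        return v == 0;  /* caller does the actual free when this is true */
}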

Thanks,


	Ingo


* Re: use-after-free in __perf_install_in_context
  2015-12-08 16:44     ` Peter Zijlstra
@ 2015-12-08 19:14       ` Dmitry Vyukov
  2015-12-10 19:57         ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-08 19:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Tue, Dec 8, 2015 at 5:44 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Dec 07, 2015 at 05:09:21PM +0100, Dmitry Vyukov wrote:
>> If your audit does not give any results, can you give me a patch that
>> prints rcu callback submission stacks in KASAN reports?
>
> Just because my brain is fried for today, I figured I'd give it a go.
>
> Completely untested..
>
> ---
>  include/linux/slub_def.h |  2 ++
>  kernel/rcu/tree_plugin.h |  1 +
>  mm/slub.c                | 34 ++++++++++++++++++++++++++++++++--
>  3 files changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index 33885118523c..445b586c0bfa 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -129,4 +129,6 @@ static inline void *virt_to_obj(struct kmem_cache *s,
>  void object_err(struct kmem_cache *s, struct page *page,
>                 u8 *object, char *reason);
>
> +void object_set_indirect(const void *addr);
> +
>  #endif /* _LINUX_SLUB_DEF_H */
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 630c19772630..4e1e79e01e34 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -659,6 +659,7 @@ static void rcu_preempt_do_callbacks(void)
>   */
>  void call_rcu(struct rcu_head *head, rcu_callback_t func)
>  {
> +       object_set_indirect(head);
>         __call_rcu(head, func, rcu_state_p, -1, 0);
>  }
>  EXPORT_SYMBOL_GPL(call_rcu);
> diff --git a/mm/slub.c b/mm/slub.c
> index 46997517406e..6977dc7cffcd 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -196,7 +196,12 @@ struct track {
>         unsigned long when;     /* When did the operation occur */
>  };
>
> -enum track_item { TRACK_ALLOC, TRACK_FREE };
> +enum track_item {
> +       TRACK_ALLOC = 0,
> +       TRACK_FREE,
> +       TRACK_INDIRECT,
> +       TRACK_NR,
> +};
>
>  #ifdef CONFIG_SYSFS
>  static int sysfs_slab_add(struct kmem_cache *);
> @@ -551,6 +556,7 @@ static void init_tracking(struct kmem_cache *s, void *object)
>
>         set_track(s, object, TRACK_FREE, 0UL);
>         set_track(s, object, TRACK_ALLOC, 0UL);
> +       set_track(s, object, TRACK_INDIRECT, 0UL);
>  }
>
>  static void print_track(const char *s, struct track *t)
> @@ -579,6 +585,7 @@ static void print_tracking(struct kmem_cache *s, void *object)
>
>         print_track("Allocated", get_track(s, object, TRACK_ALLOC));
>         print_track("Freed", get_track(s, object, TRACK_FREE));
> +       print_track("Indirect", get_track(s, object, TRACK_INDIRECT));
>  }
>
>  static void print_page_info(struct page *page)
> @@ -652,6 +659,29 @@ static void print_trailer(struct kmem_cache *s, struct page *page, u8 *p)
>         dump_stack();
>  }
>
> +void object_set_indirect(const void *addr)
> +{
> +       if ((addr >= (void *)PAGE_OFFSET) &&
> +               (addr < high_memory)) {
> +               struct page *page = virt_to_head_page(addr);
> +
> +               if (PageSlab(page)) {
> +                       void *object;
> +                       struct kmem_cache *cache = page->slab_cache;
> +                       void *last_object;
> +
> +                       object = virt_to_obj(cache, page_address(page), addr);
> +                       last_object = page_address(page) +
> +                               page->objects * cache->size;
> +
> +                       if (unlikely(object > last_object))
> +                               object = last_object; /* we hit into padding */
> +
> +                       set_track(cache, object, TRACK_INDIRECT, (unsigned long)addr);
> +               }
> +       }
> +}
> +
>  void object_err(struct kmem_cache *s, struct page *page,
>                         u8 *object, char *reason)
>  {
> @@ -767,7 +797,7 @@ static int check_pad_bytes(struct kmem_cache *s, struct page *page, u8 *p)
>
>         if (s->flags & SLAB_STORE_USER)
>                 /* We also have user information there */
> -               off += 2 * sizeof(struct track);
> +               off += TRACK_NR * sizeof(struct track);
>
>         if (s->size == off)
>                 return 1;


Peter,

Tested with your patches.
The additional WARNING does not fire.
For the rcu stacks, I had to change two more 2's to TRACK_NR and also
move the recording from call_rcu to __call_rcu, but now it is working.
Two reports with indirect stack:


BUG: KASAN: use-after-free in __lock_acquire+0x4e99/0x5100 at addr
ffff88003096c640
BUG kmalloc-1024 (Not tainted): kasan: bad access detected
-----------------------------------------------------------------------------

irq event stamp: 136
hardirqs last  enabled at (135): [<ffffffff814c12e4>]
__raw_spin_lock_init+0x24/0x120 kernel/locking/spinlock_debug.c:23
hardirqs last disabled at (136): [<ffffffff8183c967>]
__slab_alloc+0x37/0x90 mm/slub.c:2487
softirqs last  enabled at (38): [<ffffffff8138417d>]
__do_softirq+0x5ed/0xb40 kernel/softirq.c:299
softirqs last disabled at (31): [<     inline     >] invoke_softirq
kernel/softirq.c:350
softirqs last disabled at (31): [<ffffffff813849d5>]
irq_exit+0x165/0x1e0 kernel/softirq.c:391

INFO: Allocated in alloc_perf_context+0x4c/0x100 age=220 cpu=1 pid=15649
[<      none      >] ___slab_alloc+0x648/0x8c0 mm/slub.c:2468
[<      none      >] __slab_alloc+0x4c/0x90 mm/slub.c:2497
[<     inline     >] slab_alloc_node mm/slub.c:2560
[<     inline     >] slab_alloc mm/slub.c:2602
[<      none      >] kmem_cache_alloc_trace+0x23c/0x3f0 mm/slub.c:2619
[<     inline     >] kmalloc include/linux/slab.h:458
[<     inline     >] kzalloc include/linux/slab.h:602
[<      none      >] alloc_perf_context+0x4c/0x100 kernel/events/core.c:3405
[<      none      >] find_get_context+0x187/0x830 kernel/events/core.c:3512
[<      none      >] SYSC_perf_event_open+0xe50/0x21a0 kernel/events/core.c:8381
[<      none      >] SyS_perf_event_open+0x39/0x50 kernel/events/core.c:8242
[<      none      >] tracesys_phase2+0x88/0x8d arch/x86/entry/entry_64.S:269

INFO: Freed in free_ctx+0x4b/0x70 age=89 cpu=3 pid=21
[<      none      >] __slab_free+0x21e/0x3e0 mm/slub.c:2678
[<     inline     >] slab_free mm/slub.c:2833
[<      none      >] kfree+0x26f/0x3e0 mm/slub.c:3662
[<      none      >] free_ctx+0x4b/0x70 kernel/events/core.c:872
[<     inline     >] __rcu_reclaim kernel/rcu/rcu.h:118
[<     inline     >] rcu_do_batch kernel/rcu/tree.c:2693
[<     inline     >] invoke_rcu_callbacks kernel/rcu/tree.c:2961
[<     inline     >] __rcu_process_callbacks kernel/rcu/tree.c:2928
[<      none      >] rcu_process_callbacks+0x631/0x19e0 kernel/rcu/tree.c:2945
[<      none      >] __do_softirq+0x2e5/0xb40 kernel/softirq.c:273
[<      none      >] run_ksoftirqd+0x20/0x50 kernel/softirq.c:662
[<      none      >] smpboot_thread_fn+0x41d/0x850 kernel/smpboot.c:163
[<      none      >] kthread+0x21d/0x2e0 drivers/block/aoe/aoecmd.c:1314
[<      none      >] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:468

INFO: Indirect in __call_rcu.constprop.63+0x7e/0xe30 age=126 cpu=3 pid=15645
[<      none      >] __call_rcu.constprop.63+0x7e/0xe30 kernel/rcu/tree.c:3048
[<      none      >] call_rcu_sched+0x26/0x30 kernel/rcu/tree.c:3117
[<      none      >] put_ctx+0x150/0x250 kernel/events/core.c:891
[<     inline     >] __free_event kernel/events/core.c:3707
[<      none      >] _free_event+0x2c7/0xaa0 kernel/events/core.c:3738
[<      none      >] put_event+0x294/0x4d0 kernel/events/core.c:3834
[<      none      >] perf_release+0x3c/0x60 kernel/events/core.c:3849
[<      none      >] __fput+0x244/0x860 fs/file_table.c:208
[<      none      >] ____fput+0x15/0x20 fs/file_table.c:244
[<      none      >] task_work_run+0x130/0x240 kernel/task_work.c:115
[<     inline     >] exit_task_work include/linux/task_work.h:21
[<      none      >] do_exit+0x885/0x3050 kernel/exit.c:750
[<      none      >] do_group_exit+0xec/0x390 kernel/exit.c:880
[<      none      >] get_signal+0x677/0x1bf0 kernel/signal.c:2307
[<      none      >] do_signal+0x7e/0x2170 arch/x86/kernel/signal.c:709
[<      none      >] exit_to_usermode_loop+0xfe/0x1e0
arch/x86/entry/common.c:247
[<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:282
[<      none      >] syscall_return_slowpath+0x16b/0x240
arch/x86/entry/common.c:344
[<      none      >] int_ret_from_sys_call+0x25/0x9f
arch/x86/entry/entry_64.S:281

INFO: Slab 0xffffea0000c25a00 objects=21 used=1 fp=0xffff88003096d7a8
flags=0x1fffc0000004080
INFO: Object 0xffff88003096c620 @offset=17952 fp=0x          (null)
CPU: 1 PID: 15687 Comm: syzkaller_execu Tainted: G    B
4.4.0-rc3+ #155
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 0000000000000001 ffff88003794f680 ffffffff82e0f8b8 0000000041b58ab3
 ffffffff87a9a27d ffffffff82e0f806 ffff880033c62e80 ffffffff87abb3d1
 ffff88003e806d00 0000000000000008 ffff88003096c620 ffff88003794f680

Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff82e0f8b8>] dump_stack+0xb2/0xfa lib/dump_stack.c:50
 [<ffffffff818369dc>] print_trailer+0x12c/0x210 mm/slub.c:659
 [<ffffffff81841a3f>] object_err+0x2f/0x40 mm/slub.c:689
 [<     inline     >] print_address_description mm/kasan/report.c:138
 [<ffffffff818457a9>] kasan_report_error+0x5d9/0x860 mm/kasan/report.c:251
 [<     inline     >] kasan_report mm/kasan/report.c:274
 [<ffffffff81845bd4>] __asan_report_load8_noabort+0x54/0x70
mm/kasan/report.c:295
 [<ffffffff814b25f9>] __lock_acquire+0x4e99/0x5100 kernel/locking/lockdep.c:3092
 [<ffffffff814b517d>] lock_acquire+0x19d/0x3f0 kernel/locking/lockdep.c:3585
 [<     inline     >] __raw_spin_lock include/linux/spinlock_api_smp.h:144
 [<ffffffff86a89591>] _raw_spin_lock+0x31/0x40 kernel/locking/spinlock.c:151
 [<     inline     >] perf_ctx_lock kernel/events/core.c:351
 [<ffffffff816e15d9>] __perf_install_in_context+0x109/0xa00
kernel/events/core.c:2083
 [<ffffffff816cb7ba>] remote_function+0x14a/0x200 kernel/events/core.c:74
 [<ffffffff81572787>] generic_exec_single+0x2a7/0x490 kernel/smp.c:156
 [<ffffffff81573350>] smp_call_function_single+0x200/0x310 kernel/smp.c:300
 [<ffffffff816c9bd3>] task_function_call+0x123/0x160 kernel/events/core.c:101
 [<ffffffff816d1bf1>] perf_install_in_context+0x201/0x340
kernel/events/core.c:2164
 [<ffffffff816f62e5>] SYSC_perf_event_open+0x1465/0x21a0
kernel/events/core.c:8546
 [<ffffffff816ff449>] SyS_perf_event_open+0x39/0x50 kernel/events/core.c:8242
 [<ffffffff86a8a3b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185







BUG: KASAN: use-after-free in __lock_acquire+0x4e99/0x5100 at addr
ffff88005f77aee0
Read of size 8 by task syzkaller_execu/30337
=============================================================================
BUG kmalloc-1024 (Not tainted): kasan: bad access detected
-----------------------------------------------------------------------------

Disabling lock debugging due to kernel taint
INFO: Allocated in alloc_perf_context+0x4c/0x100 age=124 cpu=3 pid=30312
[<      none      >] ___slab_alloc+0x648/0x8c0 mm/slub.c:2468
[<      none      >] __slab_alloc+0x4c/0x90 mm/slub.c:2497
[<     inline     >] slab_alloc_node mm/slub.c:2560
[<     inline     >] slab_alloc mm/slub.c:2602
[<      none      >] kmem_cache_alloc_trace+0x23c/0x3f0 mm/slub.c:2619
[<     inline     >] kmalloc include/linux/slab.h:458
[<     inline     >] kzalloc include/linux/slab.h:602
[<      none      >] alloc_perf_context+0x4c/0x100 kernel/events/core.c:3405
[<      none      >] find_get_context+0x187/0x830 kernel/events/core.c:3512
[<      none      >] SYSC_perf_event_open+0xe50/0x21a0 kernel/events/core.c:8381
[<      none      >] SyS_perf_event_open+0x39/0x50 kernel/events/core.c:8242
[<      none      >] tracesys_phase2+0x88/0x8d arch/x86/entry/entry_64.S:269

INFO: Freed in free_ctx+0x4b/0x70 age=74 cpu=2 pid=16
[<      none      >] __slab_free+0x21e/0x3e0 mm/slub.c:2678
[<     inline     >] slab_free mm/slub.c:2833
[<      none      >] kfree+0x26f/0x3e0 mm/slub.c:3662
[<      none      >] free_ctx+0x4b/0x70 kernel/events/core.c:872
[<     inline     >] __rcu_reclaim kernel/rcu/rcu.h:118
[<     inline     >] rcu_do_batch kernel/rcu/tree.c:2693
[<     inline     >] invoke_rcu_callbacks kernel/rcu/tree.c:2961
[<     inline     >] __rcu_process_callbacks kernel/rcu/tree.c:2928
[<      none      >] rcu_process_callbacks+0x631/0x19e0 kernel/rcu/tree.c:2945
[<      none      >] __do_softirq+0x2e5/0xb40 kernel/softirq.c:273
[<      none      >] run_ksoftirqd+0x20/0x50 kernel/softirq.c:662
[<      none      >] smpboot_thread_fn+0x41d/0x850 kernel/smpboot.c:163
[<      none      >] kthread+0x21d/0x2e0 drivers/block/aoe/aoecmd.c:1314
[<      none      >] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:468

INFO: Indirect in __call_rcu.constprop.63+0x7e/0xe30 age=88 cpu=2 pid=30312
[<      none      >] __call_rcu.constprop.63+0x7e/0xe30 kernel/rcu/tree.c:3048
[<      none      >] call_rcu_sched+0x26/0x30 kernel/rcu/tree.c:3117
[<      none      >] put_ctx+0x150/0x250 kernel/events/core.c:891
[<     inline     >] perf_event_exit_task_context kernel/events/core.c:8860
[<      none      >] perf_event_exit_task+0x706/0xae0 kernel/events/core.c:8887
[<      none      >] do_exit+0x892/0x3050 kernel/exit.c:759
[<      none      >] do_group_exit+0xec/0x390 kernel/exit.c:880
[<      none      >] get_signal+0x677/0x1bf0 kernel/signal.c:2307
[<      none      >] do_signal+0x7e/0x2170 arch/x86/kernel/signal.c:709
[<      none      >] exit_to_usermode_loop+0xfe/0x1e0
arch/x86/entry/common.c:247
[<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:282
[<      none      >] syscall_return_slowpath+0x16b/0x240
arch/x86/entry/common.c:344
[<      none      >] int_ret_from_sys_call+0x25/0x9f
arch/x86/entry/entry_64.S:281

INFO: Slab 0xffffea00017dde00 objects=21 used=8 fp=0xffff88005f77aec0
flags=0x5fffc0000004080
INFO: Object 0xffff88005f77aec0 @offset=11968 fp=0xffff88005f77b498
CPU: 3 PID: 30337 Comm: syzkaller_execu Tainted: G    B
4.4.0-rc3+ #155
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 0000000000000003 ffff880065547680 ffffffff82e0f8b8 0000000041b58ab3
 ffffffff87a9a27d ffffffff82e0f806 ffff880064972e80 ffffffff87abb3d1
 ffff88003e806d00 0000000000000008 ffff88005f77aec0 ffff880065547680

Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff82e0f8b8>] dump_stack+0xb2/0xfa lib/dump_stack.c:50
 [<ffffffff818369dc>] print_trailer+0x12c/0x210 mm/slub.c:659
 [<ffffffff81841a3f>] object_err+0x2f/0x40 mm/slub.c:689
 [<     inline     >] print_address_description mm/kasan/report.c:138
 [<ffffffff818457a9>] kasan_report_error+0x5d9/0x860 mm/kasan/report.c:251
 [<     inline     >] kasan_report mm/kasan/report.c:274
 [<ffffffff81845bd4>] __asan_report_load8_noabort+0x54/0x70
mm/kasan/report.c:295
 [<ffffffff814b25f9>] __lock_acquire+0x4e99/0x5100 kernel/locking/lockdep.c:3092
 [<ffffffff814b517d>] lock_acquire+0x19d/0x3f0 kernel/locking/lockdep.c:3585
 [<     inline     >] __raw_spin_lock include/linux/spinlock_api_smp.h:144
 [<ffffffff86a89591>] _raw_spin_lock+0x31/0x40 kernel/locking/spinlock.c:151
 [<     inline     >] perf_ctx_lock kernel/events/core.c:351
 [<ffffffff816e15d9>] __perf_install_in_context+0x109/0xa00
kernel/events/core.c:2083
 [<ffffffff816cb7ba>] remote_function+0x14a/0x200 kernel/events/core.c:74
 [<ffffffff81572787>] generic_exec_single+0x2a7/0x490 kernel/smp.c:156
 [<ffffffff81573350>] smp_call_function_single+0x200/0x310 kernel/smp.c:300
 [<ffffffff816c9bd3>] task_function_call+0x123/0x160 kernel/events/core.c:101
 [<ffffffff816d1bf1>] perf_install_in_context+0x201/0x340
kernel/events/core.c:2164
 [<ffffffff816f62e5>] SYSC_perf_event_open+0x1465/0x21a0
kernel/events/core.c:8546
 [<ffffffff816ff449>] SyS_perf_event_open+0x39/0x50 kernel/events/core.c:8242
 [<ffffffff86a8a3b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-08 18:35               ` Dmitry Vyukov
@ 2015-12-08 19:56                 ` Alexei Starovoitov
  2015-12-09  9:17                   ` Dmitry Vyukov
  0 siblings, 1 reply; 46+ messages in thread
From: Alexei Starovoitov @ 2015-12-08 19:56 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	LKML, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Tue, Dec 08, 2015 at 07:35:20PM +0100, Dmitry Vyukov wrote:
> On Tue, Dec 8, 2015 at 7:05 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Tue, Dec 08, 2015 at 06:56:23PM +0100, Dmitry Vyukov wrote:
> >> On Tue, Dec 8, 2015 at 6:54 PM, Alexei Starovoitov
> >> <alexei.starovoitov@gmail.com> wrote:
> >> > On Tue, Dec 08, 2015 at 05:12:04PM +0100, Dmitry Vyukov wrote:
> >> >> On Tue, Dec 8, 2015 at 4:24 AM, Alexei Starovoitov
> >> >> <alexei.starovoitov@gmail.com> wrote:
> >> >> > On Mon, Dec 07, 2015 at 05:09:21PM +0100, Dmitry Vyukov wrote:
> >> >> >> > So it would be _awesome_ if we could somehow extend this callchain to
> >> >> >> > include the site that calls call_rcu().
> >> >> >>
> >> >> >> We have a patch for KASAN in works that adds so-called stack depot
> >> >> >> which allows to map a stack trace onto uint32 id. Then we can plumb
> >> >> >
> >> >> > I was hacking something similar to categorize stack traces with u32 id.
> >> >> > How are you planning to limit the number of such stack traces ?
> >> >> > and what is the interface for user space to get stack trace from an id?
> >> >>
> >> >>
> >> >> We don't limit number of stack traces. Kernel does not seem to use
> >> >> data-driven recursion extensively, so there is limited number of
> >> >> stacks. Though, probably we will need to strip non-interrupt part for
> >> >> interrupt stacks, otherwise that can produce unbounded number of
> >> >> different stacks.
> >> >> There is no interface for user-space, it is used only inside of kernel
> >> >> to save stacks for memory blocks (rcu callbacks, thread pool items in
> >> >> the future).
> >> >> The design is based on what we successfully and extensively use in
> >> >> user-space sanitizers for years. Current code is here:
> >> >> https://github.com/ramosian-glider/kasan/commit/fb0eefd212366401ed5ad244233ef379a27bfb46
> >> >
> >> > why did you pick approach to never free accumulated stacks?
> >> > That limits usability a lot, since once kasan starts using it only
> >> > reboot will free the memory. ouch.
> >> > what worked for user space doesn't work for kernel.
> >>
> >>
> >> Freeing and reusing will slow down and complicate code significantly.
> >> And it is not yet proved to be necessary.
> >
> > It's a joke, right? allocating kernel pages without ability to free?!
> 
> 
> The plan is to smash kernel much earlier than it will run out of memory.
> 
> 
> I think this scheme will work pretty well.
> I've counted 34403 calls to
> kmalloc/kfree/kmem_cache_alloc/kmem_cache_free in kernel. Multiply
> this by 2 to account for different stacks leading to the same
> malloc/free and assuming that we store 16-byte header and 20 4-byte
> frames, this gives us about 6MB. I can live with that. I can live with
> 60MB as well.

so you're saying you want to add hundreds of lines of code to the kernel only
to be used by kasan for this very specific and narrow use case,
instead of designing a generic 'stacktrace<->id' mechanism?
That's not how the kernel operates. Every kernel developer must think about
code reuse. We cannot bloat the kernel with unique hacks.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-08 18:57   ` Ingo Molnar
@ 2015-12-09  9:05     ` Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2015-12-09  9:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dmitry Vyukov, Ingo Molnar, Arnaldo Carvalho de Melo, LKML,
	syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Tue, Dec 08, 2015 at 07:57:38PM +0100, Ingo Molnar wrote:
> Btw., could we add more redundancy / debug code to the refcounting code? It seems 
> to be a frequent source of very hard to find/fix races/bugs - so it should be ripe 
> for some extra debug infrastructure ...

I'll try, but it's not easy. The biggest problem so far is the
ctx::parent relation. We cannot easily track the reverse of that because
of locking (we'd need to hold two ctx->lock's at once, and we cannot because
of perf_event_context_sched_out()).

The other two relations we have reverse maps for:

  task->perf_event_ctxp[] <-> ctx->task
  event->ctx              <-> ctx->event_list

Also, all 3 relations are (more or less) protected under ctx->lock:

  - unclone_ctx():
	removes the ctx->parent link, with ctx->lock held

  - perf_remove_from_context():
	removes the event from ctx::event_list, with ctx->lock held;
	leaves event->ctx set because of nasty games with ctx migration;
	it will get set to the new context, but can never be NULL.

  - perf_event_exit_task_context():
	clears task->perf_event_ctxp[], with ctx->lock held;
	leaves ctx->task set, still looking into this.
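
For reference, a heavily trimmed view of the struct fields behind those
relations (the field names are the real ones from
include/linux/perf_event.h; everything else is omitted):

struct perf_event_context {
	raw_spinlock_t			lock;
	struct list_head		event_list;	/* reverse map of event->ctx */
	struct task_struct		*task;		/* reverse map of task->perf_event_ctxp[] */
	struct perf_event_context	*parent_ctx;	/* the relation with no reverse map */
	/* many more fields omitted */
};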




^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-08 19:56                 ` Alexei Starovoitov
@ 2015-12-09  9:17                   ` Dmitry Vyukov
  2015-12-10  3:54                     ` Alexei Starovoitov
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-09  9:17 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: syzkaller, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	LKML, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Tue, Dec 8, 2015 at 8:56 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Tue, Dec 08, 2015 at 07:35:20PM +0100, Dmitry Vyukov wrote:
>> On Tue, Dec 8, 2015 at 7:05 PM, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>> > On Tue, Dec 08, 2015 at 06:56:23PM +0100, Dmitry Vyukov wrote:
>> >> On Tue, Dec 8, 2015 at 6:54 PM, Alexei Starovoitov
>> >> <alexei.starovoitov@gmail.com> wrote:
>> >> > On Tue, Dec 08, 2015 at 05:12:04PM +0100, Dmitry Vyukov wrote:
>> >> >> On Tue, Dec 8, 2015 at 4:24 AM, Alexei Starovoitov
>> >> >> <alexei.starovoitov@gmail.com> wrote:
>> >> >> > On Mon, Dec 07, 2015 at 05:09:21PM +0100, Dmitry Vyukov wrote:
>> >> >> >> > So it would be _awesome_ if we could somehow extend this callchain to
>> >> >> >> > include the site that calls call_rcu().
>> >> >> >>
>> >> >> >> We have a patch for KASAN in works that adds so-called stack depot
>> >> >> >> which allows to map a stack trace onto uint32 id. Then we can plumb
>> >> >> >
>> >> >> > I was hacking something similar to categorize stack traces with u32 id.
>> >> >> > How are you planning to limit the number of such stack traces ?
>> >> >> > and what is the interface for user space to get stack trace from an id?
>> >> >>
>> >> >>
>> >> >> We don't limit number of stack traces. Kernel does not seem to use
>> >> >> data-driven recursion extensively, so there is limited number of
>> >> >> stacks. Though, probably we will need to strip non-interrupt part for
>> >> >> interrupt stacks, otherwise that can produce unbounded number of
>> >> >> different stacks.
>> >> >> There is no interface for user-space, it is used only inside of kernel
>> >> >> to save stacks for memory blocks (rcu callbacks, thread pool items in
>> >> >> the future).
>> >> >> The design is based on what we successfully and extensively use in
>> >> >> user-space sanitizers for years. Current code is here:
>> >> >> https://github.com/ramosian-glider/kasan/commit/fb0eefd212366401ed5ad244233ef379a27bfb46
>> >> >
>> >> > why did you pick approach to never free accumulated stacks?
>> >> > That limits usability a lot, since once kasan starts using it only
>> >> > reboot will free the memory. ouch.
>> >> > what worked for user space doesn't work for kernel.
>> >>
>> >>
>> >> Freeing and reusing will slow down and complicate code significantly.
>> >> And it is not yet proved to be necessary.
>> >
>> > It's a joke, right? allocating kernel pages without ability to free?!
>>
>>
>> The plan is to smash kernel much earlier than it will run out of memory.
>>
>>
>> I think this scheme will work pretty well.
>> I've counted 34403 calls to
>> kmalloc/kfree/kmem_cache_alloc/kmem_cache_free in kernel. Multiply
>> this by 2 to account for different stacks leading to the same
>> malloc/free and assuming that we store 16-byte header and 20 4-byte
>> frames, this gives us about 6MB. I can live with that. I can live with
>> 60MB as well.
>
> so you're saying you want to add hundreds lines of code to the kernel only
> to be used by kasan for this very specific and narrow use case
> instead of designing generic 'stacktrace<->id' mechanism?
> That's not how kernel operates. Every kernel developer must think about
> code reuse. We cannot bloat kernel with unique hacks.


We would happily share this code with other subsystems, or even better
reuse an existing solution. But to the best of my knowledge there is
no such existing solution, and I still know basically nothing about
what you were hacking, why, and what your requirements are.
Our requirements are:
 - map a stack trace to a 4-byte id
 - relatively memory efficient for storing ~100K traces
 - the only performance-critical operation is mapping an already
existing stack (in particular no atomic RMW on this path)
Non-requirements:
 - any user-space interface
 - removal of stack traces (they will all be reused in the future)
We plan to use it in KASAN, KTSAN (which already uses it), and KMSAN.
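
A minimal sketch of the sort of interface this implies (the names below are
illustrative only, not the actual code; see the github link earlier in the
thread for the real thing):

/* Map a stack trace to a compact 4-byte handle.  Mapping an
 * already-stored stack is the fast path: hash the frames, walk the
 * bucket, no atomic RMW; only insertion of a new stack takes a lock. */
u32 depot_save_stack(const unsigned long *entries, unsigned int nr_entries);

/* Translate a handle back into the stored trace, used when printing
 * KASAN/KTSAN reports.  Returns the number of frames. */
unsigned int depot_fetch_stack(u32 handle, const unsigned long **entries);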

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-09  9:17                   ` Dmitry Vyukov
@ 2015-12-10  3:54                     ` Alexei Starovoitov
  2015-12-10  9:02                       ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Alexei Starovoitov @ 2015-12-10  3:54 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	LKML, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Wed, Dec 09, 2015 at 10:17:17AM +0100, Dmitry Vyukov wrote:
> 
> We would happily share this code with other subsystems, or even better
> reuse an existing solutions. But to the best of my knowledge there is
> no such existing solution, and I still know basically nothing about
> what you were hacking, why and what are your requirements.

I need a 'stacktrace <-> id' facility to be able to do map[stack]++,
where stack is either a kernel stack, a user stack, or a combination of both.
The number of stacks can be large, so this 'stack depot' needs to be limited
either by total memory or by number of stacks.
Stacks also need a configurable depth limit to improve lookup time,
save memory, etc.
Once the program is done, the stack depot needs to be freed.
The user interface for translating an id back to a stack is tbd.
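
Roughly the usage I have in mind, in pseudo-C (every identifier below is
made up; none of this exists today):

static u64 counts[1 << 14];			/* hit counter per stack id */

static void account_sample(void)
{
	unsigned long entries[64];
	unsigned int nr;
	u32 id;

	nr = capture_stack(entries, 64);	/* hypothetical: kernel and/or user frames */
	id = stack_depot_id(entries, nr);	/* stacktrace -> compact id */
	counts[id % (1 << 14)]++;		/* the map[stack]++ part */
}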

> Our requirements are:
>  - map stack trace to 4-byte id
>  - relatively memory efficient for storing of ~100K traces
>  - the only performance-critical operation is mapping of an already
> existing stack (in particular no atomic RMW on this path)
> Non-requirements:
>  - any user-space interface
>  - removal of stack traces (they all will be reused in future)

Freeing memory is a requirement regardless.
Even when the kernel is running with kasan, there must be a way to stop
stack collection and free that memory.
You cannot treat the kernel as your test program or 'device under test'.
It's the other way around: kasan must nicely co-exist with the kernel.
Right now it adds some constant overhead, fine, but saying "I'll just
keep taking memory from the kernel for my own kasan needs" is not ok.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-10  3:54                     ` Alexei Starovoitov
@ 2015-12-10  9:02                       ` Peter Zijlstra
  2015-12-10 17:03                         ` Alexei Starovoitov
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2015-12-10  9:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Dmitry Vyukov, syzkaller, Ingo Molnar, Arnaldo Carvalho de Melo,
	LKML, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Wed, Dec 09, 2015 at 07:54:35PM -0800, Alexei Starovoitov wrote:
> Freeing memory is a requirement regardless.
> Even when kernel running with kasan, there must be a way to stop
> stack collection and free that memory.
> You cannot treat kernel as your test program or 'device under test'.

Relax, that is exactly what lockdep does. It cannot dynamically allocate
things because the allocators use locks etc.

It's fine to build up state for debug bits, esp. if it's bounded, like the
number of unique callchains.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-10  9:02                       ` Peter Zijlstra
@ 2015-12-10 17:03                         ` Alexei Starovoitov
  2015-12-11  8:14                           ` Ingo Molnar
  0 siblings, 1 reply; 46+ messages in thread
From: Alexei Starovoitov @ 2015-12-10 17:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, syzkaller, Ingo Molnar, Arnaldo Carvalho de Melo,
	LKML, Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Thu, Dec 10, 2015 at 10:02:51AM +0100, Peter Zijlstra wrote:
> On Wed, Dec 09, 2015 at 07:54:35PM -0800, Alexei Starovoitov wrote:
> > Freeing memory is a requirement regardless.
> > Even when kernel running with kasan, there must be a way to stop
> > stack collection and free that memory.
> > You cannot treat kernel as your test program or 'device under test'.
> 
> Relax, that is exactly what lockdep does. It cannot dynamically allocate
> things because allocators use lock etc..
> 
> Its fine to build up state for debug bits, esp. if its bounded, like the
> number of unique callchains.

except the code in question is doing unbounded alloc_pages()


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-08 19:14       ` Dmitry Vyukov
@ 2015-12-10 19:57         ` Peter Zijlstra
  2015-12-15 13:09           ` Dmitry Vyukov
                             ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Peter Zijlstra @ 2015-12-10 19:57 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Tue, Dec 08, 2015 at 08:14:58PM +0100, Dmitry Vyukov wrote:
> Tested with your patches.
> The additional WARNING does not fire.
> For the rcu stacks, I had to change two more 2's to TRACK_NR and also
> moved memorization from call_rcu to __call_rcu, but now it is working.
> Two reports with indirect stack:

Ah nice, and sorry for the oversights. Obviously one can add this
indirect marker to __queue_work() as well, to also cover the workqueue
offloading that some other sites do.
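
I.e. something like this at the top of __queue_work() (untested sketch,
mirroring the call_rcu() hunk from the earlier patch):

static void __queue_work(int cpu, struct workqueue_struct *wq,
			 struct work_struct *work)
{
	object_set_indirect(work);	/* record the enqueue site in the slab object */
	/* ... existing body unchanged ... */
}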


>  [<ffffffff81845bd4>] __asan_report_load8_noabort+0x54/0x70 mm/kasan/report.c:295
>  [<ffffffff814b25f9>] __lock_acquire+0x4e99/0x5100 kernel/locking/lockdep.c:3092
>  [<ffffffff814b517d>] lock_acquire+0x19d/0x3f0 kernel/locking/lockdep.c:3585
>  [<     inline     >] __raw_spin_lock include/linux/spinlock_api_smp.h:144
>  [<ffffffff86a89591>] _raw_spin_lock+0x31/0x40 kernel/locking/spinlock.c:151
>  [<     inline     >] perf_ctx_lock kernel/events/core.c:351

So this here takes locks on cpuctx->ctx.lock and cpuctx->task_ctx->lock
and it is the latter that is pointing out into space.
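
For context, perf_ctx_lock() is roughly just (kernel/events/core.c):

static void perf_ctx_lock(struct perf_cpu_context *cpuctx,
			  struct perf_event_context *ctx)
{
	raw_spin_lock(&cpuctx->ctx.lock);
	if (ctx)
		raw_spin_lock(&ctx->lock);
}

so the second raw_spin_lock() is the one chasing the stale task_ctx pointer.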

>  [<ffffffff816e15d9>] __perf_install_in_context+0x109/0xa00 kernel/events/core.c:2083
>  [<ffffffff816cb7ba>] remote_function+0x14a/0x200 kernel/events/core.c:74
>  [<ffffffff81572787>] generic_exec_single+0x2a7/0x490 kernel/smp.c:156
>  [<ffffffff81573350>] smp_call_function_single+0x200/0x310 kernel/smp.c:300
>  [<ffffffff816c9bd3>] task_function_call+0x123/0x160 kernel/events/core.c:101
>  [<ffffffff816d1bf1>] perf_install_in_context+0x201/0x340 kernel/events/core.c:2164
>  [<ffffffff816f62e5>] SYSC_perf_event_open+0x1465/0x21a0 kernel/events/core.c:8546
>  [<ffffffff816ff449>] SyS_perf_event_open+0x39/0x50 kernel/events/core.c:8242
>  [<ffffffff86a8a3b6>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185


Now I managed to produce a related WARN:

[ 2338.884942] ------------[ cut here ]------------
[ 2338.890112] WARNING: CPU: 13 PID: 35162 at ../kernel/events/core.c:2702 task_ctx_sched_out+0x6b/0x80()
[ 2338.900504] Modules linked in:
[ 2338.903933] CPU: 13 PID: 35162 Comm: bash Not tainted 4.4.0-rc4-dirty #244
[ 2338.911610] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[ 2338.923071]  ffffffff81f1468e ffff8807c6457cb8 ffffffff815c680c 0000000000000000
[ 2338.931382]  ffff8807c6457cf0 ffffffff810c8a56 ffffe8ffff8c1bd0 ffff8808132ed400
[ 2338.939678]  0000000000000286 ffff880813170380 ffff8808132ed400 ffff8807c6457d00
[ 2338.947987] Call Trace:
[ 2338.950726]  [<ffffffff815c680c>] dump_stack+0x4e/0x82
[ 2338.956474]  [<ffffffff810c8a56>] warn_slowpath_common+0x86/0xc0
[ 2338.963195]  [<ffffffff810c8b4a>] warn_slowpath_null+0x1a/0x20
[ 2338.969720]  [<ffffffff811a49cb>] task_ctx_sched_out+0x6b/0x80
[ 2338.976244]  [<ffffffff811a62d2>] perf_event_exec+0xe2/0x180
[ 2338.982575]  [<ffffffff8121fb6f>] setup_new_exec+0x6f/0x1b0
[ 2338.988810]  [<ffffffff8126de83>] load_elf_binary+0x393/0x1660
[ 2338.995339]  [<ffffffff811dc772>] ? get_user_pages+0x52/0x60
[ 2339.001669]  [<ffffffff8121e297>] search_binary_handler+0x97/0x200
[ 2339.008581]  [<ffffffff8121f8b3>] do_execveat_common.isra.33+0x543/0x6e0
[ 2339.016072]  [<ffffffff8121fcea>] SyS_execve+0x3a/0x50
[ 2339.021819]  [<ffffffff819fc165>] stub_execve+0x5/0x5
[ 2339.027469]  [<ffffffff819fbeb2>] ? entry_SYSCALL_64_fastpath+0x12/0x71
[ 2339.034860] ---[ end trace ee1337c59a0ddeac ]---


Which is an existing WARN_ON_ONCE() indicating that cpuctx->task_ctx is
not what we expected it to be.

The patch below appears to have cured it (it's so far managed to run
longer than previous attempts, but I'm sure it'll explode the moment
I've sent this email).

I'm not sure I can explain your problem with this, but I figure it's
worth a try.


---
 kernel/events/core.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index c3d61b92d805..d5293325d8c5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3136,15 +3136,16 @@ static int event_enable_on_exec(struct perf_event *event,
  * Enable all of a task's events that have been marked enable-on-exec.
  * This expects task == current.
  */
-static void perf_event_enable_on_exec(struct perf_event_context *ctx)
+static void perf_event_enable_on_exec(int ctxn)
 {
-	struct perf_event_context *clone_ctx = NULL;
+	struct perf_event_context *ctx, *clone_ctx = NULL;
 	struct perf_event *event;
 	unsigned long flags;
 	int enabled = 0;
 	int ret;
 
 	local_irq_save(flags);
+	ctx = current->perf_event_ctxp[ctxn];
 	if (!ctx || !ctx->nr_events)
 		goto out;
 
@@ -3187,17 +3188,11 @@ static void perf_event_enable_on_exec(struct perf_event_context *ctx)
 
 void perf_event_exec(void)
 {
-	struct perf_event_context *ctx;
 	int ctxn;
 
 	rcu_read_lock();
-	for_each_task_context_nr(ctxn) {
-		ctx = current->perf_event_ctxp[ctxn];
-		if (!ctx)
-			continue;
-
-		perf_event_enable_on_exec(ctx);
-	}
+	for_each_task_context_nr(ctxn)
+		perf_event_enable_on_exec(ctxn);
 	rcu_read_unlock();
 }
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-10 17:03                         ` Alexei Starovoitov
@ 2015-12-11  8:14                           ` Ingo Molnar
  2015-12-15 13:11                             ` Dmitry Vyukov
  0 siblings, 1 reply; 46+ messages in thread
From: Ingo Molnar @ 2015-12-11  8:14 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Dmitry Vyukov, syzkaller, Ingo Molnar,
	Arnaldo Carvalho de Melo, LKML, Kostya Serebryany,
	Alexander Potapenko, Eric Dumazet, Sasha Levin


* Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Thu, Dec 10, 2015 at 10:02:51AM +0100, Peter Zijlstra wrote:
> > On Wed, Dec 09, 2015 at 07:54:35PM -0800, Alexei Starovoitov wrote:
> > > Freeing memory is a requirement regardless.
> > > Even when kernel running with kasan, there must be a way to stop
> > > stack collection and free that memory.
> > > You cannot treat kernel as your test program or 'device under test'.
> > 
> > Relax, that is exactly what lockdep does. It cannot dynamically allocate
> > things because allocators use lock etc..
> > 
> > Its fine to build up state for debug bits, esp. if its bounded, like the
> > number of unique callchains.
> 
> except the code in question is doing unbounded alloc_pages()

Yes, but the trick is to still have a bounded-size debug pool - one that runs out of
entries gracefully.

Which in practice is plenty enough for most types of testing, and is a lot more 
robust than any dynamic scheme.
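
I.e. roughly the lockdep pattern (illustrative sketch only; every name below
is made up):

#define DEPOT_MAX_ENTRIES	(1 << 14)

struct depot_entry {
	u32		hash;
	u32		nr_frames;
	unsigned long	frames[20];
};

static struct depot_entry depot_pool[DEPOT_MAX_ENTRIES];
static atomic_t depot_used;

static struct depot_entry *depot_alloc_entry(void)
{
	int idx = atomic_inc_return(&depot_used) - 1;

	if (idx >= DEPOT_MAX_ENTRIES) {
		WARN_ONCE(1, "stack depot exhausted, dropping further traces\n");
		return NULL;	/* callers degrade to "no stack recorded" */
	}
	return &depot_pool[idx];
}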

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-10 19:57         ` Peter Zijlstra
@ 2015-12-15 13:09           ` Dmitry Vyukov
  2015-12-17 14:06           ` Peter Zijlstra
  2016-01-06 18:46           ` [tip:perf/core] perf: Fix race in perf_event_exec() tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-15 13:09 UTC (permalink / raw)
  To: syzkaller
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, Kostya Serebryany,
	Alexander Potapenko, Eric Dumazet, Sasha Levin

On Thu, Dec 10, 2015 at 8:57 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Dec 08, 2015 at 08:14:58PM +0100, Dmitry Vyukov wrote:
>> Tested with your patches.
>> The additional WARNING does not fire.
>> For the rcu stacks, I had to change two more 2's to TRACK_NR and also
>> moved memorization from call_rcu to __call_rcu, but now it is working.
>> Two reports with indirect stack:
>
> Ah nice, and sorry for the oversights. Obviously one can add this
> indirect marker to __queue_work() as well to also cover the workqueue
> offloading some other sites do.
>
>
>>  [<ffffffff81845bd4>] __asan_report_load8_noabort+0x54/0x70 mm/kasan/report.c:295
>>  [<ffffffff814b25f9>] __lock_acquire+0x4e99/0x5100 kernel/locking/lockdep.c:3092
>>  [<ffffffff814b517d>] lock_acquire+0x19d/0x3f0 kernel/locking/lockdep.c:3585
>>  [<     inline     >] __raw_spin_lock include/linux/spinlock_api_smp.h:144
>>  [<ffffffff86a89591>] _raw_spin_lock+0x31/0x40 kernel/locking/spinlock.c:151
>>  [<     inline     >] perf_ctx_lock kernel/events/core.c:351
>
> So this here takes locks on cpuctx->ctx.lock and cpuctx->task_ctx->lock
> and it is the latter that is pointing out into space.
>
>>  [<ffffffff816e15d9>] __perf_install_in_context+0x109/0xa00 kernel/events/core.c:2083
>>  [<ffffffff816cb7ba>] remote_function+0x14a/0x200 kernel/events/core.c:74
>>  [<ffffffff81572787>] generic_exec_single+0x2a7/0x490 kernel/smp.c:156
>>  [<ffffffff81573350>] smp_call_function_single+0x200/0x310 kernel/smp.c:300
>>  [<ffffffff816c9bd3>] task_function_call+0x123/0x160 kernel/events/core.c:101
>>  [<ffffffff816d1bf1>] perf_install_in_context+0x201/0x340 kernel/events/core.c:2164
>>  [<ffffffff816f62e5>] SYSC_perf_event_open+0x1465/0x21a0 kernel/events/core.c:8546
>>  [<ffffffff816ff449>] SyS_perf_event_open+0x39/0x50 kernel/events/core.c:8242
>>  [<ffffffff86a8a3b6>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185
>
>
> Now I managed to produce a related WARN:
>
> [ 2338.884942] ------------[ cut here ]------------
> [ 2338.890112] WARNING: CPU: 13 PID: 35162 at ../kernel/events/core.c:2702 task_ctx_sched_out+0x6b/0x80()
> [ 2338.900504] Modules linked in:
> [ 2338.903933] CPU: 13 PID: 35162 Comm: bash Not tainted 4.4.0-rc4-dirty #244
> [ 2338.911610] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
> [ 2338.923071]  ffffffff81f1468e ffff8807c6457cb8 ffffffff815c680c 0000000000000000
> [ 2338.931382]  ffff8807c6457cf0 ffffffff810c8a56 ffffe8ffff8c1bd0 ffff8808132ed400
> [ 2338.939678]  0000000000000286 ffff880813170380 ffff8808132ed400 ffff8807c6457d00
> [ 2338.947987] Call Trace:
> [ 2338.950726]  [<ffffffff815c680c>] dump_stack+0x4e/0x82
> [ 2338.956474]  [<ffffffff810c8a56>] warn_slowpath_common+0x86/0xc0
> [ 2338.963195]  [<ffffffff810c8b4a>] warn_slowpath_null+0x1a/0x20
> [ 2338.969720]  [<ffffffff811a49cb>] task_ctx_sched_out+0x6b/0x80
> [ 2338.976244]  [<ffffffff811a62d2>] perf_event_exec+0xe2/0x180
> [ 2338.982575]  [<ffffffff8121fb6f>] setup_new_exec+0x6f/0x1b0
> [ 2338.988810]  [<ffffffff8126de83>] load_elf_binary+0x393/0x1660
> [ 2338.995339]  [<ffffffff811dc772>] ? get_user_pages+0x52/0x60
> [ 2339.001669]  [<ffffffff8121e297>] search_binary_handler+0x97/0x200
> [ 2339.008581]  [<ffffffff8121f8b3>] do_execveat_common.isra.33+0x543/0x6e0
> [ 2339.016072]  [<ffffffff8121fcea>] SyS_execve+0x3a/0x50
> [ 2339.021819]  [<ffffffff819fc165>] stub_execve+0x5/0x5
> [ 2339.027469]  [<ffffffff819fbeb2>] ? entry_SYSCALL_64_fastpath+0x12/0x71
> [ 2339.034860] ---[ end trace ee1337c59a0ddeac ]---
>
>
> Which is an existing WARN_ON_ONCE() indicating that cpuctx->task_ctx is
> not what we expected it to be.
>
> The below patch appears (its so far managed to run longer than previous
> attempts, but I'm sure it'll explode the moment I've send this email) to
> have cured it.
>
> I'm not sure I can explain your problem with this, but I figure its
> worth a try.
>
>
> ---
>  kernel/events/core.c | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index c3d61b92d805..d5293325d8c5 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3136,15 +3136,16 @@ static int event_enable_on_exec(struct perf_event *event,
>   * Enable all of a task's events that have been marked enable-on-exec.
>   * This expects task == current.
>   */
> -static void perf_event_enable_on_exec(struct perf_event_context *ctx)
> +static void perf_event_enable_on_exec(int ctxn)
>  {
> -       struct perf_event_context *clone_ctx = NULL;
> +       struct perf_event_context *ctx, *clone_ctx = NULL;
>         struct perf_event *event;
>         unsigned long flags;
>         int enabled = 0;
>         int ret;
>
>         local_irq_save(flags);
> +       ctx = current->perf_event_ctxp[ctxn];
>         if (!ctx || !ctx->nr_events)
>                 goto out;
>
> @@ -3187,17 +3188,11 @@ static void perf_event_enable_on_exec(struct perf_event_context *ctx)
>
>  void perf_event_exec(void)
>  {
> -       struct perf_event_context *ctx;
>         int ctxn;
>
>         rcu_read_lock();
> -       for_each_task_context_nr(ctxn) {
> -               ctx = current->perf_event_ctxp[ctxn];
> -               if (!ctx)
> -                       continue;
> -
> -               perf_event_enable_on_exec(ctx);
> -       }
> +       for_each_task_context_nr(ctxn)
> +               perf_event_enable_on_exec(ctxn);
>         rcu_read_unlock();
>  }


Hi Peter,

I've applied your patch. I have not seen the crashes since then. But
I've started seeing lots of stalls/hangs, see the other email titled
"perf: stalls in perf_install_in_context/perf_remove_from_context". I
am able to reproduce the stalls on stock
6764e5ebd5c62236d082f9ae030674467d0b2779 without any patches on top,
so the stalls are not related to your patch.
I am not sure why I started seeing the stalls only now. It is either a
recent change in the tree, or maybe the fuzzer figured out how to trigger
the stall and started doing it frequently, or maybe your fix for the
crashes allowed the fuzzer to drill deeper into perf... or maybe I just
missed stall reports among the numerous crash reports. Anyway, testing
is kind of inconclusive now. Please take a look at the stall issue.

Thanks

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-11  8:14                           ` Ingo Molnar
@ 2015-12-15 13:11                             ` Dmitry Vyukov
  0 siblings, 0 replies; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-15 13:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexei Starovoitov, Peter Zijlstra, syzkaller, Ingo Molnar,
	Arnaldo Carvalho de Melo, LKML, Kostya Serebryany,
	Alexander Potapenko, Eric Dumazet, Sasha Levin

On Fri, Dec 11, 2015 at 9:14 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
>> On Thu, Dec 10, 2015 at 10:02:51AM +0100, Peter Zijlstra wrote:
>> > On Wed, Dec 09, 2015 at 07:54:35PM -0800, Alexei Starovoitov wrote:
>> > > Freeing memory is a requirement regardless.
>> > > Even when kernel running with kasan, there must be a way to stop
>> > > stack collection and free that memory.
>> > > You cannot treat kernel as your test program or 'device under test'.
>> >
>> > Relax, that is exactly what lockdep does. It cannot dynamically allocate
>> > things because allocators use lock etc..
>> >
>> > Its fine to build up state for debug bits, esp. if its bounded, like the
>> > number of unique callchains.
>>
>> except the code in question is doing unbounded alloc_pages()
>
> Yes, but the trick is to still have a bound sized debug pool - which runs out of
> entries gracefully.
>
> Which in practice is plenty enough for most types of testing, and is a lot more
> robust than any dynamic scheme.

A hard upper bound on consumed memory would work for us without
introducing any slowdown and without increasing code complexity. So it
sounds good to me.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-10 19:57         ` Peter Zijlstra
  2015-12-15 13:09           ` Dmitry Vyukov
@ 2015-12-17 14:06           ` Peter Zijlstra
  2015-12-17 14:08             ` Dmitry Vyukov
  2016-01-06 18:46           ` [tip:perf/core] perf: Fix race in perf_event_exec() tip-bot for Peter Zijlstra
  2 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2015-12-17 14:06 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Thu, Dec 10, 2015 at 08:57:40PM +0100, Peter Zijlstra wrote:
> I'm not sure I can explain your problem with this, but I figure its
> worth a try.

Did it make a difference?

Also, does the syzkaller thing do cpu-hotplug during its runs?

> ---
>  kernel/events/core.c | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index c3d61b92d805..d5293325d8c5 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3136,15 +3136,16 @@ static int event_enable_on_exec(struct perf_event *event,
>   * Enable all of a task's events that have been marked enable-on-exec.
>   * This expects task == current.
>   */
> -static void perf_event_enable_on_exec(struct perf_event_context *ctx)
> +static void perf_event_enable_on_exec(int ctxn)
>  {
> -	struct perf_event_context *clone_ctx = NULL;
> +	struct perf_event_context *ctx, *clone_ctx = NULL;
>  	struct perf_event *event;
>  	unsigned long flags;
>  	int enabled = 0;
>  	int ret;
>  
>  	local_irq_save(flags);
> +	ctx = current->perf_event_ctxp[ctxn];
>  	if (!ctx || !ctx->nr_events)
>  		goto out;
>  
> @@ -3187,17 +3188,11 @@ static void perf_event_enable_on_exec(struct perf_event_context *ctx)
>  
>  void perf_event_exec(void)
>  {
> -	struct perf_event_context *ctx;
>  	int ctxn;
>  
>  	rcu_read_lock();
> -	for_each_task_context_nr(ctxn) {
> -		ctx = current->perf_event_ctxp[ctxn];
> -		if (!ctx)
> -			continue;
> -
> -		perf_event_enable_on_exec(ctx);
> -	}
> +	for_each_task_context_nr(ctxn)
> +		perf_event_enable_on_exec(ctxn);
>  	rcu_read_unlock();
>  }
>  

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-17 14:06           ` Peter Zijlstra
@ 2015-12-17 14:08             ` Dmitry Vyukov
  2015-12-17 14:26               ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-17 14:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Thu, Dec 17, 2015 at 3:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Dec 10, 2015 at 08:57:40PM +0100, Peter Zijlstra wrote:
>> I'm not sure I can explain your problem with this, but I figure its
>> worth a try.
>
> Did it make a difference?
>
> Also, does the syzkaller thing do cpu-hotplug during its runs?

See my email on Dec 15.
I have CONFIG_HOTPLUG enabled, but I don't do actual hotplug (well,
actually I don't know how to do hotplug, but I still think I am not
doing it).



>> ---
>>  kernel/events/core.c | 15 +++++----------
>>  1 file changed, 5 insertions(+), 10 deletions(-)
>>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index c3d61b92d805..d5293325d8c5 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -3136,15 +3136,16 @@ static int event_enable_on_exec(struct perf_event *event,
>>   * Enable all of a task's events that have been marked enable-on-exec.
>>   * This expects task == current.
>>   */
>> -static void perf_event_enable_on_exec(struct perf_event_context *ctx)
>> +static void perf_event_enable_on_exec(int ctxn)
>>  {
>> -     struct perf_event_context *clone_ctx = NULL;
>> +     struct perf_event_context *ctx, *clone_ctx = NULL;
>>       struct perf_event *event;
>>       unsigned long flags;
>>       int enabled = 0;
>>       int ret;
>>
>>       local_irq_save(flags);
>> +     ctx = current->perf_event_ctxp[ctxn];
>>       if (!ctx || !ctx->nr_events)
>>               goto out;
>>
>> @@ -3187,17 +3188,11 @@ static void perf_event_enable_on_exec(struct perf_event_context *ctx)
>>
>>  void perf_event_exec(void)
>>  {
>> -     struct perf_event_context *ctx;
>>       int ctxn;
>>
>>       rcu_read_lock();
>> -     for_each_task_context_nr(ctxn) {
>> -             ctx = current->perf_event_ctxp[ctxn];
>> -             if (!ctx)
>> -                     continue;
>> -
>> -             perf_event_enable_on_exec(ctx);
>> -     }
>> +     for_each_task_context_nr(ctxn)
>> +             perf_event_enable_on_exec(ctxn);
>>       rcu_read_unlock();
>>  }
>>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-17 14:08             ` Dmitry Vyukov
@ 2015-12-17 14:26               ` Peter Zijlstra
  2015-12-17 14:28                 ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2015-12-17 14:26 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Thu, Dec 17, 2015 at 03:08:49PM +0100, Dmitry Vyukov wrote:
> On Thu, Dec 17, 2015 at 3:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, Dec 10, 2015 at 08:57:40PM +0100, Peter Zijlstra wrote:
> >> I'm not sure I can explain your problem with this, but I figure its
> >> worth a try.
> >
> > Did it make a difference?
> >
> > Also, does the syzkaller thing do cpu-hotplug during its runs?
> 
> See my email on Dec 15.

I have 4 emails from you on that day (and one private); they pertain to
getting stuck in perf_{install,remove}_from_context(). None of those
emails seem to contain an answer to either of the two questions above.

  CACT4Y+Za4L4q6FZLXjyGbnSYRwa7RJZYcG_s562q+CrNy9C75w@mail.gmail.com
  CACT4Y+YUkYgCuQFW6eBTtdfRqdF7EEtrB3VaYJk+LUXE_-iGRQ@mail.gmail.com
  CACT4Y+bvHWuyDZjoFhLAOiK1rfNQcfuSJCY0oS5xA=u8j04bXQ@mail.gmail.com
  CACT4Y+an67KnTsVVzYVvrYKW9yg7PbDuC8WyB38PiWRoSt0UCQ@mail.gmail.com

and in private:

  CACT4Y+Zb1vU192JbCnyS9qY3cpsvfAUJdpBVUsmosueCtQjSXQ@mail.gmail.com

Did I miss an email?

This is the thread about the difficult to deterministically reproduce
use-after-free.

> I have CONFIG_HOTPLUG enabled, but I don't do actual hotplug (well,
> actually I don't know how to do hotplug, but I still think I am not
> doing it).

You can do hotplug like:

echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/online

so if syzkaller doesn't know about that, and I suppose you not
knowing this means syzkaller doesn't either, this isn't happening.

Thanks, that disqualifies a bunch of code.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-17 14:26               ` Peter Zijlstra
@ 2015-12-17 14:28                 ` Peter Zijlstra
  2015-12-17 14:35                   ` Dmitry Vyukov
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2015-12-17 14:28 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Thu, Dec 17, 2015 at 03:26:17PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 03:08:49PM +0100, Dmitry Vyukov wrote:
> > On Thu, Dec 17, 2015 at 3:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Thu, Dec 10, 2015 at 08:57:40PM +0100, Peter Zijlstra wrote:
> > >> I'm not sure I can explain your problem with this, but I figure its
> > >> worth a try.
> > >
> > > Did it make a difference?
> > >
> > > Also, does the syzkaller thing do cpu-hotplug during its runs?
> > 
> > See my email on Dec 15.
> 
> I have 4 emails from you on that day (and one private), they pertain to
> getting stuck in perf_{install,remove}_from_context(). None of those
> emails seem to contain an answer to either of the two above questions.
> 
>   CACT4Y+Za4L4q6FZLXjyGbnSYRwa7RJZYcG_s562q+CrNy9C75w@mail.gmail.com
>   CACT4Y+YUkYgCuQFW6eBTtdfRqdF7EEtrB3VaYJk+LUXE_-iGRQ@mail.gmail.com
>   CACT4Y+bvHWuyDZjoFhLAOiK1rfNQcfuSJCY0oS5xA=u8j04bXQ@mail.gmail.com
>   CACT4Y+an67KnTsVVzYVvrYKW9yg7PbDuC8WyB38PiWRoSt0UCQ@mail.gmail.com
> 
> and in private:
> 
>   CACT4Y+Zb1vU192JbCnyS9qY3cpsvfAUJdpBVUsmosueCtQjSXQ@mail.gmail.com
> 
> Did I miss an email?

Yes, I seem to have missed this one:

  CACT4Y+Y_YdEd-YGdWq-wQV6S+1M2nKA6atYO2DOFa8-wtQ8QWg@mail.gmail.com

found it, thanks!

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-17 14:28                 ` Peter Zijlstra
@ 2015-12-17 14:35                   ` Dmitry Vyukov
  2015-12-17 14:43                     ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-17 14:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Thu, Dec 17, 2015 at 3:28 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > >> I'm not sure I can explain your problem with this, but I figure its
>> > >> worth a try.
>> > >
>> > > Did it make a difference?
>> > >
>> > > Also, does the syzkaller thing do cpu-hotplug during its runs?
>> >
>> > See my email on Dec 15.
>>
>> I have 4 emails from you on that day (and one private), they pertain to
>> getting stuck in perf_{install,remove}_from_context(). None of those
>> emails seem to contain an answer to either of the two above questions.
>>
>>   CACT4Y+Za4L4q6FZLXjyGbnSYRwa7RJZYcG_s562q+CrNy9C75w@mail.gmail.com
>>   CACT4Y+YUkYgCuQFW6eBTtdfRqdF7EEtrB3VaYJk+LUXE_-iGRQ@mail.gmail.com
>>   CACT4Y+bvHWuyDZjoFhLAOiK1rfNQcfuSJCY0oS5xA=u8j04bXQ@mail.gmail.com
>>   CACT4Y+an67KnTsVVzYVvrYKW9yg7PbDuC8WyB38PiWRoSt0UCQ@mail.gmail.com
>>
>> and in private:
>>
>>   CACT4Y+Zb1vU192JbCnyS9qY3cpsvfAUJdpBVUsmosueCtQjSXQ@mail.gmail.com
>>
>> Did I miss an email?
>
> Yes, I seem to have missed this one:
>
>   CACT4Y+Y_YdEd-YGdWq-wQV6S+1M2nKA6atYO2DOFa8-wtQ8QWg@mail.gmail.com
>
> found it, thanks!

Yes, that one.
In short, I did not see use-after-frees but perf_event_open fuzzing
started hanging VMs very frequently, so testing is inconclusive.

And, no, nobody touches anything in /sys/devices.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-17 14:35                   ` Dmitry Vyukov
@ 2015-12-17 14:43                     ` Peter Zijlstra
  2015-12-31 17:15                       ` Dmitry Vyukov
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2015-12-17 14:43 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Thu, Dec 17, 2015 at 03:35:32PM +0100, Dmitry Vyukov wrote:
> In short, I did not see use-after-frees but perf_event_open fuzzing
> started hanging VMs very frequently, so testing is inconclusive.

Right, I'll get back to staring at that one.

Running your test case on bare metal got my machine reproducibly stuck
but in a different place than you've reported.

Takes a while too, so far no luck in explaining/debugging it. All my
added debug code remains silent :/

I'll go think up more .. 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-17 14:43                     ` Peter Zijlstra
@ 2015-12-31 17:15                       ` Dmitry Vyukov
  2016-01-05 12:17                         ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2015-12-31 17:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Thu, Dec 17, 2015 at 3:43 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Dec 17, 2015 at 03:35:32PM +0100, Dmitry Vyukov wrote:
>> In short, I did not see use-after-frees but perf_event_open fuzzing
>> started hanging VMs very frequently, so testing is inconclusive.
>
> Right, I'll get back to staring at that one.
>
> Running your test case on bare metal got my machine reproducibly stuck
> but in a different place than you've reported.
>
> Takes a while too, so far no luck in explaining/debugging it. All my
> added debug code remains silent :/
>
> I'll go think up more ..

Hi Peter,

Any progress with this?
I can't continue to test the perf subsystem, as it reliably hangs my machines.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2015-12-31 17:15                       ` Dmitry Vyukov
@ 2016-01-05 12:17                         ` Peter Zijlstra
  2016-01-08  8:40                           ` Dmitry Vyukov
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2016-01-05 12:17 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Thu, Dec 31, 2015 at 06:15:41PM +0100, Dmitry Vyukov wrote:
> On Thu, Dec 17, 2015 at 3:43 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, Dec 17, 2015 at 03:35:32PM +0100, Dmitry Vyukov wrote:
> >> In short, I did not see use-after-frees but perf_event_open fuzzing
> >> started hanging VMs very frequently, so testing is inconclusive.
> >
> > Right, I'll get back to staring at that one.
> >
> > Running your test case on bare metal got my machine reproducibly stuck
> > but in a different place than you've reported.
> >
> > Takes a while too, so far no luck in explaining/debugging it. All my
> > added debug code remains silent :/
> >
> > I'll go think up more ..
> 
> Hi Peter,
> 
> Any progress with this?
> I can't continue to test the perf subsystem, as it reliably hangs my machines.

Holidays got in the way, I'll continue asap when I've cleared some of
the easy mail backlog from the past two weeks.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [tip:perf/core] perf: Fix race in perf_event_exec()
  2015-12-10 19:57         ` Peter Zijlstra
  2015-12-15 13:09           ` Dmitry Vyukov
  2015-12-17 14:06           ` Peter Zijlstra
@ 2016-01-06 18:46           ` tip-bot for Peter Zijlstra
  2016-01-06 18:56             ` Eric Dumazet
  2 siblings, 1 reply; 46+ messages in thread
From: tip-bot for Peter Zijlstra @ 2016-01-06 18:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: kcc, glider, torvalds, dvyukov, sasha.levin, peterz, syzkaller,
	edumazet, linux-kernel, acme, vincent.weaver, tglx, eranian,
	acme, hpa, mingo, jolsa

Commit-ID:  c127449944659543e5e2423002f08f0af98dba5c
Gitweb:     http://git.kernel.org/tip/c127449944659543e5e2423002f08f0af98dba5c
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 10 Dec 2015 20:57:40 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 6 Jan 2016 10:52:38 +0100

perf: Fix race in perf_event_exec()

I managed to tickle this warning:

  [ 2338.884942] ------------[ cut here ]------------
  [ 2338.890112] WARNING: CPU: 13 PID: 35162 at ../kernel/events/core.c:2702 task_ctx_sched_out+0x6b/0x80()
  [ 2338.900504] Modules linked in:
  [ 2338.903933] CPU: 13 PID: 35162 Comm: bash Not tainted 4.4.0-rc4-dirty #244
  [ 2338.911610] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
  [ 2338.923071]  ffffffff81f1468e ffff8807c6457cb8 ffffffff815c680c 0000000000000000
  [ 2338.931382]  ffff8807c6457cf0 ffffffff810c8a56 ffffe8ffff8c1bd0 ffff8808132ed400
  [ 2338.939678]  0000000000000286 ffff880813170380 ffff8808132ed400 ffff8807c6457d00
  [ 2338.947987] Call Trace:
  [ 2338.950726]  [<ffffffff815c680c>] dump_stack+0x4e/0x82
  [ 2338.956474]  [<ffffffff810c8a56>] warn_slowpath_common+0x86/0xc0
  [ 2338.963195]  [<ffffffff810c8b4a>] warn_slowpath_null+0x1a/0x20
  [ 2338.969720]  [<ffffffff811a49cb>] task_ctx_sched_out+0x6b/0x80
  [ 2338.976244]  [<ffffffff811a62d2>] perf_event_exec+0xe2/0x180
  [ 2338.982575]  [<ffffffff8121fb6f>] setup_new_exec+0x6f/0x1b0
  [ 2338.988810]  [<ffffffff8126de83>] load_elf_binary+0x393/0x1660
  [ 2338.995339]  [<ffffffff811dc772>] ? get_user_pages+0x52/0x60
  [ 2339.001669]  [<ffffffff8121e297>] search_binary_handler+0x97/0x200
  [ 2339.008581]  [<ffffffff8121f8b3>] do_execveat_common.isra.33+0x543/0x6e0
  [ 2339.016072]  [<ffffffff8121fcea>] SyS_execve+0x3a/0x50
  [ 2339.021819]  [<ffffffff819fc165>] stub_execve+0x5/0x5
  [ 2339.027469]  [<ffffffff819fbeb2>] ? entry_SYSCALL_64_fastpath+0x12/0x71
  [ 2339.034860] ---[ end trace ee1337c59a0ddeac ]---

Which is a WARN_ON_ONCE() indicating that cpuctx->task_ctx is not
what we expected it to be.

This is because context switches can swap the task_struct::perf_event_ctxp[]
pointer around. Therefore you have to either disable preemption when looking
at current, or hold ctx->lock.

Fix perf_event_enable_on_exec(): it loads current->perf_event_ctxp[]
before disabling interrupts, so a preemption in the right place
can swap contexts around and leave us using the wrong one.
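
To make the ordering concrete, a minimal sketch of the before/after
pattern (condensed from the old and new code in the diff below; the
rest of the function bodies is elided):

  /* Before: the context pointer is sampled while still preemptible. */
  ctx = current->perf_event_ctxp[ctxn]; /* a context switch here can swap    */
  local_irq_save(flags);                /* perf_event_ctxp[] and leave 'ctx' */
  ...                                   /* pointing at a stale context       */
  local_irq_restore(flags);

  /* After: the pointer is read only once IRQs (and thus preemption) are off. */
  local_irq_save(flags);
  ctx = current->perf_event_ctxp[ctxn];
  ...
  local_irq_restore(flags);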

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: syzkaller <syzkaller@googlegroups.com>
Link: http://lkml.kernel.org/r/20151210195740.GG6357@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 39cf4a4..fd7de04 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3154,15 +3154,16 @@ static int event_enable_on_exec(struct perf_event *event,
  * Enable all of a task's events that have been marked enable-on-exec.
  * This expects task == current.
  */
-static void perf_event_enable_on_exec(struct perf_event_context *ctx)
+static void perf_event_enable_on_exec(int ctxn)
 {
-	struct perf_event_context *clone_ctx = NULL;
+	struct perf_event_context *ctx, *clone_ctx = NULL;
 	struct perf_event *event;
 	unsigned long flags;
 	int enabled = 0;
 	int ret;
 
 	local_irq_save(flags);
+	ctx = current->perf_event_ctxp[ctxn];
 	if (!ctx || !ctx->nr_events)
 		goto out;
 
@@ -3205,17 +3206,11 @@ out:
 
 void perf_event_exec(void)
 {
-	struct perf_event_context *ctx;
 	int ctxn;
 
 	rcu_read_lock();
-	for_each_task_context_nr(ctxn) {
-		ctx = current->perf_event_ctxp[ctxn];
-		if (!ctx)
-			continue;
-
-		perf_event_enable_on_exec(ctx);
-	}
+	for_each_task_context_nr(ctxn)
+		perf_event_enable_on_exec(ctxn);
 	rcu_read_unlock();
 }
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [tip:perf/core] perf: Fix race in perf_event_exec()
  2016-01-06 18:46           ` [tip:perf/core] perf: Fix race in perf_event_exec() tip-bot for Peter Zijlstra
@ 2016-01-06 18:56             ` Eric Dumazet
  2016-01-07 13:40               ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Dumazet @ 2016-01-06 18:56 UTC (permalink / raw)
  To: syzkaller, Eric Dumazet, LKML, Peter Zijlstra, acme,
	Linus Torvalds, Alexander Potapenko, Kostya Serebryany,
	Dmitry Vyukov, Sasha Levin, H. Peter Anvin, jolsa, Ingo Molnar,
	vincent.weaver, Thomas Gleixner, acme, Stephane Eranian
  Cc: linux-tip-commits

On Wed, Jan 6, 2016 at 1:46 PM, tip-bot for Peter Zijlstra
<tipbot@zytor.com> wrote:

>
> This is because context switches can swap the task_struct::perf_event_ctxp[]
> pointer around. Therefore you have to either disable preemption when looking
> at current, or hold ctx->lock.
>


>
>  void perf_event_exec(void)
>  {
> -       struct perf_event_context *ctx;
>         int ctxn;
>
>         rcu_read_lock();

Do we still need this rcu_read_lock(), if perf_event_enable_on_exec()
uses local_irq_save()?

> -       for_each_task_context_nr(ctxn) {
> -               ctx = current->perf_event_ctxp[ctxn];
> -               if (!ctx)
> -                       continue;
> -
> -               perf_event_enable_on_exec(ctx);
> -       }
> +       for_each_task_context_nr(ctxn)
> +               perf_event_enable_on_exec(ctxn);
>         rcu_read_unlock();
>  }
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [tip:perf/core] perf: Fix race in perf_event_exec()
  2016-01-06 18:56             ` Eric Dumazet
@ 2016-01-07 13:40               ` Peter Zijlstra
  2016-01-07 16:26                 ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2016-01-07 13:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: syzkaller, LKML, acme, Linus Torvalds, Alexander Potapenko,
	Kostya Serebryany, Dmitry Vyukov, Sasha Levin, H. Peter Anvin,
	jolsa, Ingo Molnar, vincent.weaver, Thomas Gleixner, acme,
	Stephane Eranian, linux-tip-commits, Paul McKenney

On Wed, Jan 06, 2016 at 01:56:56PM -0500, Eric Dumazet wrote:
> On Wed, Jan 6, 2016 at 1:46 PM, tip-bot for Peter Zijlstra
> <tipbot@zytor.com> wrote:
> 
> >
> > This is because context switches can swap the task_struct::perf_event_ctxp[]
> > pointer around. Therefore you have to either disable preemption when looking
> > at current, or hold ctx->lock.
> >
> 
> 
> >
> >  void perf_event_exec(void)
> >  {
> > -       struct perf_event_context *ctx;
> >         int ctxn;
> >
> >         rcu_read_lock();
> 
> Do we still need this rcu_read_lock(), if perf_event_enable_on_exec()
> uses local_irq_save()?

Strictly speaking we should not rely on the fact that RCU grace periods
do not progress with IRQs disabled, so yes.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [tip:perf/core] perf: Fix race in perf_event_exec()
  2016-01-07 13:40               ` Peter Zijlstra
@ 2016-01-07 16:26                 ` Paul E. McKenney
  2016-01-07 16:36                   ` Eric Dumazet
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2016-01-07 16:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric Dumazet, syzkaller, LKML, acme, Linus Torvalds,
	Alexander Potapenko, Kostya Serebryany, Dmitry Vyukov,
	Sasha Levin, H. Peter Anvin, jolsa, Ingo Molnar, vincent.weaver,
	Thomas Gleixner, acme, Stephane Eranian, linux-tip-commits

On Thu, Jan 07, 2016 at 02:40:08PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 06, 2016 at 01:56:56PM -0500, Eric Dumazet wrote:
> > On Wed, Jan 6, 2016 at 1:46 PM, tip-bot for Peter Zijlstra
> > <tipbot@zytor.com> wrote:
> > 
> > >
> > > This is because context switches can swap the task_struct::perf_event_ctxp[]
> > > pointer around. Therefore you have to either disable preemption when looking
> > > at current, or hold ctx->lock.
> > >
> > 
> > 
> > >
> > >  void perf_event_exec(void)
> > >  {
> > > -       struct perf_event_context *ctx;
> > >         int ctxn;
> > >
> > >         rcu_read_lock();
> > 
> > Do we still need this rcu_read_lock(), if perf_event_enable_on_exec()
> > uses local_irq_save()?
> 
> Strictly speaking we should not rely on the fact that RCU grace periods
> do not progress with IRQs disabled, so yes.

What Peter said!

The current implementation would let you get away with IRQs disabled
(aside from lockdep splats), but it is easy to imagine implementations
that do not, hence the restriction.

Is the extra rcu_read_lock() and rcu_read_unlock()  causing a performance
problem?  If so, one option would be use of synchronize_sched() and
call_rcu_sched(), which explicitly recognize disabled IRQs as read-side
critical sections.  Of course, this might well put you in a position
where you would like to use IRQ disabling in some cases, BH disabling
(or softirq context) in others, and rcu_read_lock() in yet other cases.

If so, the good news is that RCU actually now supports this efficiently.
There is the shiny new synchronize_rcu_mult() macro that can be used
as follows to wait for all three types of grace periods:

	synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched);

This would wait concurrently for all three grace periods to elapse.
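
For illustration, a sketch of the sched flavour mentioned above
(hypothetical names, not code from this thread): with call_rcu_sched()
on the update side, a region with IRQs disabled already counts as a
read-side critical section, so the reader needs no explicit
rcu_read_lock():

  #include <linux/kernel.h>
  #include <linux/irqflags.h>
  #include <linux/rcupdate.h>
  #include <linux/slab.h>

  /* Hypothetical object protected by sched-RCU (illustration only). */
  struct foo {
          int data;
          struct rcu_head rcu;
  };

  static struct foo __rcu *foo_ptr;

  /* Reader: the IRQs-off region is the sched-RCU read-side section. */
  static int read_foo(void)
  {
          unsigned long flags;
          struct foo *p;
          int val = -1;

          local_irq_save(flags);
          p = rcu_dereference_sched(foo_ptr);
          if (p)
                  val = p->data;
          local_irq_restore(flags);
          return val;
  }

  static void free_foo(struct rcu_head *head)
  {
          kfree(container_of(head, struct foo, rcu));
  }

  /* Updater: call_rcu_sched() waits for the IRQs-off readers above. */
  static void replace_foo(struct foo *newp) /* updates serialized by caller */
  {
          struct foo *old = rcu_dereference_protected(foo_ptr, 1);

          rcu_assign_pointer(foo_ptr, newp);
          if (old)
                  call_rcu_sched(&old->rcu, free_foo);
  }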

							Thanx, Paul


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [tip:perf/core] perf: Fix race in perf_event_exec()
  2016-01-07 16:26                 ` Paul E. McKenney
@ 2016-01-07 16:36                   ` Eric Dumazet
  2016-01-07 16:46                     ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Dumazet @ 2016-01-07 16:36 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Peter Zijlstra, syzkaller, LKML, acme, Linus Torvalds,
	Alexander Potapenko, Kostya Serebryany, Dmitry Vyukov,
	Sasha Levin, H. Peter Anvin, jolsa, Ingo Molnar, vincent.weaver,
	Thomas Gleixner, acme, Stephane Eranian, linux-tip-commits

On Thu, Jan 7, 2016 at 11:26 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Thu, Jan 07, 2016 at 02:40:08PM +0100, Peter Zijlstra wrote:
>> On Wed, Jan 06, 2016 at 01:56:56PM -0500, Eric Dumazet wrote:
>> > On Wed, Jan 6, 2016 at 1:46 PM, tip-bot for Peter Zijlstra
>> > <tipbot@zytor.com> wrote:
>> >
>> > >
>> > > This is because context switches can swap the task_struct::perf_event_ctxp[]
>> > > pointer around. Therefore you have to either disable preemption when looking
>> > > at current, or hold ctx->lock.
>> > >
>> >
>> >
>> > >
>> > >  void perf_event_exec(void)
>> > >  {
>> > > -       struct perf_event_context *ctx;
>> > >         int ctxn;
>> > >
>> > >         rcu_read_lock();
>> >
>> > Do we still need this rcu_read_lock(), if perf_event_enable_on_exec()
>> > uses local_irq_save()?
>>
>> Strictly speaking we should not rely on the fact that RCU grace periods
>> do not progress with IRQs disabled, so yes.
>
> What Peter said!
>
> The current implementation would let you get away with IRQs disabled
> (aside from lockdep splats), but it is easy to imagine implementations
> that do not, hence the restriction.
>
> Is the extra rcu_read_lock() and rcu_read_unlock()  causing a performance
> problem?  If so, one option would be use of synchronize_sched() and
> call_rcu_sched(), which explicitly recognize disabled IRQs as read-side
> critical sections.  Of course, this might well put you in a position
> where you would like to use IRQ disabling in some cases, BH disabling
> (or softirq context) in others, and rcu_read_lock() in yet other cases.
>

I guess I could not see why the rcu_read_lock() was needed around this loop.

rcu_read_lock();
for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
       perf_event_enable_on_exec(ctxn);
rcu_read_unlock();

But apparently it is needed in perf_event_enable_on_exec() and we avoid calling
rcu_read_lock() and rcu_read_unlock()  (perf_nr_task_contexts - 1 ) times
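
Spelled out as a sketch (illustrative only, not code from the patch):

  /* What the patch does: one read-side section covers the whole loop. */
  rcu_read_lock();
  for_each_task_context_nr(ctxn)
          perf_event_enable_on_exec(ctxn);
  rcu_read_unlock();

  /* Hypothetical alternative: locking inside the per-context call would
   * enter and exit the read-side section once per context, i.e.
   * perf_nr_task_contexts - 1 extra lock/unlock pairs. */
  for_each_task_context_nr(ctxn) {
          rcu_read_lock();
          perf_event_enable_on_exec(ctxn);
          rcu_read_unlock();
  }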

> If so, the good news is that RCU actually now supports this efficiently.
> There is the shiny new synchronize_rcu_mult() macro that can be used
> as follows to wait for all three types of grace periods:
>
>         synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched);
>
> This would wait concurrently for all three grace periods to elapse.
>

Interesting ;)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [tip:perf/core] perf: Fix race in perf_event_exec()
  2016-01-07 16:36                   ` Eric Dumazet
@ 2016-01-07 16:46                     ` Paul E. McKenney
  0 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2016-01-07 16:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Peter Zijlstra, syzkaller, LKML, acme, Linus Torvalds,
	Alexander Potapenko, Kostya Serebryany, Dmitry Vyukov,
	Sasha Levin, H. Peter Anvin, jolsa, Ingo Molnar, vincent.weaver,
	Thomas Gleixner, acme, Stephane Eranian, linux-tip-commits

On Thu, Jan 07, 2016 at 11:36:21AM -0500, Eric Dumazet wrote:
> On Thu, Jan 7, 2016 at 11:26 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Thu, Jan 07, 2016 at 02:40:08PM +0100, Peter Zijlstra wrote:
> >> On Wed, Jan 06, 2016 at 01:56:56PM -0500, Eric Dumazet wrote:
> >> > On Wed, Jan 6, 2016 at 1:46 PM, tip-bot for Peter Zijlstra
> >> > <tipbot@zytor.com> wrote:
> >> >
> >> > >
> >> > > This is because context switches can swap the task_struct::perf_event_ctxp[]
> >> > > pointer around. Therefore you have to either disable preemption when looking
> >> > > at current, or hold ctx->lock.
> >> > >
> >> >
> >> >
> >> > >
> >> > >  void perf_event_exec(void)
> >> > >  {
> >> > > -       struct perf_event_context *ctx;
> >> > >         int ctxn;
> >> > >
> >> > >         rcu_read_lock();
> >> >
> >> > Do we still need this rcu_read_lock(), if perf_event_enable_on_exec()
> >> > uses local_irq_save()?
> >>
> >> Strictly speaking we should not rely on the fact that RCU grace periods
> >> do not progress with IRQs disabled, so yes.
> >
> > What Peter said!
> >
> > The current implementation would let you get away with IRQs disabled
> > (aside from lockdep splats), but it is easy to imagine implementations
> > that do not, hence the restriction.
> >
> > Is the extra rcu_read_lock() and rcu_read_unlock()  causing a performance
> > problem?  If so, one option would be use of synchronize_sched() and
> > call_rcu_sched(), which explicitly recognize disabled IRQs as read-side
> > critical sections.  Of course, this might well put you in a position
> > where you would like to use IRQ disabling in some cases, BH disabling
> > (or softirq context) in others, and rcu_read_lock() in yet other cases.
> 
> I guess I could not see why the rcu_read_lock() was needed around this loop.
> 
> rcu_read_lock();
> for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
>        perf_event_enable_on_exec(ctxn);
> rcu_read_unlock();
> 
> But apparently it is needed in perf_event_enable_on_exec() and we avoid calling
> rcu_read_lock() and rcu_read_unlock()  (perf_nr_task_contexts - 1 ) times

If the overhead of rcu_read_lock() and rcu_read_unlock() really is
a noticeable fraction of the loop overhead, and if perf_nr_task_contexts
is large, then you might well have a problem.  I still suggest actually
measuring it, but you know me.  ;-)

> > If so, the good news is that RCU actually now supports this efficiently.
> > There is the shiny new synchronize_rcu_mult() macro that can be used
> > as follows to wait for all three types of grace periods:
> >
> >         synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched);
> >
> > This would wait concurrently for all three grace periods to elapse.
> 
> Interesting ;)

;-) ;-) ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2016-01-05 12:17                         ` Peter Zijlstra
@ 2016-01-08  8:40                           ` Dmitry Vyukov
  2016-01-08 10:28                             ` Dmitry Vyukov
  0 siblings, 1 reply; 46+ messages in thread
From: Dmitry Vyukov @ 2016-01-08  8:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Tue, Jan 5, 2016 at 1:17 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Dec 31, 2015 at 06:15:41PM +0100, Dmitry Vyukov wrote:
>> On Thu, Dec 17, 2015 at 3:43 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Thu, Dec 17, 2015 at 03:35:32PM +0100, Dmitry Vyukov wrote:
>> >> In short, I did not see use-after-frees but perf_event_open fuzzing
>> >> started hanging VMs very frequently, so testing is inconclusive.
>> >
>> > Right, I'll get back to staring at that one.
>> >
>> > Running your test case on bare metal got my machine reproducibly stuck
>> > but in a different place than you've reported.
>> >
>> > Takes a while too, so far no luck in explaining/debugging it. All my
>> > added debug code remains silent :/
>> >
>> > I'll go think up more ..
>>
>> Hi Peter,
>>
>> Any progress with this?
>> I can't continue to test the perf subsystem, as it reliably hangs my machines.
>
> Holidays got in the way, I'll continue asap when I've cleared some of
> the easy mail backlog from the past two weeks.


Hi Peter,

I've hit the following WARNING twice over an overnight run. This is on
b06f3a168cdcd80026276898fd1fee443ef25743 (Jan 6) + the following patch
from you. I can't easily reproduce it. Does it mean anything to you?
It looks like it all happened in the context of a single perf_event_open
call.


------------[ cut here ]------------
WARNING: CPU: 2 PID: 2523 at kernel/events/core.c:1891
event_sched_in.isra.95+0x84b/0xcd0()
Modules linked in:
CPU: 2 PID: 2523 Comm: syz-executor Not tainted 4.4.0-rc8+ #206
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 00000000ffffffff ffff880042d7f7d8 ffffffff8289a56d 0000000000000000
 ffff8800590b9780 ffffffff85db2300 ffff880042d7f818 ffffffff812eb849
 ffffffff815ae34b ffffffff85db2300 0000000000000763 ffff88005f35f4e0
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff8289a56d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
 [<ffffffff812eb849>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:460
 [<ffffffff812eba79>] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
 [<ffffffff815ae34b>] event_sched_in.isra.95+0x84b/0xcd0
kernel/events/core.c:1891
 [<ffffffff815b4164>] group_sched_in+0x114/0x410 kernel/events/core.c:1963
 [<     inline     >] ctx_pinned_sched_in kernel/events/core.c:2744
 [<ffffffff815b5085>] ctx_sched_in+0xc25/0x11b0 kernel/events/core.c:2807
 [<ffffffff815b5657>] perf_event_sched_in+0x47/0xa0 kernel/events/core.c:2073
 [<ffffffff815b587c>] perf_event_context_sched_in+0x1cc/0x380
kernel/events/core.c:2844
 [<ffffffff815bf4f3>] __perf_event_task_sched_in+0x3a3/0x480
kernel/events/core.c:2872
 [<     inline     >] perf_event_task_sched_in include/linux/perf_event.h:931
 [<ffffffff8136d40f>] finish_task_switch+0x3cf/0x610 kernel/sched/core.c:2576
 [<ffffffff85c7b08d>] __schedule+0x902016/01/07 20:52:38 executing program 2:
 [<ffffffff85c7cbc2>] preempt_schedule_common+0x42/0x70 kernel/sched/core.c:3250
 [<ffffffff85c7cc07>] _cond_resched+0x17/0x20 kernel/sched/core.c:4618
 [<     inline     >] slab_pre_alloc_hook mm/slub.c:1317
 [<     inline     >] slab_alloc_node mm/slub.c:2520
 [<     inline     >] slab_alloc mm/slub.c:2602
 [<ffffffff816cba4b>] kmem_cache_alloc_trace+0x24b/0x2f0 mm/slub.c:2619
 [<     inline     >] kmalloc include/linux/slab.h:458
 [<     inline     >] kzalloc include/linux/slab.h:602
 [<     inline     >] aa_alloc_file_context
security/apparmor/include/context.h:46
 [<ffffffff826a9c85>] apparmor_file_alloc_security+0x45/0xa0
security/apparmor/lsm.c:402
 [<ffffffff825e2a1d>] security_file_alloc+0x6d/0xa0 security/security.c:742
 [<ffffffff81716aba>] get_empty_filp+0x13a/0x3f0 fs/file_table.c:128
 [<ffffffff81716d90>] alloc_file+0x20/0x380 fs/file_table.c:163
 [<ffffffff817f4fdd>] anon_inode_getfile+0x19d/0x340 fs/anon_inodes.c:105
 [<ffffffff815c7daa>] SYSC_perf_event_open+0x12da/0x1fa0
kernel/events/core.c:8471
 [<ffffffff815cf609>] SyS_perf_event_open+0x39/0x50 kernel/events/core.c:8262
 [<ffffffff85c8a2b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
---[ end trace 326ea08cc5f1b454 ]---


diff --git a/kernel/events/core.c b/kernel/events/core.c
index ef2d6ea..fdf78fb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -881,13 +881,22 @@ static void free_ctx(struct rcu_head *head)
        kfree(ctx);
 }

+#define for_each_task_context_nr(ctxn)                                 \
+       for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
+
 static void put_ctx(struct perf_event_context *ctx)
 {
        if (atomic_dec_and_test(&ctx->refcount)) {
                if (ctx->parent_ctx)
                        put_ctx(ctx->parent_ctx);
-               if (ctx->task)
+               if (ctx->task) {
+                       int ctxn;
+
+                       for_each_task_context_nr(ctxn)
+                               WARN_ON(ctx->task->perf_event_ctxp[ctxn] == ctx);
+
                        put_task_struct(ctx->task);
+               }
                call_rcu(&ctx->rcu_head, free_ctx);
        }
 }
@@ -2658,9 +2667,6 @@ static void perf_pmu_sched_task(struct task_struct *prev,
 static void perf_event_switch(struct task_struct *task,
                              struct task_struct *next_prev, bool sched_in);

-#define for_each_task_context_nr(ctxn)                                 \
-       for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
-
 /*
  * Called from scheduler to remove the events of the current task,
  * with interrupts disabled.
@@ -3154,15 +3160,16 @@ static int event_enable_on_exec(struct perf_event *event,
  * Enable all of a task's events that have been marked enable-on-exec.
  * This expects task == current.
  */
-static void perf_event_enable_on_exec(struct perf_event_context *ctx)
+static void perf_event_enable_on_exec(int ctxn)
 {
-       struct perf_event_context *clone_ctx = NULL;
+       struct perf_event_context *ctx, *clone_ctx = NULL;
        struct perf_event *event;
        unsigned long flags;
        int enabled = 0;
        int ret;

        local_irq_save(flags);
+       ctx = current->perf_event_ctxp[ctxn];
        if (!ctx || !ctx->nr_events)
                goto out;

@@ -3205,17 +3212,11 @@ out:

 void perf_event_exec(void)
 {
-       struct perf_event_context *ctx;
        int ctxn;

        rcu_read_lock();
-       for_each_task_context_nr(ctxn) {
-               ctx = current->perf_event_ctxp[ctxn];
-               if (!ctx)
-                       continue;
-
-               perf_event_enable_on_exec(ctx);
-       }
+       for_each_task_context_nr(ctxn)
+               perf_event_enable_on_exec(ctxn);
        rcu_read_unlock();
 }

@@ -8967,6 +8968,8 @@ again:

                mutex_unlock(&ctx->mutex);

+               task->perf_event_ctxp[ctxn] = NULL;
+
                put_ctx(ctx);
        }
 }

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: use-after-free in __perf_install_in_context
  2016-01-08  8:40                           ` Dmitry Vyukov
@ 2016-01-08 10:28                             ` Dmitry Vyukov
  0 siblings, 0 replies; 46+ messages in thread
From: Dmitry Vyukov @ 2016-01-08 10:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Sasha Levin

On Fri, Jan 8, 2016 at 9:40 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Tue, Jan 5, 2016 at 1:17 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Thu, Dec 31, 2015 at 06:15:41PM +0100, Dmitry Vyukov wrote:
>>> On Thu, Dec 17, 2015 at 3:43 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> > On Thu, Dec 17, 2015 at 03:35:32PM +0100, Dmitry Vyukov wrote:
>>> >> In short, I did not see use-after-frees but perf_event_open fuzzing
>>> >> started hanging VMs very frequently, so testing is inconclusive.
>>> >
>>> > Right, I'll get back to staring at that one.
>>> >
>>> > Running your test case on bare metal got my machine reproducibly stuck
>>> > but in a different place than you've reported.
>>> >
>>> > Takes a while too, so far no luck in explaining/debugging it. All my
>>> > added debug code remains silent :/
>>> >
>>> > I'll go think up more ..
>>>
>>> Hi Peter,
>>>
>>> Any progress with this?
>>> I can't continue to test the perf subsystem, as it reliably hangs my machines.
>>
>> Holidays got in the way, I'll continue asap when I've cleared some of
>> the easy mail backlog from the past two weeks.
>
>
> Hi Peter,
>
> I've hit the following WARNING twice over an overnight run. This is on
> b06f3a168cdcd80026276898fd1fee443ef25743 (Jan 6) + the following patch
> from you. I can't easily reproduce it. Does it mean anything to you?
> It looks like it all happened in the context of a single perf_event_open
> call.
>
>
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 2523 at kernel/events/core.c:1891
> event_sched_in.isra.95+0x84b/0xcd0()
> Modules linked in:
> CPU: 2 PID: 2523 Comm: syz-executor Not tainted 4.4.0-rc8+ #206
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  00000000ffffffff ffff880042d7f7d8 ffffffff8289a56d 0000000000000000
>  ffff8800590b9780 ffffffff85db2300 ffff880042d7f818 ffffffff812eb849
>  ffffffff815ae34b ffffffff85db2300 0000000000000763 ffff88005f35f4e0
> Call Trace:
>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>  [<ffffffff8289a56d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>  [<ffffffff812eb849>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:460
>  [<ffffffff812eba79>] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
>  [<ffffffff815ae34b>] event_sched_in.isra.95+0x84b/0xcd0
> kernel/events/core.c:1891
>  [<ffffffff815b4164>] group_sched_in+0x114/0x410 kernel/events/core.c:1963
>  [<     inline     >] ctx_pinned_sched_in kernel/events/core.c:2744
>  [<ffffffff815b5085>] ctx_sched_in+0xc25/0x11b0 kernel/events/core.c:2807
>  [<ffffffff815b5657>] perf_event_sched_in+0x47/0xa0 kernel/events/core.c:2073
>  [<ffffffff815b587c>] perf_event_context_sched_in+0x1cc/0x380
> kernel/events/core.c:2844
>  [<ffffffff815bf4f3>] __perf_event_task_sched_in+0x3a3/0x480
> kernel/events/core.c:2872
>  [<     inline     >] perf_event_task_sched_in include/linux/perf_event.h:931
>  [<ffffffff8136d40f>] finish_task_switch+0x3cf/0x610 kernel/sched/core.c:2576
>  [<ffffffff85c7b08d>] __schedule+0x902016/01/07 20:52:38 executing program 2:
>  [<ffffffff85c7cbc2>] preempt_schedule_common+0x42/0x70 kernel/sched/core.c:3250
>  [<ffffffff85c7cc07>] _cond_resched+0x17/0x20 kernel/sched/core.c:4618
>  [<     inline     >] slab_pre_alloc_hook mm/slub.c:1317
>  [<     inline     >] slab_alloc_node mm/slub.c:2520
>  [<     inline     >] slab_alloc mm/slub.c:2602
>  [<ffffffff816cba4b>] kmem_cache_alloc_trace+0x24b/0x2f0 mm/slub.c:2619
>  [<     inline     >] kmalloc include/linux/slab.h:458
>  [<     inline     >] kzalloc include/linux/slab.h:602
>  [<     inline     >] aa_alloc_file_context
> security/apparmor/include/context.h:46
>  [<ffffffff826a9c85>] apparmor_file_alloc_security+0x45/0xa0
> security/apparmor/lsm.c:402
>  [<ffffffff825e2a1d>] security_file_alloc+0x6d/0xa0 security/security.c:742
>  [<ffffffff81716aba>] get_empty_filp+0x13a/0x3f0 fs/file_table.c:128
>  [<ffffffff81716d90>] alloc_file+0x20/0x380 fs/file_table.c:163
>  [<ffffffff817f4fdd>] anon_inode_getfile+0x19d/0x340 fs/anon_inodes.c:105
>  [<ffffffff815c7daa>] SYSC_perf_event_open+0x12da/0x1fa0
> kernel/events/core.c:8471
>  [<ffffffff815cf609>] SyS_perf_event_open+0x39/0x50 kernel/events/core.c:8262
>  [<ffffffff85c8a2b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185
> ---[ end trace 326ea08cc5f1b454 ]---


I've added the following change in an attempt to catch this WARNING:

@@ -2835,6 +2841,10 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
        if (ctx->nr_events)
                cpuctx->task_ctx = ctx;

+       mdelay(10);
+       if (cpuctx->task_ctx != NULL && cpuctx->task_ctx != ctx)
+               pr_err("HERE!!!");
+
        perf_event_sched_in(cpuctx, cpuctx->task_ctx, task);

        perf_pmu_enable(ctx->pmu);

Is this change legitimate?
Now my program that stresses perf hangs in an interesting state.
Processes consume some CPU:

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 6671 nobody    20   0     0    0    0 Z  20.2  0.0  13:41.43 syz-executor
 6542 nobody    20   0     0    0    0 Z  17.6  0.0  13:25.74 syz-executor
 6558 nobody    20   0     0    0    0 Z  17.6  0.0  13:32.36 syz-executor

Some are blocked on futexes:

# cat /proc/6671/task/**/stack
[<ffffffff812f5bc4>] do_exit+0x1ab4/0x2b20
[<ffffffff812f6da8>] do_group_exit+0x108/0x320
[<ffffffff812f6fdd>] SyS_exit_group+0x1d/0x20
[<ffffffff85c8a2f6>] entry_SYSCALL_64_fastpath+0x16/0x7a
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8148464d>] futex_wait_queue_me+0x30d/0x5e0
[<ffffffff81485125>] futex_wait+0x215/0x5f0
[<ffffffff81489738>] do_futex+0x218/0x1680
[<ffffffff8148acd3>] SyS_futex+0x133/0x2b0
[<ffffffff85c8a2f6>] entry_SYSCALL_64_fastpath+0x16/0x7a
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8148464d>] futex_wait_queue_me+0x30d/0x5e0
[<ffffffff81485125>] futex_wait+0x215/0x5f0
[<ffffffff81489738>] do_futex+0x218/0x1680
[<ffffffff8148acd3>] SyS_futex+0x133/0x2b0
[<ffffffff85c8a2f6>] entry_SYSCALL_64_fastpath+0x16/0x7a
[<ffffffffffffffff>] 0xffffffffffffffff

And some hang in exit_to_usermode_loop:

# cat /proc/6542/task/**/stack
[<ffffffff812f5bc4>] do_exit+0x1ab4/0x2b20
[<ffffffff812f6da8>] do_group_exit+0x108/0x320
[<ffffffff812f6fdd>] SyS_exit_group+0x1d/0x20
[<ffffffff85c8a2f6>] entry_SYSCALL_64_fastpath+0x16/0x7a
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff810038e5>] exit_to_usermode_loop+0xd5/0x1a0
[<ffffffff8100631f>] syscall_return_slowpath+0x19f/0x210
[<ffffffff85c8a462>] int_ret_from_sys_call+0x25/0x9f
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8148464d>] futex_wait_queue_me+0x30d/0x5e0
[<ffffffff81485125>] futex_wait+0x215/0x5f0
[<ffffffff81489738>] do_futex+0x218/0x1680
[<ffffffff8148acd3>] SyS_futex+0x133/0x2b0
[<ffffffff85c8a2f6>] entry_SYSCALL_64_fastpath+0x16/0x7a
[<ffffffffffffffff>] 0xffffffffffffffff

They are unkillable, and one of the threads did do_group_exit...

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2016-01-08 10:28 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-04 20:04 use-after-free in __perf_install_in_context Dmitry Vyukov
2015-12-04 20:32 ` Alexei Starovoitov
2015-12-04 21:00   ` Dmitry Vyukov
2015-12-07 11:04     ` Dmitry Vyukov
2015-12-07 11:06       ` Dmitry Vyukov
2015-12-07 11:24         ` Dmitry Vyukov
2015-12-07 15:36 ` Peter Zijlstra
2015-12-07 16:09   ` Dmitry Vyukov
2015-12-08  3:24     ` Alexei Starovoitov
2015-12-08 16:12       ` Dmitry Vyukov
2015-12-08 17:54         ` Alexei Starovoitov
2015-12-08 17:56           ` Dmitry Vyukov
2015-12-08 18:05             ` Alexei Starovoitov
2015-12-08 18:35               ` Dmitry Vyukov
2015-12-08 19:56                 ` Alexei Starovoitov
2015-12-09  9:17                   ` Dmitry Vyukov
2015-12-10  3:54                     ` Alexei Starovoitov
2015-12-10  9:02                       ` Peter Zijlstra
2015-12-10 17:03                         ` Alexei Starovoitov
2015-12-11  8:14                           ` Ingo Molnar
2015-12-15 13:11                             ` Dmitry Vyukov
2015-12-08 16:44     ` Peter Zijlstra
2015-12-08 19:14       ` Dmitry Vyukov
2015-12-10 19:57         ` Peter Zijlstra
2015-12-15 13:09           ` Dmitry Vyukov
2015-12-17 14:06           ` Peter Zijlstra
2015-12-17 14:08             ` Dmitry Vyukov
2015-12-17 14:26               ` Peter Zijlstra
2015-12-17 14:28                 ` Peter Zijlstra
2015-12-17 14:35                   ` Dmitry Vyukov
2015-12-17 14:43                     ` Peter Zijlstra
2015-12-31 17:15                       ` Dmitry Vyukov
2016-01-05 12:17                         ` Peter Zijlstra
2016-01-08  8:40                           ` Dmitry Vyukov
2016-01-08 10:28                             ` Dmitry Vyukov
2016-01-06 18:46           ` [tip:perf/core] perf: Fix race in perf_event_exec() tip-bot for Peter Zijlstra
2016-01-06 18:56             ` Eric Dumazet
2016-01-07 13:40               ` Peter Zijlstra
2016-01-07 16:26                 ` Paul E. McKenney
2016-01-07 16:36                   ` Eric Dumazet
2016-01-07 16:46                     ` Paul E. McKenney
2015-12-08 16:22 ` use-after-free in __perf_install_in_context Peter Zijlstra
2015-12-08 18:57   ` Ingo Molnar
2015-12-09  9:05     ` Peter Zijlstra
2015-12-08 16:27 ` Peter Zijlstra
2015-12-08 16:50   ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).