All of lore.kernel.org
 help / color / mirror / Atom feed
* KASAN: use-after-free Write in _free_event
@ 2018-07-09  5:52 syzbot
  2018-07-09 10:11 ` Alexander Shishkin
  2018-07-10  2:16 ` KASAN: use-after-free Write in _free_event syzbot
  0 siblings, 2 replies; 16+ messages in thread
From: syzbot @ 2018-07-09  5:52 UTC (permalink / raw)
  To: acme, alexander.shishkin, jolsa, linux-kernel, mingo, namhyung,
	peterz, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    b2d44d145d2a Merge tag '4.18-rc3-smb3fixes' of git://git.s..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=178cf50c400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=2ca6c7a31d407f86
dashboard link: https://syzkaller.appspot.com/bug?extid=a24c397a29ad22d86c98
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in atomic_dec_and_test  
include/asm-generic/atomic-instrumented.h:222 [inline]
BUG: KASAN: use-after-free in put_task_struct include/linux/sched/task.h:95  
[inline]
BUG: KASAN: use-after-free in _free_event+0x48d/0x1440  
kernel/events/core.c:4451
Write of size 4 at addr ffff8801999ee2a0 by task syz-executor2/19286

CPU: 1 PID: 19286 Comm: syz-executor2 Not tainted 4.18.0-rc3+ #136
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
  check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
  kasan_check_write+0x14/0x20 mm/kasan/kasan.c:278
  atomic_dec_and_test include/asm-generic/atomic-instrumented.h:222 [inline]
  put_task_struct include/linux/sched/task.h:95 [inline]
  _free_event+0x48d/0x1440 kernel/events/core.c:4451
  free_event+0xb4/0x180 kernel/events/core.c:4472
  perf_event_release_kernel+0x7d5/0x1050 kernel/events/core.c:4633
  perf_release+0x37/0x50 kernel/events/core.c:4647
  __fput+0x355/0x8b0 fs/file_table.c:209
  ____fput+0x15/0x20 fs/file_table.c:243
  task_work_run+0x1ec/0x2a0 kernel/task_work.c:113
  exit_task_work include/linux/task_work.h:22 [inline]
  do_exit+0x1b08/0x2750 kernel/exit.c:865
  do_group_exit+0x177/0x440 kernel/exit.c:968
  get_signal+0x88e/0x1970 kernel/signal.c:2468
  do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
  exit_to_usermode_loop+0x2e0/0x370 arch/x86/entry/common.c:162
  prepare_exit_to_usermode+0x342/0x3b0 arch/x86/entry/common.c:197
  retint_user+0x8/0x18
RIP: 0033:          (null)
Code: Bad RIP value.
RSP: 002b:0000000020000048 EFLAGS: 00010217
RAX: 0000000000000000 RBX: 00007f62934726d4 RCX: 0000000000455e29
RDX: 0000000020000100 RSI: 0000000020000040 RDI: 0000000000000000
RBP: 000000000072bea0 R08: 0000000020000080 R09: 0000000000000000
R10: 0000000020000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004bbc2c R14: 00000000004c8f48 R15: 0000000000000000

Allocated by task 19280:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
  kmem_cache_alloc_node+0x144/0x780 mm/slab.c:3644
  alloc_task_struct_node kernel/fork.c:157 [inline]
  dup_task_struct kernel/fork.c:779 [inline]
  copy_process.part.39+0x16b5/0x7220 kernel/fork.c:1641
  copy_process kernel/fork.c:1616 [inline]
  _do_fork+0x291/0x12a0 kernel/fork.c:2099
  __do_sys_clone kernel/fork.c:2206 [inline]
  __se_sys_clone kernel/fork.c:2200 [inline]
  __x64_sys_clone+0xbf/0x150 kernel/fork.c:2200
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 19280:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
  free_task_struct kernel/fork.c:162 [inline]
  free_task+0x16e/0x1f0 kernel/fork.c:390
  copy_process.part.39+0x15c9/0x7220 kernel/fork.c:2034
  copy_process kernel/fork.c:1616 [inline]
  _do_fork+0x291/0x12a0 kernel/fork.c:2099
  __do_sys_clone kernel/fork.c:2206 [inline]
  __se_sys_clone kernel/fork.c:2200 [inline]
  __x64_sys_clone+0xbf/0x150 kernel/fork.c:2200
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801999ee280
  which belongs to the cache task_struct(81:syz2) of size 5952
The buggy address is located 32 bytes inside of
  5952-byte region [ffff8801999ee280, ffff8801999ef9c0)
The buggy address belongs to the page:
page:ffffea0006667b80 count:1 mapcount:0 mapping:ffff8801b0bb6300 index:0x0  
compound_mapcount: 0
flags: 0x2fffc0000008100(slab|head)
raw: 02fffc0000008100 ffffea00071a0888 ffff8801ab65ce48 ffff8801b0bb6300
raw: 0000000000000000 ffff8801999ee280 0000000100000001 ffff8801acaa4000
page dumped because: kasan: bad access detected
page->mem_cgroup:ffff8801acaa4000

Memory state around the buggy address:
  ffff8801999ee180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff8801999ee200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff8801999ee280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                ^
  ffff8801999ee300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801999ee380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Write in _free_event
  2018-07-09  5:52 KASAN: use-after-free Write in _free_event syzbot
@ 2018-07-09 10:11 ` Alexander Shishkin
  2018-07-09 11:02   ` Dmitry Vyukov
  2019-02-16 10:43   ` Xie XiuQi
  2018-07-10  2:16 ` KASAN: use-after-free Write in _free_event syzbot
  1 sibling, 2 replies; 16+ messages in thread
From: Alexander Shishkin @ 2018-07-09 10:11 UTC (permalink / raw)
  To: syzbot, acme, jolsa, linux-kernel, mingo, namhyung, peterz,
	syzkaller-bugs

syzbot <syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com> writes:

> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    b2d44d145d2a Merge tag '4.18-rc3-smb3fixes' of git://git.s..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=178cf50c400000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=2ca6c7a31d407f86
> dashboard link: https://syzkaller.appspot.com/bug?extid=a24c397a29ad22d86c98
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.

Is there a chance of getting a reproducer for this one?

Thanks,
--
Alex

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Write in _free_event
  2018-07-09 10:11 ` Alexander Shishkin
@ 2018-07-09 11:02   ` Dmitry Vyukov
  2019-02-16 10:43   ` Xie XiuQi
  1 sibling, 0 replies; 16+ messages in thread
From: Dmitry Vyukov @ 2018-07-09 11:02 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: syzbot, Arnaldo Carvalho de Melo, jolsa, LKML, Ingo Molnar,
	Namhyung Kim, Peter Zijlstra, syzkaller-bugs

On Mon, Jul 9, 2018 at 12:11 PM, Alexander Shishkin
<alexander.shishkin@linux.intel.com> wrote:
> syzbot <syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com> writes:
>
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    b2d44d145d2a Merge tag '4.18-rc3-smb3fixes' of git://git.s..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=178cf50c400000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=2ca6c7a31d407f86
>> dashboard link: https://syzkaller.appspot.com/bug?extid=a24c397a29ad22d86c98
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>
> Is there a chance of getting a reproducer for this one?

There is a chance. Frequently syzbot finds a reproducer after some
time. What's the chance in this case I don't know. Since it happened
only once so far, probably not too high.

But there seems to be a good hint in the KASAN report: task was freed
right inside of fork/copy_process, probably some error happened, but
it seems to have been registered in some global list already and
perf_release discovered it there. Does it make sense? Was it
registered? What should have been prevented the task alive at the time
of access?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Write in _free_event
  2018-07-09  5:52 KASAN: use-after-free Write in _free_event syzbot
  2018-07-09 10:11 ` Alexander Shishkin
@ 2018-07-10  2:16 ` syzbot
  1 sibling, 0 replies; 16+ messages in thread
From: syzbot @ 2018-07-10  2:16 UTC (permalink / raw)
  To: acme, alexander.shishkin, dvyukov, jolsa, linux-kernel, mingo,
	namhyung, peterz, syzkaller-bugs

syzbot has found a reproducer for the following crash on:

HEAD commit:    092150a25cb7 Merge branch 'for-linus' of git://git.kernel...
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16627cc2400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=25856fac4e580aa7
dashboard link: https://syzkaller.appspot.com/bug?extid=a24c397a29ad22d86c98
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=14d45afc400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16dd4968400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in atomic_dec_and_test  
include/asm-generic/atomic-instrumented.h:222 [inline]
BUG: KASAN: use-after-free in put_task_struct include/linux/sched/task.h:95  
[inline]
BUG: KASAN: use-after-free in _free_event+0x48d/0x1440  
kernel/events/core.c:4451
Write of size 4 at addr ffff8801ab6360e0 by task syz-executor205/5237

CPU: 1 PID: 5237 Comm: syz-executor205 Not tainted 4.18.0-rc4+ #139
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
  check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
  kasan_check_write+0x14/0x20 mm/kasan/kasan.c:278
  atomic_dec_and_test include/asm-generic/atomic-instrumented.h:222 [inline]
  put_task_struct include/linux/sched/task.h:95 [inline]
  _free_event+0x48d/0x1440 kernel/events/core.c:4451
  free_event+0xb4/0x180 kernel/events/core.c:4472
  perf_event_release_kernel+0x7d5/0x1050 kernel/events/core.c:4633
  perf_release+0x37/0x50 kernel/events/core.c:4647
  __fput+0x355/0x8b0 fs/file_table.c:209
  ____fput+0x15/0x20 fs/file_table.c:243
  task_work_run+0x1ec/0x2a0 kernel/task_work.c:113
  exit_task_work include/linux/task_work.h:22 [inline]
  do_exit+0x1b08/0x2750 kernel/exit.c:865
  do_group_exit+0x177/0x440 kernel/exit.c:968
  get_signal+0x88e/0x1970 kernel/signal.c:2468
  do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
  exit_to_usermode_loop+0x2e0/0x370 arch/x86/entry/common.c:162
  prepare_exit_to_usermode+0x342/0x3b0 arch/x86/entry/common.c:197
  retint_user+0x8/0x18
RIP: 0033:          (null)
Code: Bad RIP value.
RSP: 002b:0000000020000048 EFLAGS: 00010217
RAX: 0000000000000000 RBX: 00000000006dbc24 RCX: 0000000000445d89
RDX: 0000000020000100 RSI: 0000000020000040 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000020000080 R09: 0000000000000000
R10: 0000000020000000 R11: 0000000000000246 R12: 00000000006dbc20
R13: 0030656c69662f2e R14: 0000004000008912 R15: 0000000000000006

Allocated by task 5236:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
  kmem_cache_alloc_node+0x144/0x780 mm/slab.c:3644
  alloc_task_struct_node kernel/fork.c:157 [inline]
  dup_task_struct kernel/fork.c:779 [inline]
  copy_process.part.39+0x16b5/0x7220 kernel/fork.c:1641
  copy_process kernel/fork.c:1616 [inline]
  _do_fork+0x291/0x12a0 kernel/fork.c:2099
  __do_sys_clone kernel/fork.c:2206 [inline]
  __se_sys_clone kernel/fork.c:2200 [inline]
  __x64_sys_clone+0xbf/0x150 kernel/fork.c:2200
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 5236:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
  free_task_struct kernel/fork.c:162 [inline]
  free_task+0x16e/0x1f0 kernel/fork.c:390
  copy_process.part.39+0x15c9/0x7220 kernel/fork.c:2034
  copy_process kernel/fork.c:1616 [inline]
  _do_fork+0x291/0x12a0 kernel/fork.c:2099
  __do_sys_clone kernel/fork.c:2206 [inline]
  __se_sys_clone kernel/fork.c:2200 [inline]
  __x64_sys_clone+0xbf/0x150 kernel/fork.c:2200
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801ab6360c0
  which belongs to the cache task_struct of size 5952
The buggy address is located 32 bytes inside of
  5952-byte region [ffff8801ab6360c0, ffff8801ab637800)
The buggy address belongs to the page:
page:ffffea0006ad8d80 count:1 mapcount:0 mapping:ffff8801da94b200 index:0x0  
compound_mapcount: 0
flags: 0x2fffc0000008100(slab|head)
raw: 02fffc0000008100 ffffea0006ad9008 ffffea00075f5888 ffff8801da94b200
raw: 0000000000000000 ffff8801ab6360c0 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801ab635f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff8801ab636000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff8801ab636080: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                        ^
  ffff8801ab636100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801ab636180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Write in _free_event
  2018-07-09 10:11 ` Alexander Shishkin
  2018-07-09 11:02   ` Dmitry Vyukov
@ 2019-02-16 10:43   ` Xie XiuQi
  2019-02-28 14:01     ` [RFC PATCH] perf: Paper over the hw.target problems Alexander Shishkin
  1 sibling, 1 reply; 16+ messages in thread
From: Xie XiuQi @ 2019-02-16 10:43 UTC (permalink / raw)
  To: Alexander Shishkin, syzbot, acme, jolsa, linux-kernel, mingo,
	namhyung, peterz, syzkaller-bugs
  Cc: Cheng Jian, Li Bin, wangxuefeng, Xie XiuQi

[-- Attachment #1: Type: text/plain, Size: 5085 bytes --]

Hi everyone,

I meet this bug two times, does anyone has idea to fix this issue ? (See attachments)

==================================================================
BUG: KASAN: use-after-free in atomic_dec_and_test include/asm-generic/atomic-instrumented.h:259 [inline]
BUG: KASAN: use-after-free in put_task_struct include/linux/sched/task.h:95 [inline]
BUG: KASAN: use-after-free in _free_event+0x2c4/0xfe0 kernel/events/core.c:4452
Write of size 4 at addr ffff8883780cd0a0 by task syz-executor0/9553

CPU: 3 PID: 9553 Comm: syz-executor0 Not tainted 4.19.18-514.55.6.9.x86_64+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xca/0x13e lib/dump_stack.c:113
 print_address_description+0x79/0x330 mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report+0x18a/0x2e0 mm/kasan/report.c:412
 atomic_dec_and_test include/asm-generic/atomic-instrumented.h:259 [inline]
 put_task_struct include/linux/sched/task.h:95 [inline]
 _free_event+0x2c4/0xfe0 kernel/events/core.c:4452
 free_event+0x41/0xa0 kernel/events/core.c:4473
 perf_event_release_kernel+0x43b/0xc70 kernel/events/core.c:4634
 perf_release+0x33/0x40 kernel/events/core.c:4648
 __fput+0x27f/0x7f0 fs/file_table.c:278
 task_work_run+0x136/0x1b0 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x850/0x2b90 kernel/exit.c:867
 do_group_exit+0x106/0x2f0 kernel/exit.c:970
 get_signal+0x62b/0x1740 kernel/signal.c:2513
 do_signal+0x94/0x16a0 arch/x86/kernel/signal.c:816
 exit_to_usermode_loop+0x108/0x1d0 arch/x86/entry/common.c:162
 prepare_exit_to_usermode+0x27b/0x2f0 arch/x86/entry/common.c:197
 retint_user+0x8/0x18
RIP: 0033:0x4021b7
Code: Bad RIP value.
RSP: 002b:00007fa7cb40cb30 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 000000000000000b RCX: 0000000000462589
RDX: 00007fa7cb40cb40 RSI: ffffffffffffffb8 RDI: 000000000000000b
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00000000004bc4cd R14: 00000000006f4c70 R15: 00000000ffffffff

Allocated by task 9552:
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
 slab_post_alloc_hook mm/slab.h:444 [inline]
 slab_alloc_node mm/slub.c:2706 [inline]
 kmem_cache_alloc_node+0xe3/0x2e0 mm/slub.c:2742
 alloc_task_struct_node kernel/fork.c:157 [inline]
 dup_task_struct kernel/fork.c:802 [inline]
 copy_process.part.26+0x1a98/0x6c70 kernel/fork.c:1707
 copy_process kernel/fork.c:1664 [inline]
 _do_fork+0x1c6/0xe60 kernel/fork.c:2175
 do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 9552:
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
 slab_free_hook mm/slub.c:1371 [inline]
 slab_free_freelist_hook mm/slub.c:1398 [inline]
 slab_free mm/slub.c:2953 [inline]
 kmem_cache_free+0xbd/0x320 mm/slub.c:2969
 copy_process.part.26+0x18d9/0x6c70 kernel/fork.c:2107
 copy_process kernel/fork.c:1664 [inline]
 _do_fork+0x1c6/0xe60 kernel/fork.c:2175
 do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8883780cd080
 which belongs to the cache task_struct of size 9992
The buggy address is located 32 bytes inside of
 9992-byte region [ffff8883780cd080, ffff8883780cf788)
The buggy address belongs to the page:
page:ffffea000de03200 count:1 mapcount:0 mapping:ffff888107d07a00 index:0x0 compound_mapcount: 0
flags: 0x2fffff80008100(slab|head)
raw: 002fffff80008100 dead000000000100 dead000000000200 ffff888107d07a00
raw: 0000000000000000 0000000080030003 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8883780ccf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8883780cd000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8883780cd080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                               ^
 ffff8883780cd100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8883780cd180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


On 2018/7/9 18:11, Alexander Shishkin wrote:
> syzbot <syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com> writes:
> 
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    b2d44d145d2a Merge tag '4.18-rc3-smb3fixes' of git://git.s..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=178cf50c400000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=2ca6c7a31d407f86
>> dashboard link: https://syzkaller.appspot.com/bug?extid=a24c397a29ad22d86c98
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
> 
> Is there a chance of getting a reproducer for this one?
> 
> Thanks,
> --
> Alex
> 
> .
> 

-- 
Thanks,
Xie XiuQi

[-- Attachment #2: errorlog-1.txt --]
[-- Type: text/plain, Size: 4468 bytes --]

==================================================================
BUG: KASAN: use-after-free in atomic_dec_and_test include/asm-generic/atomic-instrumented.h:259 [inline]
BUG: KASAN: use-after-free in put_task_struct include/linux/sched/task.h:95 [inline]
BUG: KASAN: use-after-free in _free_event+0x2c4/0xfe0 kernel/events/core.c:4452
Write of size 4 at addr ffff8883d8d80020 by task syz-executor0/28620

CPU: 1 PID: 28620 Comm: syz-executor0 Not tainted 4.19.18-514.55.6.9.x86_64+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xca/0x13e lib/dump_stack.c:113
 print_address_description+0x79/0x330 mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report+0x18a/0x2e0 mm/kasan/report.c:412
 atomic_dec_and_test include/asm-generic/atomic-instrumented.h:259 [inline]
 put_task_struct include/linux/sched/task.h:95 [inline]
 _free_event+0x2c4/0xfe0 kernel/events/core.c:4452
 free_event+0x41/0xa0 kernel/events/core.c:4473
 perf_event_release_kernel+0x43b/0xc70 kernel/events/core.c:4634
 perf_release+0x33/0x40 kernel/events/core.c:4648
 __fput+0x27f/0x7f0 fs/file_table.c:278
 task_work_run+0x136/0x1b0 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:193 [inline]
 exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
 prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
 do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4115f7
Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 3f f3 c3 0f 1f 44 00 00 53 89 fb 48 83 ec 10 e8 f4 fb ff ff 89 df 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 89 d7 89 44 24 0c e8 36 fc ff ff 8b 44 24
RSP: 002b:00007ffe24d0ba80 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00000000004115f7
RDX: 0000000000000000 RSI: 0000000000730f30 RDI: 0000000000000005
RBP: 00007ffe24d0baec R08: 00007ffe24d0b9b0 R09: 00007ffe24d0b9b0
R10: 00007ffe24d0b9c0 R11: 0000000000000293 R12: 0000000000000001
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000

Allocated by task 28624:
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
 slab_post_alloc_hook mm/slab.h:444 [inline]
 slab_alloc_node mm/slub.c:2706 [inline]
 kmem_cache_alloc_node+0xe3/0x2e0 mm/slub.c:2742
 alloc_task_struct_node kernel/fork.c:157 [inline]
 dup_task_struct kernel/fork.c:802 [inline]
 copy_process.part.26+0x1a98/0x6c70 kernel/fork.c:1707
 copy_process kernel/fork.c:1664 [inline]
 _do_fork+0x1c6/0xe60 kernel/fork.c:2175
 do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 28624:
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
 slab_free_hook mm/slub.c:1371 [inline]
 slab_free_freelist_hook mm/slub.c:1398 [inline]
 slab_free mm/slub.c:2953 [inline]
 kmem_cache_free+0xbd/0x320 mm/slub.c:2969
 copy_process.part.26+0x18d9/0x6c70 kernel/fork.c:2107
 copy_process kernel/fork.c:1664 [inline]
 _do_fork+0x1c6/0xe60 kernel/fork.c:2175
 do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8883d8d80000
 which belongs to the cache task_struct of size 9992
The buggy address is located 32 bytes inside of
 9992-byte region [ffff8883d8d80000, ffff8883d8d82708)
The buggy address belongs to the page:
page:ffffea000f636000 count:1 mapcount:0 mapping:ffff888107d07a00 index:0x0 compound_mapcount: 0
flags: 0x2fffff80008100(slab|head)
raw: 002fffff80008100 0000000000000000 0000000200000001 ffff888107d07a00
raw: 0000000000000000 0000000080030003 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8883d8d7ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff8883d8d7ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff8883d8d80000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                               ^
 ffff8883d8d80080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8883d8d80100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

[-- Attachment #3: errorlog-2.txt --]
[-- Type: text/plain, Size: 4299 bytes --]

==================================================================
BUG: KASAN: use-after-free in atomic_dec_and_test include/asm-generic/atomic-instrumented.h:259 [inline]
BUG: KASAN: use-after-free in put_task_struct include/linux/sched/task.h:95 [inline]
BUG: KASAN: use-after-free in _free_event+0x2c4/0xfe0 kernel/events/core.c:4452
Write of size 4 at addr ffff8883780cd0a0 by task syz-executor0/9553

CPU: 3 PID: 9553 Comm: syz-executor0 Not tainted 4.19.18-514.55.6.9.x86_64+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xca/0x13e lib/dump_stack.c:113
 print_address_description+0x79/0x330 mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report+0x18a/0x2e0 mm/kasan/report.c:412
 atomic_dec_and_test include/asm-generic/atomic-instrumented.h:259 [inline]
 put_task_struct include/linux/sched/task.h:95 [inline]
 _free_event+0x2c4/0xfe0 kernel/events/core.c:4452
 free_event+0x41/0xa0 kernel/events/core.c:4473
 perf_event_release_kernel+0x43b/0xc70 kernel/events/core.c:4634
 perf_release+0x33/0x40 kernel/events/core.c:4648
 __fput+0x27f/0x7f0 fs/file_table.c:278
 task_work_run+0x136/0x1b0 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x850/0x2b90 kernel/exit.c:867
 do_group_exit+0x106/0x2f0 kernel/exit.c:970
 get_signal+0x62b/0x1740 kernel/signal.c:2513
 do_signal+0x94/0x16a0 arch/x86/kernel/signal.c:816
 exit_to_usermode_loop+0x108/0x1d0 arch/x86/entry/common.c:162
 prepare_exit_to_usermode+0x27b/0x2f0 arch/x86/entry/common.c:197
 retint_user+0x8/0x18
RIP: 0033:0x4021b7
Code: Bad RIP value.
RSP: 002b:00007fa7cb40cb30 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 000000000000000b RCX: 0000000000462589
RDX: 00007fa7cb40cb40 RSI: ffffffffffffffb8 RDI: 000000000000000b
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00000000004bc4cd R14: 00000000006f4c70 R15: 00000000ffffffff

Allocated by task 9552:
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
 slab_post_alloc_hook mm/slab.h:444 [inline]
 slab_alloc_node mm/slub.c:2706 [inline]
 kmem_cache_alloc_node+0xe3/0x2e0 mm/slub.c:2742
 alloc_task_struct_node kernel/fork.c:157 [inline]
 dup_task_struct kernel/fork.c:802 [inline]
 copy_process.part.26+0x1a98/0x6c70 kernel/fork.c:1707
 copy_process kernel/fork.c:1664 [inline]
 _do_fork+0x1c6/0xe60 kernel/fork.c:2175
 do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 9552:
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
 slab_free_hook mm/slub.c:1371 [inline]
 slab_free_freelist_hook mm/slub.c:1398 [inline]
 slab_free mm/slub.c:2953 [inline]
 kmem_cache_free+0xbd/0x320 mm/slub.c:2969
 copy_process.part.26+0x18d9/0x6c70 kernel/fork.c:2107
 copy_process kernel/fork.c:1664 [inline]
 _do_fork+0x1c6/0xe60 kernel/fork.c:2175
 do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8883780cd080
 which belongs to the cache task_struct of size 9992
The buggy address is located 32 bytes inside of
 9992-byte region [ffff8883780cd080, ffff8883780cf788)
The buggy address belongs to the page:
page:ffffea000de03200 count:1 mapcount:0 mapping:ffff888107d07a00 index:0x0 compound_mapcount: 0
flags: 0x2fffff80008100(slab|head)
raw: 002fffff80008100 dead000000000100 dead000000000200 ffff888107d07a00
raw: 0000000000000000 0000000080030003 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8883780ccf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8883780cd000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8883780cd080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                               ^
 ffff8883780cd100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8883780cd180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC PATCH] perf: Paper over the hw.target problems
  2019-02-16 10:43   ` Xie XiuQi
@ 2019-02-28 14:01     ` Alexander Shishkin
       [not found]       ` <c174549c-d169-7773-2f47-5863ba0b8056@huawei.com>
  2019-03-08 15:54       ` Mark Rutland
  0 siblings, 2 replies; 16+ messages in thread
From: Alexander Shishkin @ 2019-02-28 14:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, jolsa,
	dvyukov, namhyung, xiexiuqi, Alexander Shishkin,
	syzbot+a24c397a29ad22d86c98

First, we have a race between perf_event_release_kernel() and
perf_free_event(), which happens when parent's event is released while the
child's fork fails (because of a fatal signal, for example), that looks
like this:

cpu X                            cpu Y
-----                            -----
                                 copy_process() error path
perf_release(parent)             +->perf_event_free_task()
+-> lock(child_ctx->mutex)       |  |
+-> remove_from_context(child)   |  |
+-> unlock(child_ctx->mutex)     |  |
|                                |  +-> lock(child_ctx->mutex)
|                                |  +-> unlock(child_ctx->mutex)
|                                +-> free_task(child_task)
+-> put_task_struct(child_task)

Technically, we're still holding a reference to the task via
parent->hw.target, that's not stopping free_task(), so we end up poking at
free'd memory, as is pointed out by KASAN in the syzkaller report (see Link
below). The straightforward fix is to drop the hw.target reference while
the task is still around.

Therein lies the second problem: the users of hw.target (uprobe) assume
that it's around at ->destroy() callback time, where they use it for
context. So, in order to not break the uprobe teardown and avoid leaking
stuff, we need to call ->destroy() at the same time.

This patch fixes the race and the subsequent fallout by doing both these
things at remove_from_context time.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://syzkaller.appspot.com/bug?extid=a24c397a29ad22d86c98
Reported-by: syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com
---
 kernel/events/core.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 36b8320590e8..640695d114f8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2105,6 +2105,27 @@ static void perf_remove_from_context(struct perf_event *event, unsigned long fla
 
 	event_function_call(event, __perf_remove_from_context, (void *)flags);
 
+	/*
+	 * This is as passable as any hw.target handling out there;
+	 * hw.target implies task context, therefore, no migration.
+	 * Which means that we can only get here at the teardown.
+	 */
+	if (event->hw.target) {
+		/*
+		 * Now, the problem with, say uprobes, is that they
+		 * use hw.target for context in their ->destroy()
+		 * callbacks. Supposedly, they may need to poke at
+		 * its contents, so better call it while we still
+		 * have the task.
+		 */
+		if (event->destroy) {
+			event->destroy(event);
+			event->destroy = NULL;
+		}
+		put_task_struct(event->hw.target);
+		event->hw.target = NULL;
+	}
+
 	/*
 	 * The above event_function_call() can NO-OP when it hits
 	 * TASK_TOMBSTONE. In that case we must already have been detached
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] perf: Paper over the hw.target problems
       [not found]       ` <c174549c-d169-7773-2f47-5863ba0b8056@huawei.com>
@ 2019-03-08 12:38         ` Alexander Shishkin
  2019-03-11 13:32           ` chengjian (D)
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Shishkin @ 2019-03-08 12:38 UTC (permalink / raw)
  To: Xie XiuQi, Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, jolsa,
	dvyukov, namhyung, syzbot+a24c397a29ad22d86c98, Cheng Jian

Xie XiuQi <xiexiuqi@huawei.com> writes:

> From: Cheng Jian <cj.chengjian@huawei.com>
>
> Hi Alexander,

Hi Cheng Jian,

> I have tested this patch and merged into 4.19 stable branch,
> syzkaller reported some new issues(WANRING), the C program
> generated by syzkaller can inevitably reproduce the issue
> (only in patched kernel).
>
> I put the syzkaller log and the manual test log in the attachment.

Thanks a lot for taking the time and providing all the details. Please
find an updated version below, which should not exhibit that warning.

From 1576f66b4311fe81107c3c39c38060178bc89457 Mon Sep 17 00:00:00 2001
From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Date: Thu, 28 Feb 2019 15:24:04 +0200
Subject: [RFC PATCH v1] perf: Paper over the hw.target problems

First, we have a race between perf_event_release_kernel() and
perf_free_event(), which happens when parent's event is released while the
child's fork fails (because of a fatal signal, for example), that looks
like this:

cpu X                            cpu Y
-----                            -----
                                 copy_process() error path
perf_release(parent)             +->perf_event_free_task()
+-> lock(child_ctx->mutex)       |  |
+-> remove_from_context(child)   |  |
+-> unlock(child_ctx->mutex)     |  |
|                                |  +-> lock(child_ctx->mutex)
|                                |  +-> unlock(child_ctx->mutex)
|                                +-> free_task(child_task)
+-> put_task_struct(child_task)

Technically, we're still holding a reference to the task via
parent->hw.target, that's not stopping free_task(), so we end up poking at
free'd memory, as is pointed out by KASAN in the syzkaller report (see Link
below). The straightforward fix is to drop the hw.target reference while
the task is still around.

Therein lies the second problem: the users of hw.target (uprobe) assume
that it's around at ->destroy() callback time, where they use it for
context. So, in order to not break the uprobe teardown and avoid leaking
stuff, we need to call ->destroy() at the same time.

This patch fixes the race and the subsequent fallout by doing both these
things at remove_from_context time.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://syzkaller.appspot.com/bug?extid=a24c397a29ad22d86c98
---
 kernel/events/core.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6797f0db0299..83469aae2774 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2106,6 +2106,28 @@ static void perf_remove_from_context(struct perf_event *event, unsigned long fla
 
 	event_function_call(event, __perf_remove_from_context, (void *)flags);
 
+	/*
+	 * This is as passable as any hw.target handling out there;
+	 * hw.target implies task context, therefore, no migration.
+	 * Which together with DETACH_GROUP means that this is the
+	 * final remove_from_context of a task event.
+	 */
+	if (event->hw.target && (flags & DETACH_GROUP)) {
+		/*
+		 * Now, the problem with, say uprobes, is that they
+		 * use hw.target for context in their ->destroy()
+		 * callbacks. Supposedly, they may need to poke at
+		 * its contents, so better call it while we still
+		 * have the task.
+		 */
+		if (event->destroy) {
+			event->destroy(event);
+			event->destroy = NULL;
+		}
+		put_task_struct(event->hw.target);
+		event->hw.target = NULL;
+	}
+
 	/*
 	 * The above event_function_call() can NO-OP when it hits
 	 * TASK_TOMBSTONE. In that case we must already have been detached
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] perf: Paper over the hw.target problems
  2019-02-28 14:01     ` [RFC PATCH] perf: Paper over the hw.target problems Alexander Shishkin
       [not found]       ` <c174549c-d169-7773-2f47-5863ba0b8056@huawei.com>
@ 2019-03-08 15:54       ` Mark Rutland
  2019-06-24 12:19         ` Peter Zijlstra
  1 sibling, 1 reply; 16+ messages in thread
From: Mark Rutland @ 2019-03-08 15:54 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, jolsa, dvyukov, namhyung, xiexiuqi,
	syzbot+a24c397a29ad22d86c98

Hi Alex,

On Thu, Feb 28, 2019 at 04:01:09PM +0200, Alexander Shishkin wrote:
> First, we have a race between perf_event_release_kernel() and
> perf_free_event(), which happens when parent's event is released while the
> child's fork fails (because of a fatal signal, for example), that looks
> like this:
> 
> cpu X                            cpu Y
> -----                            -----
>                                  copy_process() error path
> perf_release(parent)             +->perf_event_free_task()
> +-> lock(child_ctx->mutex)       |  |
> +-> remove_from_context(child)   |  |
> +-> unlock(child_ctx->mutex)     |  |
> |                                |  +-> lock(child_ctx->mutex)
> |                                |  +-> unlock(child_ctx->mutex)
> |                                +-> free_task(child_task)
> +-> put_task_struct(child_task)
> 
> Technically, we're still holding a reference to the task via
> parent->hw.target, that's not stopping free_task(), so we end up poking at
> free'd memory, as is pointed out by KASAN in the syzkaller report (see Link
> below). The straightforward fix is to drop the hw.target reference while
> the task is still around.

I've recently started hitting this on arm64, and had come to the same
conclusion.

> Therein lies the second problem: the users of hw.target (uprobe) assume
> that it's around at ->destroy() callback time, where they use it for
> context. So, in order to not break the uprobe teardown and avoid leaking
> stuff, we need to call ->destroy() at the same time.

I had not spotted that case. That's rather horrid. :/

> This patch fixes the race and the subsequent fallout by doing both these
> things at remove_from_context time.
> 
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Link: https://syzkaller.appspot.com/bug?extid=a24c397a29ad22d86c98
> Reported-by: syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com
> ---
>  kernel/events/core.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 36b8320590e8..640695d114f8 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2105,6 +2105,27 @@ static void perf_remove_from_context(struct perf_event *event, unsigned long fla
>  
>  	event_function_call(event, __perf_remove_from_context, (void *)flags);
>  
> +	/*
> +	 * This is as passable as any hw.target handling out there;
> +	 * hw.target implies task context, therefore, no migration.
> +	 * Which means that we can only get here at the teardown.
> +	 */
> +	if (event->hw.target) {
> +		/*
> +		 * Now, the problem with, say uprobes, is that they
> +		 * use hw.target for context in their ->destroy()
> +		 * callbacks. Supposedly, they may need to poke at
> +		 * its contents, so better call it while we still
> +		 * have the task.
> +		 */
> +		if (event->destroy) {
> +			event->destroy(event);
> +			event->destroy = NULL;
> +		}
> +		put_task_struct(event->hw.target);
> +		event->hw.target = NULL;
> +	}

We also use perf_remove_from_context() in perf_event_open() when we move
events from a SW context to a HW context, so we can't destroy the event
here.

I think we need something more like the below (untested), but I fear
that it's not safe to call perf_event::destroy() in this context.

Thanks,
Mark.

---->8----
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 26d6edab051a..b32f2cac5563 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4532,6 +4532,24 @@ static void put_event(struct perf_event *event)
        _free_event(event);
 }
 
+void perf_event_detach_target(struct perf_event *event)
+{
+       if (!event->hw.target)
+               return;
+
+       /*
+        * The uprobes perf_event::destroy() callback needs the target, so call
+        * that while the target is still valid.
+        */
+       if (event->destroy) {
+               event->destroy(event);
+               event->destroy = NULL;
+       }
+
+       put_task_struct(event->hw.target);
+       event->hw.target = NULL;
+}
+
 /*
  * Kill an event dead; while event:refcount will preserve the event
  * object, it will not preserve its functionality. Once the last 'user'
@@ -4559,6 +4577,7 @@ int perf_event_release_kernel(struct perf_event *event)
        ctx = perf_event_ctx_lock(event);
        WARN_ON_ONCE(ctx->parent_ctx);
        perf_remove_from_context(event, DETACH_GROUP);
+       perf_event_detach_target(event);
 
        raw_spin_lock_irq(&ctx->lock);
        /*
@@ -4614,6 +4633,7 @@ int perf_event_release_kernel(struct perf_event *event)
                                               struct perf_event, child_list);
                if (tmp == child) {
                        perf_remove_from_context(child, DETACH_GROUP);
+                       perf_event_detach_target(child);
                        list_move(&child->child_list, &free_list);
                        /*
                         * This matches the refcount bump in inherit_event();
 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] perf: Paper over the hw.target problems
  2019-03-08 12:38         ` Alexander Shishkin
@ 2019-03-11 13:32           ` chengjian (D)
  0 siblings, 0 replies; 16+ messages in thread
From: chengjian (D) @ 2019-03-11 13:32 UTC (permalink / raw)
  To: Alexander Shishkin, Xie XiuQi, Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, jolsa,
	dvyukov, namhyung, syzbot+a24c397a29ad22d86c98, chengjian (D),
	Li Bin

Hi Alexander,


I have tested the new version, and put it in

the syzkaller workspace, it works well now.

Tested-by: Cheng Jian <cj.chengjian@huawei.com>


> Thanks a lot for taking the time and providing all the details. Please
> find an updated version below, which should not exhibit that warning.


Thanks.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] perf: Paper over the hw.target problems
  2019-03-08 15:54       ` Mark Rutland
@ 2019-06-24 12:19         ` Peter Zijlstra
  2019-06-25  8:49           ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2019-06-24 12:19 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Shishkin, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, jolsa, dvyukov, namhyung, xiexiuqi,
	syzbot+a24c397a29ad22d86c98

On Fri, Mar 08, 2019 at 03:54:29PM +0000, Mark Rutland wrote:
> Hi Alex,
> 
> On Thu, Feb 28, 2019 at 04:01:09PM +0200, Alexander Shishkin wrote:
> > First, we have a race between perf_event_release_kernel() and
> > perf_free_event(), which happens when parent's event is released while the
> > child's fork fails (because of a fatal signal, for example), that looks
> > like this:
> > 
> > cpu X                            cpu Y
> > -----                            -----
> >                                  copy_process() error path
> > perf_release(parent)             +->perf_event_free_task()
> > +-> lock(child_ctx->mutex)       |  |
> > +-> remove_from_context(child)   |  |
> > +-> unlock(child_ctx->mutex)     |  |
> > |                                |  +-> lock(child_ctx->mutex)
> > |                                |  +-> unlock(child_ctx->mutex)
> > |                                +-> free_task(child_task)
> > +-> put_task_struct(child_task)

I had a wee bit of bother following that, it's this, right?

	close()						clone()

							  copy_process()
							    perf_event_init_task()
							      perf_event_init_context()
							        mutex_lock(parent_ctx->mutex)
								inherit_task_group()
								  inherit_group()
								    inherit_event()
								      mutex_lock(event->child_mutex)
								      // expose event on child list
								      list_add_tail()
								      mutex_unlock(event->child_mutex)
							        mutex_unlock(parent_ctx->mutex)

							    ...
							    goto bad_fork_*

							  bad_fork_cleanup_perf:
							    perf_event_free_task()

	  perf_release()
	    perf_event_release_kernel()
	      list_for_each_entry()
		mutex_lock(ctx->mutex)
		mutex_lock(event->child_mutex)
		// event is from the failing inherit
		// on the other CPU
		perf_remove_from_context()
		list_move()
		mutex_unlock(event->child_mutex)
		mutex_unlock(ctx->mutex)

							      mutex_lock(ctx->mutex)
							      list_for_each_entry_safe()
							        // event already stolen
							      mutex_unlock(ctx->mutex)

							    delayed_free_task()
							      free_task()

	     list_for_each_entry_safe()
	       list_del()
	       free_event()
	         _free_event()
		   // and so event->hw.target
		   // is the already freed failed clone()
		   if (event->hw.target)
		     put_task_struct(event->hw.target)
		       // WHOOPSIE, already quite dead


Which puts the lie to the the comment on perf_event_free_task();
'unexposed, unused context' my ass.

Which is a 'fun' confluence of fail; copy_process() doing an
unconditional free_task() and not respecting refcounts, and perf having
creative locking. In particular:

  82d94856fa22 ("perf/core: Fix lock inversion between perf,trace,cpuhp")

seems to have overlooked this 'fun' parade.

> > Technically, we're still holding a reference to the task via
> > parent->hw.target, that's not stopping free_task(), so we end up poking at
> > free'd memory, as is pointed out by KASAN in the syzkaller report (see Link
> > below). The straightforward fix is to drop the hw.target reference while
> > the task is still around.

Right.

> > Therein lies the second problem: the users of hw.target (uprobe) assume
> > that it's around at ->destroy() callback time, where they use it for
> > context. So, in order to not break the uprobe teardown and avoid leaking
> > stuff, we need to call ->destroy() at the same time.
> 
> I had not spotted that case. That's rather horrid. :/

Such joy..

> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 36b8320590e8..640695d114f8 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -2105,6 +2105,27 @@ static void perf_remove_from_context(struct perf_event *event, unsigned long fla
> >  
> >  	event_function_call(event, __perf_remove_from_context, (void *)flags);
> >  
> > +	/*
> > +	 * This is as passable as any hw.target handling out there;
> > +	 * hw.target implies task context, therefore, no migration.
> > +	 * Which means that we can only get here at the teardown.
> > +	 */
> > +	if (event->hw.target) {
> > +		/*
> > +		 * Now, the problem with, say uprobes, is that they
> > +		 * use hw.target for context in their ->destroy()
> > +		 * callbacks. Supposedly, they may need to poke at
> > +		 * its contents, so better call it while we still
> > +		 * have the task.
> > +		 */
> > +		if (event->destroy) {
> > +			event->destroy(event);
> > +			event->destroy = NULL;
> > +		}
> > +		put_task_struct(event->hw.target);
> > +		event->hw.target = NULL;
> > +	}
> 
> We also use perf_remove_from_context() in perf_event_open() when we move
> events from a SW context to a HW context, so we can't destroy the event
> here.

Also perf_pmu_migrate_context(), and yes, we must not call ->destroy()
from remove_context, or rather, not unconditional. We could make it
conditional on DETACH_GROUP or better add another DETACH_ flags.

> I think we need something more like the below (untested), but I fear
> that it's not safe to call perf_event::destroy() in this context.

> ---->8----
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 26d6edab051a..b32f2cac5563 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4532,6 +4532,24 @@ static void put_event(struct perf_event *event)
>         _free_event(event);
>  }
>  
> +void perf_event_detach_target(struct perf_event *event)
> +{
> +       if (!event->hw.target)
> +               return;
> +
> +       /*
> +        * The uprobes perf_event::destroy() callback needs the target, so call
> +        * that while the target is still valid.
> +        */
> +       if (event->destroy) {
> +               event->destroy(event);
> +               event->destroy = NULL;
> +       }
> +
> +       put_task_struct(event->hw.target);
> +       event->hw.target = NULL;
> +}
> +
>  /*
>   * Kill an event dead; while event:refcount will preserve the event
>   * object, it will not preserve its functionality. Once the last 'user'
> @@ -4559,6 +4577,7 @@ int perf_event_release_kernel(struct perf_event *event)
>         ctx = perf_event_ctx_lock(event);
>         WARN_ON_ONCE(ctx->parent_ctx);
>         perf_remove_from_context(event, DETACH_GROUP);
> +       perf_event_detach_target(event);
>  
>         raw_spin_lock_irq(&ctx->lock);
>         /*
> @@ -4614,6 +4633,7 @@ int perf_event_release_kernel(struct perf_event *event)
>                                                struct perf_event, child_list);
>                 if (tmp == child) {
>                         perf_remove_from_context(child, DETACH_GROUP);
> +                       perf_event_detach_target(child);
>                         list_move(&child->child_list, &free_list);
>                         /*
>                          * This matches the refcount bump in inherit_event();

And doesn't this re-introduce the problem we fixed with 82d94856fa22 ?

By doing ->destroy() while holding ctx->mutex, we re-establish that
#5->#0 link and close the cycle again.

More thinking required...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] perf: Paper over the hw.target problems
  2019-06-24 12:19         ` Peter Zijlstra
@ 2019-06-25  8:49           ` Peter Zijlstra
  2019-06-25 10:43             ` [PATCH] perf: Fix race between close() and fork() Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2019-06-25  8:49 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Shishkin, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, jolsa, dvyukov, namhyung, xiexiuqi,
	syzbot+a24c397a29ad22d86c98

On Mon, Jun 24, 2019 at 02:19:02PM +0200, Peter Zijlstra wrote:
> 
> 	close()						clone()
> 
> 							  copy_process()
> 							    perf_event_init_task()
> 							      perf_event_init_context()
> 							        mutex_lock(parent_ctx->mutex)
> 								inherit_task_group()
> 								  inherit_group()
> 								    inherit_event()
> 								      mutex_lock(event->child_mutex)
> 								      // expose event on child list
> 								      list_add_tail()
> 								      mutex_unlock(event->child_mutex)
> 							        mutex_unlock(parent_ctx->mutex)
> 
> 							    ...
> 							    goto bad_fork_*
> 
> 							  bad_fork_cleanup_perf:
> 							    perf_event_free_task()
> 
> 	  perf_release()
> 	    perf_event_release_kernel()
> 	      list_for_each_entry()
> 		mutex_lock(ctx->mutex)
> 		mutex_lock(event->child_mutex)
> 		// event is from the failing inherit
> 		// on the other CPU
> 		perf_remove_from_context()
> 		list_move()
> 		mutex_unlock(event->child_mutex)
> 		mutex_unlock(ctx->mutex)
> 
> 							      mutex_lock(ctx->mutex)
> 							      list_for_each_entry_safe()
> 							        // event already stolen
> 							      mutex_unlock(ctx->mutex)
> 
> 							    delayed_free_task()
> 							      free_task()
> 
> 	     list_for_each_entry_safe()
> 	       list_del()
> 	       free_event()
> 	         _free_event()
> 		   // and so event->hw.target
> 		   // is the already freed failed clone()
> 		   if (event->hw.target)
> 		     put_task_struct(event->hw.target)
> 		       // WHOOPSIE, already quite dead
> 
> 
> Which puts the lie to the the comment on perf_event_free_task();
> 'unexposed, unused context' my ass.
> 
> Which is a 'fun' confluence of fail; copy_process() doing an
> unconditional free_task() and not respecting refcounts, and perf having
> creative locking. In particular:
> 
>   82d94856fa22 ("perf/core: Fix lock inversion between perf,trace,cpuhp")
> 
> seems to have overlooked this 'fun' parade.

The below seems to cure things; it uses the fact that detached events
still have a reference count on their context.

So perf_event_free_task() can detect when (child) events have gotten
stolen and wait for it.

The below (which includes a debug printk) confirms the reproducer
triggered the problem by printing a 2 (where a 1 is expected).

Thoughts?

---
 kernel/events/core.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 10c1dba9068c..e19d036125d1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4463,12 +4463,16 @@ static void _free_event(struct perf_event *event)
 	if (event->destroy)
 		event->destroy(event);
 
-	if (event->ctx)
-		put_ctx(event->ctx);
-
+	/*
+	 * Must be after ->destroy(), due to uprobe_perf_close() using
+	 * hw.target, but before put_ctx() because of perf_event_free_task().
+	 */
 	if (event->hw.target)
 		put_task_struct(event->hw.target);
 
+	if (event->ctx)
+		put_ctx(event->ctx);
+
 	exclusive_event_destroy(event);
 	module_put(event->pmu->module);
 
@@ -4648,10 +4652,20 @@ int perf_event_release_kernel(struct perf_event *event)
 	mutex_unlock(&event->child_mutex);
 
 	list_for_each_entry_safe(child, tmp, &free_list, child_list) {
+		void *var = &child->ctx->refcount;
+
 		list_del(&child->child_list);
 		free_event(child);
+
+		/*
+		 * Wake any perf_event_free_task() waiting for this event to be
+		 * freed.
+		 */
+		smp_mb(); /* pairs with wait_var_event() */
+		wake_up_var(var);
 	}
 
+
 no_ctx:
 	put_event(event); /* Must be the 'last' reference */
 	return 0;
@@ -11546,7 +11560,19 @@ void perf_event_free_task(struct task_struct *task)
 			perf_free_event(event, ctx);
 
 		mutex_unlock(&ctx->mutex);
-		put_ctx(ctx);
+
+		/*
+		 * perf_event_release_kernel() could've stolen some of our
+		 * child events and still have them on its free_list. In that
+		 * case it can do free_event() after the failing copy_process()
+		 * has already freed the task, and get a UaF on
+		 * event->hw.target.
+		 *
+		 * Wait for all events to drop their context reference.
+		 */
+		printk("%d\n", refcount_read(&ctx->refcount));
+		wait_var_event(&ctx->refcount, refcount_read(&ctx->refcount) == 1);
+		put_ctx(ctx); /* must be last */
 	}
 }
 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH] perf: Fix race between close() and fork()
  2019-06-25  8:49           ` Peter Zijlstra
@ 2019-06-25 10:43             ` Peter Zijlstra
  2019-06-25 12:20               ` Alexander Shishkin
  2019-06-28 16:50               ` Mark Rutland
  0 siblings, 2 replies; 16+ messages in thread
From: Peter Zijlstra @ 2019-06-25 10:43 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Shishkin, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, jolsa, dvyukov, namhyung, xiexiuqi,
	syzbot+a24c397a29ad22d86c98


Syzcaller reported the following Use-after-Free issue:

	close()						clone()

							  copy_process()
							    perf_event_init_task()
							      perf_event_init_context()
							        mutex_lock(parent_ctx->mutex)
								inherit_task_group()
								  inherit_group()
								    inherit_event()
								      mutex_lock(event->child_mutex)
								      // expose event on child list
								      list_add_tail()
								      mutex_unlock(event->child_mutex)
							        mutex_unlock(parent_ctx->mutex)

							    ...
							    goto bad_fork_*

							  bad_fork_cleanup_perf:
							    perf_event_free_task()

	  perf_release()
	    perf_event_release_kernel()
	      list_for_each_entry()
		mutex_lock(ctx->mutex)
		mutex_lock(event->child_mutex)
		// event is from the failing inherit
		// on the other CPU
		perf_remove_from_context()
		list_move()
		mutex_unlock(event->child_mutex)
		mutex_unlock(ctx->mutex)

							      mutex_lock(ctx->mutex)
							      list_for_each_entry_safe()
							        // event already stolen
							      mutex_unlock(ctx->mutex)

							    delayed_free_task()
							      free_task()

	     list_for_each_entry_safe()
	       list_del()
	       free_event()
	         _free_event()
		   // and so event->hw.target
		   // is the already freed failed clone()
		   if (event->hw.target)
		     put_task_struct(event->hw.target)
		       // WHOOPSIE, already quite dead


Which puts the lie to the the comment on perf_event_free_task():
'unexposed, unused context' not so much.

Which is a 'fun' confluence of fail; copy_process() doing an
unconditional free_task() and not respecting refcounts, and perf having
creative locking. In particular:

  82d94856fa22 ("perf/core: Fix lock inversion between perf,trace,cpuhp")

seems to have overlooked this 'fun' parade.

Solve it by using the fact that detached events still have a reference
count on their (previous) context. With this perf_event_free_task()
can detect when events have escaped and wait for their destruction.

Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: 82d94856fa22 ("perf/core: Fix lock inversion between perf,trace,cpuhp")
Reported-by: syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com
Debugged-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c | 49 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 41 insertions(+), 8 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 10c1dba9068c..5302c19e9892 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4463,12 +4463,20 @@ static void _free_event(struct perf_event *event)
 	if (event->destroy)
 		event->destroy(event);
 
-	if (event->ctx)
-		put_ctx(event->ctx);
-
+	/*
+	 * Must be after ->destroy(), due to uprobe_perf_close() using
+	 * hw.target.
+	 */
 	if (event->hw.target)
 		put_task_struct(event->hw.target);
 
+	/*
+	 * perf_event_free_task() relies on put_ctx() being 'last', in particular
+	 * all task references must be cleaned up.
+	 */
+	if (event->ctx)
+		put_ctx(event->ctx);
+
 	exclusive_event_destroy(event);
 	module_put(event->pmu->module);
 
@@ -4648,8 +4656,17 @@ int perf_event_release_kernel(struct perf_event *event)
 	mutex_unlock(&event->child_mutex);
 
 	list_for_each_entry_safe(child, tmp, &free_list, child_list) {
+		void *var = &child->ctx->refcount;
+
 		list_del(&child->child_list);
 		free_event(child);
+
+		/*
+		 * Wake any perf_event_free_task() waiting for this event to be
+		 * freed.
+		 */
+		smp_mb(); /* pairs with wait_var_event() */
+		wake_up_var(var);
 	}
 
 no_ctx:
@@ -11512,11 +11529,11 @@ static void perf_free_event(struct perf_event *event,
 }
 
 /*
- * Free an unexposed, unused context as created by inheritance by
- * perf_event_init_task below, used by fork() in case of fail.
+ * Free a context as created by inheritance by perf_event_init_task() below,
+ * used by fork() in case of fail.
  *
- * Not all locks are strictly required, but take them anyway to be nice and
- * help out with the lockdep assertions.
+ * Even though the task has never lived, the context and events have been
+ * exposed through the child_list, so we must take care tearing it all down.
  */
 void perf_event_free_task(struct task_struct *task)
 {
@@ -11546,7 +11563,23 @@ void perf_event_free_task(struct task_struct *task)
 			perf_free_event(event, ctx);
 
 		mutex_unlock(&ctx->mutex);
-		put_ctx(ctx);
+
+		/*
+		 * perf_event_release_kernel() could've stolen some of our
+		 * child events and still have them on its free_list. In that
+		 * case we must wait for these events to have been freed (in
+		 * particular all their references to this task must've been
+		 * dropped).
+		 *
+		 * Without this copy_process() will unconditionally free this
+		 * task (irrespective of its reference count) and
+		 * _free_event()'s put_task_struct(event->hw.target) will be a
+		 * use-after-free.
+		 *
+		 * Wait for all events to drop their context reference.
+		 */
+		wait_var_event(&ctx->refcount, refcount_read(&ctx->refcount) == 1);
+		put_ctx(ctx); /* must be last */
 	}
 }
 


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] perf: Fix race between close() and fork()
  2019-06-25 10:43             ` [PATCH] perf: Fix race between close() and fork() Peter Zijlstra
@ 2019-06-25 12:20               ` Alexander Shishkin
  2019-06-28 16:50               ` Mark Rutland
  1 sibling, 0 replies; 16+ messages in thread
From: Alexander Shishkin @ 2019-06-25 12:20 UTC (permalink / raw)
  To: Peter Zijlstra, Mark Rutland
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, jolsa,
	dvyukov, namhyung, xiexiuqi, syzbot+a24c397a29ad22d86c98,
	alexander.shishkin

Peter Zijlstra <peterz@infradead.org> writes:

> Solve it by using the fact that detached events still have a reference
> count on their (previous) context. With this perf_event_free_task()
> can detect when events have escaped and wait for their destruction.
>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Fixes: 82d94856fa22 ("perf/core: Fix lock inversion between perf,trace,cpuhp")
> Reported-by: syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com
> Debugged-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] perf: Fix race between close() and fork()
  2019-06-25 10:43             ` [PATCH] perf: Fix race between close() and fork() Peter Zijlstra
  2019-06-25 12:20               ` Alexander Shishkin
@ 2019-06-28 16:50               ` Mark Rutland
  2019-06-28 20:46                 ` Peter Zijlstra
  1 sibling, 1 reply; 16+ messages in thread
From: Mark Rutland @ 2019-06-28 16:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexander Shishkin, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, jolsa, dvyukov, namhyung, xiexiuqi,
	syzbot+a24c397a29ad22d86c98

On Tue, Jun 25, 2019 at 12:43:20PM +0200, Peter Zijlstra wrote:
> 
> Syzcaller reported the following Use-after-Free issue:
> 
> 	close()						clone()
> 
> 							  copy_process()
> 							    perf_event_init_task()
> 							      perf_event_init_context()
> 							        mutex_lock(parent_ctx->mutex)
> 								inherit_task_group()
> 								  inherit_group()
> 								    inherit_event()
> 								      mutex_lock(event->child_mutex)
> 								      // expose event on child list
> 								      list_add_tail()
> 								      mutex_unlock(event->child_mutex)
> 							        mutex_unlock(parent_ctx->mutex)
> 
> 							    ...
> 							    goto bad_fork_*
> 
> 							  bad_fork_cleanup_perf:
> 							    perf_event_free_task()
> 
> 	  perf_release()
> 	    perf_event_release_kernel()
> 	      list_for_each_entry()
> 		mutex_lock(ctx->mutex)
> 		mutex_lock(event->child_mutex)
> 		// event is from the failing inherit
> 		// on the other CPU
> 		perf_remove_from_context()
> 		list_move()
> 		mutex_unlock(event->child_mutex)
> 		mutex_unlock(ctx->mutex)
> 
> 							      mutex_lock(ctx->mutex)
> 							      list_for_each_entry_safe()
> 							        // event already stolen
> 							      mutex_unlock(ctx->mutex)
> 
> 							    delayed_free_task()
> 							      free_task()
> 
> 	     list_for_each_entry_safe()
> 	       list_del()
> 	       free_event()
> 	         _free_event()
> 		   // and so event->hw.target
> 		   // is the already freed failed clone()
> 		   if (event->hw.target)
> 		     put_task_struct(event->hw.target)
> 		       // WHOOPSIE, already quite dead
> 
> 
> Which puts the lie to the the comment on perf_event_free_task():
> 'unexposed, unused context' not so much.
> 
> Which is a 'fun' confluence of fail; copy_process() doing an
> unconditional free_task() and not respecting refcounts, and perf having
> creative locking. In particular:
> 
>   82d94856fa22 ("perf/core: Fix lock inversion between perf,trace,cpuhp")
> 
> seems to have overlooked this 'fun' parade.
> 
> Solve it by using the fact that detached events still have a reference
> count on their (previous) context. With this perf_event_free_task()
> can detect when events have escaped and wait for their destruction.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Fixes: 82d94856fa22 ("perf/core: Fix lock inversion between perf,trace,cpuhp")
> Reported-by: syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com
> Debugged-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/events/core.c | 49 +++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 41 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 10c1dba9068c..5302c19e9892 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4463,12 +4463,20 @@ static void _free_event(struct perf_event *event)
>  	if (event->destroy)
>  		event->destroy(event);
>  
> -	if (event->ctx)
> -		put_ctx(event->ctx);
> -
> +	/*
> +	 * Must be after ->destroy(), due to uprobe_perf_close() using
> +	 * hw.target.
> +	 */
>  	if (event->hw.target)
>  		put_task_struct(event->hw.target);
>  
> +	/*
> +	 * perf_event_free_task() relies on put_ctx() being 'last', in particular
> +	 * all task references must be cleaned up.
> +	 */
> +	if (event->ctx)
> +		put_ctx(event->ctx);
> +
>  	exclusive_event_destroy(event);
>  	module_put(event->pmu->module);
>  
> @@ -4648,8 +4656,17 @@ int perf_event_release_kernel(struct perf_event *event)
>  	mutex_unlock(&event->child_mutex);
>  
>  	list_for_each_entry_safe(child, tmp, &free_list, child_list) {
> +		void *var = &child->ctx->refcount;
> +
>  		list_del(&child->child_list);
>  		free_event(child);
> +
> +		/*
> +		 * Wake any perf_event_free_task() waiting for this event to be
> +		 * freed.
> +		 */
> +		smp_mb(); /* pairs with wait_var_event() */
> +		wake_up_var(var);

Huh, so wake_up_var() doesn't imply a RELEASE?

As an aside, doesn't that mean all callers of wake_up_var() have to do
likewise to ensure it isn't re-ordered with whatever prior stuff they're
trying to notify waiters about? Several do an smp_store_release() then a
wake_up_var(), but IIUC the wake_up_var() could get pulled before that
release...

I'm likely missing a subtlety there (I guess in practice the
implementation of wake_up_var prevents that), or maybe I have more
vocabulary than I have a clue. ;)

Other than that tangent, this looks sane to me, so FWIW:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Ideally, we'd throw some CPU hours at testing this. I can drop this in
my fuzzing queue next week, but I don't have a reproducer that I can
soak-test this with. :(

Thanks,
Mark.

>  	}
>  
>  no_ctx:
> @@ -11512,11 +11529,11 @@ static void perf_free_event(struct perf_event *event,
>  }
>  
>  /*
> - * Free an unexposed, unused context as created by inheritance by
> - * perf_event_init_task below, used by fork() in case of fail.
> + * Free a context as created by inheritance by perf_event_init_task() below,
> + * used by fork() in case of fail.
>   *
> - * Not all locks are strictly required, but take them anyway to be nice and
> - * help out with the lockdep assertions.
> + * Even though the task has never lived, the context and events have been
> + * exposed through the child_list, so we must take care tearing it all down.
>   */
>  void perf_event_free_task(struct task_struct *task)
>  {
> @@ -11546,7 +11563,23 @@ void perf_event_free_task(struct task_struct *task)
>  			perf_free_event(event, ctx);
>  
>  		mutex_unlock(&ctx->mutex);
> -		put_ctx(ctx);
> +
> +		/*
> +		 * perf_event_release_kernel() could've stolen some of our
> +		 * child events and still have them on its free_list. In that
> +		 * case we must wait for these events to have been freed (in
> +		 * particular all their references to this task must've been
> +		 * dropped).
> +		 *
> +		 * Without this copy_process() will unconditionally free this
> +		 * task (irrespective of its reference count) and
> +		 * _free_event()'s put_task_struct(event->hw.target) will be a
> +		 * use-after-free.
> +		 *
> +		 * Wait for all events to drop their context reference.
> +		 */
> +		wait_var_event(&ctx->refcount, refcount_read(&ctx->refcount) == 1);
> +		put_ctx(ctx); /* must be last */
>  	}
>  }
>  
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] perf: Fix race between close() and fork()
  2019-06-28 16:50               ` Mark Rutland
@ 2019-06-28 20:46                 ` Peter Zijlstra
  2019-07-01  9:24                   ` Mark Rutland
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2019-06-28 20:46 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Shishkin, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, jolsa, dvyukov, namhyung, xiexiuqi,
	syzbot+a24c397a29ad22d86c98

On Fri, Jun 28, 2019 at 05:50:03PM +0100, Mark Rutland wrote:
> > +		/*
> > +		 * Wake any perf_event_free_task() waiting for this event to be
> > +		 * freed.
> > +		 */
> > +		smp_mb(); /* pairs with wait_var_event() */
> > +		wake_up_var(var);
> 
> Huh, so wake_up_var() doesn't imply a RELEASE?
> 
> As an aside, doesn't that mean all callers of wake_up_var() have to do
> likewise to ensure it isn't re-ordered with whatever prior stuff they're
> trying to notify waiters about? Several do an smp_store_release() then a
> wake_up_var(), but IIUC the wake_up_var() could get pulled before that
> release...

Yah,...

  https://lkml.kernel.org/r/20190624165012.GH3436@hirez.programming.kicks-ass.net

I needs to get back to that.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] perf: Fix race between close() and fork()
  2019-06-28 20:46                 ` Peter Zijlstra
@ 2019-07-01  9:24                   ` Mark Rutland
  0 siblings, 0 replies; 16+ messages in thread
From: Mark Rutland @ 2019-07-01  9:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexander Shishkin, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, jolsa, dvyukov, namhyung, xiexiuqi,
	syzbot+a24c397a29ad22d86c98

On Fri, Jun 28, 2019 at 10:46:08PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 28, 2019 at 05:50:03PM +0100, Mark Rutland wrote:
> > > +		/*
> > > +		 * Wake any perf_event_free_task() waiting for this event to be
> > > +		 * freed.
> > > +		 */
> > > +		smp_mb(); /* pairs with wait_var_event() */
> > > +		wake_up_var(var);
> > 
> > Huh, so wake_up_var() doesn't imply a RELEASE?
> > 
> > As an aside, doesn't that mean all callers of wake_up_var() have to do
> > likewise to ensure it isn't re-ordered with whatever prior stuff they're
> > trying to notify waiters about? Several do an smp_store_release() then a
> > wake_up_var(), but IIUC the wake_up_var() could get pulled before that
> > release...
> 
> Yah,...
> 
>   https://lkml.kernel.org/r/20190624165012.GH3436@hirez.programming.kicks-ass.net

That gets me a 404, so I'll assume that's a speculative store. ;)

> I needs to get back to that.

Ouch; sorry for reminding you of this mess!

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-07-01  9:24 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-09  5:52 KASAN: use-after-free Write in _free_event syzbot
2018-07-09 10:11 ` Alexander Shishkin
2018-07-09 11:02   ` Dmitry Vyukov
2019-02-16 10:43   ` Xie XiuQi
2019-02-28 14:01     ` [RFC PATCH] perf: Paper over the hw.target problems Alexander Shishkin
     [not found]       ` <c174549c-d169-7773-2f47-5863ba0b8056@huawei.com>
2019-03-08 12:38         ` Alexander Shishkin
2019-03-11 13:32           ` chengjian (D)
2019-03-08 15:54       ` Mark Rutland
2019-06-24 12:19         ` Peter Zijlstra
2019-06-25  8:49           ` Peter Zijlstra
2019-06-25 10:43             ` [PATCH] perf: Fix race between close() and fork() Peter Zijlstra
2019-06-25 12:20               ` Alexander Shishkin
2019-06-28 16:50               ` Mark Rutland
2019-06-28 20:46                 ` Peter Zijlstra
2019-07-01  9:24                   ` Mark Rutland
2018-07-10  2:16 ` KASAN: use-after-free Write in _free_event syzbot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.