Linux-Security-Module Archive on lore.kernel.org
 help / Atom feed
* KASAN: use-after-free Read in task_is_descendant
@ 2018-10-21  7:10 syzbot
  2018-10-21  7:12 ` Tetsuo Handa
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: syzbot @ 2018-10-21  7:10 UTC (permalink / raw)
  To: jmorris, keescook, linux-kernel, linux-security-module, serge,
	syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    270b77a0f30e Merge tag 'drm-fixes-2018-10-20-1' of git://a..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=116f4ad9400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=b3f55cb3dfcc6c33
dashboard link: https://syzkaller.appspot.com/bug?extid=a9ac39bf55329e206219
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+a9ac39bf55329e206219@syzkaller.appspotmail.com

binder: 12719:12721 got transaction to context manager from process owning  
it
binder: BINDER_SET_CONTEXT_MGR already set
binder: 12719:12724 ioctl 40046207 0 returned -16
binder: 12719:12721 BC_ACQUIRE_DONE u0000000000000000 no match
==================================================================
BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:188  
[inline]
BUG: KASAN: use-after-free in task_is_descendant.part.2+0x610/0x670  
security/yama/yama_lsm.c:295
Read of size 8 at addr ffff8801c4666b20 by task syz-executor3/12722

CPU: 1 PID: 12722 Comm: syz-executor3 Not tainted 4.19.0-rc8+ #70
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
  print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
  __read_once_size include/linux/compiler.h:188 [inline]
  task_is_descendant.part.2+0x610/0x670 security/yama/yama_lsm.c:295
  task_is_descendant security/yama/yama_lsm.c:282 [inline]
  yama_ptrace_access_check+0x215/0x10fc security/yama/yama_lsm.c:371
  security_ptrace_access_check+0x54/0xb0 security/security.c:260
  __ptrace_may_access+0x564/0x950 kernel/ptrace.c:331
  ptrace_attach+0x1fa/0x640 kernel/ptrace.c:379
  __do_sys_ptrace kernel/ptrace.c:1130 [inline]
  __se_sys_ptrace kernel/ptrace.c:1110 [inline]
  __x64_sys_ptrace+0x229/0x260 kernel/ptrace.c:1110
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457569
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f7df12c4c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000065
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000457569
RDX: 0000000000000000 RSI: 0000000000000265 RDI: 0000000000000010
RBP: 000000000072bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7df12c56d4
R13: 00000000004c30c5 R14: 00000000004d4ab8 R15: 00000000ffffffff

Allocated by task 8920:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
  kmem_cache_alloc_node+0x144/0x730 mm/slab.c:3644
  alloc_task_struct_node kernel/fork.c:157 [inline]
  dup_task_struct kernel/fork.c:802 [inline]
  copy_process+0x1ff4/0x8780 kernel/fork.c:1707
  _do_fork+0x1cb/0x11d0 kernel/fork.c:2166
  __do_sys_clone kernel/fork.c:2273 [inline]
  __se_sys_clone kernel/fork.c:2267 [inline]
  __x64_sys_clone+0xbf/0x150 kernel/fork.c:2267
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 18:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kmem_cache_free+0x83/0x290 mm/slab.c:3756
  free_task_struct kernel/fork.c:162 [inline]
  free_task+0x16e/0x1f0 kernel/fork.c:416
  __put_task_struct+0x2e6/0x620 kernel/fork.c:689
  put_task_struct include/linux/sched/task.h:96 [inline]
  delayed_put_task_struct+0x2ff/0x4c0 kernel/exit.c:181
  __rcu_reclaim kernel/rcu/rcu.h:236 [inline]
  rcu_do_batch kernel/rcu/tree.c:2576 [inline]
  invoke_rcu_callbacks kernel/rcu/tree.c:2880 [inline]
  __rcu_process_callbacks kernel/rcu/tree.c:2847 [inline]
  rcu_process_callbacks+0xf23/0x2670 kernel/rcu/tree.c:2864
  __do_softirq+0x30b/0xad8 kernel/softirq.c:292

The buggy address belongs to the object at ffff8801c4666640
  which belongs to the cache task_struct(65:syz3) of size 6080
The buggy address is located 1248 bytes inside of
  6080-byte region [ffff8801c4666640, ffff8801c4667e00)
The buggy address belongs to the page:
page:ffffea0007119980 count:1 mapcount:0 mapping:ffff8801c2336800 index:0x0  
compound_mapcount: 0
flags: 0x2fffc0000008100(slab|head)
raw: 02fffc0000008100 ffffea00070a0008 ffffea0006080688 ffff8801c2336800
raw: 0000000000000000 ffff8801c4666640 0000000100000001 ffff8801bdb40c00
page dumped because: kasan: bad access detected
page->mem_cgroup:ffff8801bdb40c00

Memory state around the buggy address:
  ffff8801c4666a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801c4666a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8801c4666b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                ^
  ffff8801c4666b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801c4666c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-21  7:10 KASAN: use-after-free Read in task_is_descendant syzbot
@ 2018-10-21  7:12 ` Tetsuo Handa
  2018-10-22  9:54   ` Oleg Nesterov
  2018-11-10  3:25 ` syzbot
  2018-11-10 11:46 ` syzbot
  2 siblings, 1 reply; 31+ messages in thread
From: Tetsuo Handa @ 2018-10-21  7:12 UTC (permalink / raw)
  To: serge, Oleg Nesterov
  Cc: syzbot, jmorris, keescook, linux-kernel, linux-security-module,
	syzkaller-bugs

On 2018/10/21 16:10, syzbot wrote:
> BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:188 [inline]
> BUG: KASAN: use-after-free in task_is_descendant.part.2+0x610/0x670 security/yama/yama_lsm.c:295
> Read of size 8 at addr ffff8801c4666b20 by task syz-executor3/12722
> 
> CPU: 1 PID: 12722 Comm: syz-executor3 Not tainted 4.19.0-rc8+ #70
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
>  print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
>  kasan_report_error mm/kasan/report.c:354 [inline]
>  kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
>  __read_once_size include/linux/compiler.h:188 [inline]
>  task_is_descendant.part.2+0x610/0x670 security/yama/yama_lsm.c:295

Do we need to hold

  write_lock_irq(&tasklist_lock);

rather than

  rcu_read_lock();

when accessing

  "struct task_struct"->real_parent

?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-21  7:12 ` Tetsuo Handa
@ 2018-10-22  9:54   ` Oleg Nesterov
  2018-10-22 10:06     ` Tetsuo Handa
  2018-10-25  8:19     ` Kees Cook
  0 siblings, 2 replies; 31+ messages in thread
From: Oleg Nesterov @ 2018-10-22  9:54 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

On 10/21, Tetsuo Handa wrote:
>
> On 2018/10/21 16:10, syzbot wrote:
> > BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:188 [inline]
> > BUG: KASAN: use-after-free in task_is_descendant.part.2+0x610/0x670 security/yama/yama_lsm.c:295
> > Read of size 8 at addr ffff8801c4666b20 by task syz-executor3/12722
> >
> > CPU: 1 PID: 12722 Comm: syz-executor3 Not tainted 4.19.0-rc8+ #70
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:77 [inline]
> >  dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
> >  print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
> >  kasan_report_error mm/kasan/report.c:354 [inline]
> >  kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
> >  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
> >  __read_once_size include/linux/compiler.h:188 [inline]
> >  task_is_descendant.part.2+0x610/0x670 security/yama/yama_lsm.c:295
>
> Do we need to hold
>
>   write_lock_irq(&tasklist_lock);
>
> rather than
>
>   rcu_read_lock();
>
> when accessing
>
>   "struct task_struct"->real_parent

Well, if "task" is stable (can't exit), then I think

	rcu_dereference(task->real_parent)

is fine, we know that ->real_parent did not pass exit_notif() yet.

However, task_is_descendant() looks unnecessarily complicated, it could be

	static int task_is_descendant(struct task_struct *parent,
				      struct task_struct *child)
	{
		int rc = 0;
		struct task_struct *walker;

		if (!parent || !child)
			return 0;

		rcu_read_lock();
		for (walker = child; walker->pid; walker = rcu_dereference(walker->real_parent))
			if (same_thread_group(parent, walker)) {
				rc = 1;
				break;
			}
		rcu_read_unlock();

		return rc;
	}

And again, I do not know how/if yama ensures that child is rcu-protected, perhaps
task_is_descendant() needs to check pid_alive(child) right after rcu_read_lock() ?

Oleg.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-22  9:54   ` Oleg Nesterov
@ 2018-10-22 10:06     ` Tetsuo Handa
  2018-10-22 13:46       ` Oleg Nesterov
  2018-10-25  8:19     ` Kees Cook
  1 sibling, 1 reply; 31+ messages in thread
From: Tetsuo Handa @ 2018-10-22 10:06 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

On 2018/10/22 18:54, Oleg Nesterov wrote:
> On 10/21, Tetsuo Handa wrote:
>>
>> On 2018/10/21 16:10, syzbot wrote:
>>> BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:188 [inline]
>>> BUG: KASAN: use-after-free in task_is_descendant.part.2+0x610/0x670 security/yama/yama_lsm.c:295
>>> Read of size 8 at addr ffff8801c4666b20 by task syz-executor3/12722
>>>
>>> CPU: 1 PID: 12722 Comm: syz-executor3 Not tainted 4.19.0-rc8+ #70
>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>>> Call Trace:
>>>  __dump_stack lib/dump_stack.c:77 [inline]
>>>  dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
>>>  print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
>>>  kasan_report_error mm/kasan/report.c:354 [inline]
>>>  kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
>>>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
>>>  __read_once_size include/linux/compiler.h:188 [inline]
>>>  task_is_descendant.part.2+0x610/0x670 security/yama/yama_lsm.c:295
>>
>> Do we need to hold
>>
>>   write_lock_irq(&tasklist_lock);
>>
>> rather than
>>
>>   rcu_read_lock();
>>
>> when accessing
>>
>>   "struct task_struct"->real_parent
> 
> Well, if "task" is stable (can't exit), then I think
> 
> 	rcu_dereference(task->real_parent)
> 
> is fine, we know that ->real_parent did not pass exit_notif() yet.

OK.

> 
> However, task_is_descendant() looks unnecessarily complicated, it could be
> 
> 	static int task_is_descendant(struct task_struct *parent,
> 				      struct task_struct *child)
> 	{
> 		int rc = 0;
> 		struct task_struct *walker;
> 
> 		if (!parent || !child)
> 			return 0;
> 
> 		rcu_read_lock();
> 		for (walker = child; walker->pid; walker = rcu_dereference(walker->real_parent))
> 			if (same_thread_group(parent, walker)) {
> 				rc = 1;
> 				break;
> 			}
> 		rcu_read_unlock();
> 
> 		return rc;
> 	}
> 
> And again, I do not know how/if yama ensures that child is rcu-protected, perhaps
> task_is_descendant() needs to check pid_alive(child) right after rcu_read_lock() ?

Since the caller (ptrace() path) called get_task_struct(child), child itself can't be
released. Do we still need pid_alive(child) ?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-22 10:06     ` Tetsuo Handa
@ 2018-10-22 13:46       ` Oleg Nesterov
  2018-10-25  2:15         ` Tetsuo Handa
  0 siblings, 1 reply; 31+ messages in thread
From: Oleg Nesterov @ 2018-10-22 13:46 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

On 10/22, Tetsuo Handa wrote:
>
> > However, task_is_descendant() looks unnecessarily complicated, it could be
> >
> > 	static int task_is_descendant(struct task_struct *parent,
> > 				      struct task_struct *child)
> > 	{
> > 		int rc = 0;
> > 		struct task_struct *walker;
> >
> > 		if (!parent || !child)
> > 			return 0;
> >
> > 		rcu_read_lock();
> > 		for (walker = child; walker->pid; walker = rcu_dereference(walker->real_parent))
> > 			if (same_thread_group(parent, walker)) {
> > 				rc = 1;
> > 				break;
> > 			}
> > 		rcu_read_unlock();
> >
> > 		return rc;
> > 	}
> >
> > And again, I do not know how/if yama ensures that child is rcu-protected, perhaps
> > task_is_descendant() needs to check pid_alive(child) right after rcu_read_lock() ?
>
> Since the caller (ptrace() path) called get_task_struct(child), child itself can't be
> released. Do we still need pid_alive(child) ?

get_task_struct(child) can only ensure that this task_struct can't be freed.

Suppose that this child exits after get_task_struct(), then its real_parent exits
too and calls call_rcu(delayed_put_task_struct).

Now, when task_is_descendant() is called, rcu_read_lock() can happen after rcu gp,
iow child->parent can be already freed/reused/unmapped.

We need to ensure that child is still protected by RCU.

Oleg.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-22 13:46       ` Oleg Nesterov
@ 2018-10-25  2:15         ` Tetsuo Handa
  2018-10-25 11:13           ` Oleg Nesterov
  0 siblings, 1 reply; 31+ messages in thread
From: Tetsuo Handa @ 2018-10-25  2:15 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

Oleg Nesterov wrote:
> On 10/22, Tetsuo Handa wrote:
> > > And again, I do not know how/if yama ensures that child is rcu-protected, perhaps
> > > task_is_descendant() needs to check pid_alive(child) right after rcu_read_lock() ?
> >
> > Since the caller (ptrace() path) called get_task_struct(child), child itself can't be
> > released. Do we still need pid_alive(child) ?
> 
> get_task_struct(child) can only ensure that this task_struct can't be freed.

The report says that it is a use-after-free read at

  walker = rcu_dereference(walker->real_parent);

which means that walker was already released.

> 
> Suppose that this child exits after get_task_struct(), then its real_parent exits
> too and calls call_rcu(delayed_put_task_struct).
> 
> Now, when task_is_descendant() is called, rcu_read_lock() can happen after rcu gp,
> iow child->parent can be already freed/reused/unmapped.
> 
> We need to ensure that child is still protected by RCU.

I wonder whether pid_alive() test helps.

We can get

[   40.620318] parent or walker is dead.
[   40.624146] tracee is dead.

messages using below patch and reproducer.

----------
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 99cfddd..0d9d786 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -385,6 +385,7 @@ static int ptrace_attach(struct task_struct *task, long request,
 	if (mutex_lock_interruptible(&task->signal->cred_guard_mutex))
 		goto out;
 
+	schedule_timeout_killable(HZ);
 	task_lock(task);
 	retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
 	task_unlock(task);
diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index ffda91a..a231ec6 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -283,6 +283,11 @@ static int task_is_descendant(struct task_struct *parent,
 		return 0;
 
 	rcu_read_lock();
+	if (!pid_alive(parent) || !pid_alive(walker)) {
+		rcu_read_unlock();
+		printk("parent or walker is dead.\n");
+		return 0;
+	}
 	if (!thread_group_leader(parent))
 		parent = rcu_dereference(parent->group_leader);
 	while (walker->pid > 0) {
@@ -315,6 +320,10 @@ static int ptracer_exception_found(struct task_struct *tracer,
 	bool found = false;
 
 	rcu_read_lock();
+	if (!pid_alive(tracee)) {
+		printk("tracee is dead.\n");
+		goto unlock;
+	}
 
 	/*
 	 * If there's already an active tracing relationship, then make an
----------

----------
#include <unistd.h>
#include <sys/ptrace.h>
#include <poll.h>

int main(int argc, char *argv[])
{
	if (fork() == 0) {
		ptrace(PTRACE_ATTACH, getppid(), NULL, NULL);
		_exit(0);
	}
	poll(NULL, 0, 100);
	return 0;
}
----------

But since "child" has at least one reference, reading "child->real_parent" should
be safe. Therefore, I think that bailing out due to pid_is_alive(child) == false
(like above patch does) cannot avoid this problem...


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-22  9:54   ` Oleg Nesterov
  2018-10-22 10:06     ` Tetsuo Handa
@ 2018-10-25  8:19     ` Kees Cook
  2018-10-25 11:52       ` Oleg Nesterov
  1 sibling, 1 reply; 31+ messages in thread
From: Kees Cook @ 2018-10-25  8:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tetsuo Handa, Serge E. Hallyn, syzbot, James Morris, LKML,
	linux-security-module, syzkaller-bugs

On Mon, Oct 22, 2018 at 10:54 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 10/21, Tetsuo Handa wrote:
>>
>> On 2018/10/21 16:10, syzbot wrote:
>> > BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:188 [inline]
>> > BUG: KASAN: use-after-free in task_is_descendant.part.2+0x610/0x670 security/yama/yama_lsm.c:295
>> > Read of size 8 at addr ffff8801c4666b20 by task syz-executor3/12722
>> >
>> > CPU: 1 PID: 12722 Comm: syz-executor3 Not tainted 4.19.0-rc8+ #70
>> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> > Call Trace:
>> >  __dump_stack lib/dump_stack.c:77 [inline]
>> >  dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
>> >  print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
>> >  kasan_report_error mm/kasan/report.c:354 [inline]
>> >  kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
>> >  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
>> >  __read_once_size include/linux/compiler.h:188 [inline]
>> >  task_is_descendant.part.2+0x610/0x670 security/yama/yama_lsm.c:295
>>
>> Do we need to hold
>>
>>   write_lock_irq(&tasklist_lock);
>>
>> rather than
>>
>>   rcu_read_lock();
>>
>> when accessing
>>
>>   "struct task_struct"->real_parent
>
> Well, if "task" is stable (can't exit), then I think
>
>         rcu_dereference(task->real_parent)
>
> is fine, we know that ->real_parent did not pass exit_notif() yet.
>
> However, task_is_descendant() looks unnecessarily complicated, it could be
>
>         static int task_is_descendant(struct task_struct *parent,
>                                       struct task_struct *child)
>         {
>                 int rc = 0;
>                 struct task_struct *walker;
>
>                 if (!parent || !child)
>                         return 0;
>
>                 rcu_read_lock();
>                 for (walker = child; walker->pid; walker = rcu_dereference(walker->real_parent))
>                         if (same_thread_group(parent, walker)) {
>                                 rc = 1;
>                                 break;
>                         }
>                 rcu_read_unlock();
>
>                 return rc;
>         }
>
> And again, I do not know how/if yama ensures that child is rcu-protected, perhaps
> task_is_descendant() needs to check pid_alive(child) right after rcu_read_lock() ?

task_is_descendant() is called under rcu_read_lock() in both
ptracer_exception_found() and yama_ptrace_access_check() so I don't
understand how any of the tasks could get freed? This is walking
group_leader and real_parent -- are these not stable under rcu_lock()?

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-25  2:15         ` Tetsuo Handa
@ 2018-10-25 11:13           ` Oleg Nesterov
  2018-10-25 11:36             ` Kees Cook
  2018-10-25 11:47             ` Tetsuo Handa
  0 siblings, 2 replies; 31+ messages in thread
From: Oleg Nesterov @ 2018-10-25 11:13 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

On 10/25, Tetsuo Handa wrote:
>
> Oleg Nesterov wrote:
> > On 10/22, Tetsuo Handa wrote:
> > > > And again, I do not know how/if yama ensures that child is rcu-protected, perhaps
> > > > task_is_descendant() needs to check pid_alive(child) right after rcu_read_lock() ?
> > >
> > > Since the caller (ptrace() path) called get_task_struct(child), child itself can't be
> > > released. Do we still need pid_alive(child) ?
> >
> > get_task_struct(child) can only ensure that this task_struct can't be freed.
>
> The report says that it is a use-after-free read at
>
>   walker = rcu_dereference(walker->real_parent);
>
> which means that walker was already released.

quite possibly I missed something, but I am not sure I understand your concerns...

So again, suppose that "child" is already dead. Its task_struct can't be freed,
but child->real_parent can point to the already freed memory.

This means that the 1st walker = rcu_dereference(walker->real_parent) is fine,
this simply reads the child->real_parent pointer, but on the second iteration

	walker = rcu_dereference(walker->real_parent);

reads the alredy freed memory.

> I wonder whether pid_alive() test helps.
>
> We can get
>
> [   40.620318] parent or walker is dead.
> [   40.624146] tracee is dead.
>
> messages using below patch and reproducer.

again, I do not understand, this all looks correct...

> ----------
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 99cfddd..0d9d786 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -385,6 +385,7 @@ static int ptrace_attach(struct task_struct *task, long request,
>  	if (mutex_lock_interruptible(&task->signal->cred_guard_mutex))
>  		goto out;
>
> +	schedule_timeout_killable(HZ);
>  	task_lock(task);
>  	retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
>  	task_unlock(task);
> diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
> index ffda91a..a231ec6 100644
> --- a/security/yama/yama_lsm.c
> +++ b/security/yama/yama_lsm.c
> @@ -283,6 +283,11 @@ static int task_is_descendant(struct task_struct *parent,
>  		return 0;
>
>  	rcu_read_lock();
> +	if (!pid_alive(parent) || !pid_alive(walker)) {
> +		rcu_read_unlock();
> +		printk("parent or walker is dead.\n");

This is what we need to do, except I think we should change yama_ptrace_access_check().
And iiuc parent == current, pid_alive(parent) looks unnecessary. Although we need to
check ptracer_exception_found(), may be it needs some changes too.

And yes, task_is_descendant() can hit the dead child, if nothing else it can
be killed. This can explain the kasan report.

> @@ -315,6 +320,10 @@ static int ptracer_exception_found(struct task_struct *tracer,
>  	bool found = false;
>
>  	rcu_read_lock();
> +	if (!pid_alive(tracee)) {
> +		printk("tracee is dead.\n");
> +		goto unlock;

Sure, this is possible too.

> But since "child" has at least one reference, reading "child->real_parent" should
> be safe. Therefore, I think that bailing out due to pid_is_alive(child) == false
> (like above patch does) cannot avoid this problem...

Why?

OK. Lets ignore ptracer_exception_found() for the moment. Why do you think the
patch below can't help?

Oleg.

--- x/security/yama/yama_lsm.c
+++ x/security/yama/yama_lsm.c
@@ -368,7 +368,8 @@ static int yama_ptrace_access_check(stru
 			break;
 		case YAMA_SCOPE_RELATIONAL:
 			rcu_read_lock();
-			if (!task_is_descendant(current, child) &&
+			if (!pid_alive(child) ||
+			    !task_is_descendant(current, child) &&
 			    !ptracer_exception_found(current, child) &&
 			    !ns_capable(__task_cred(child)->user_ns, CAP_SYS_PTRACE))
 				rc = -EPERM;


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-25 11:13           ` Oleg Nesterov
@ 2018-10-25 11:36             ` Kees Cook
  2018-10-25 12:05               ` Oleg Nesterov
  2018-10-25 11:47             ` Tetsuo Handa
  1 sibling, 1 reply; 31+ messages in thread
From: Kees Cook @ 2018-10-25 11:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tetsuo Handa, Serge E. Hallyn, syzbot, James Morris, LKML,
	linux-security-module, syzkaller-bugs

On Thu, Oct 25, 2018 at 12:13 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> So again, suppose that "child" is already dead. Its task_struct can't be freed,
> but child->real_parent can point to the already freed memory.

I can't find a path for "child" to be released. I see task_lock()
always called on it before it ends up in Yama.

(Well, I can't find the lock for switch_mm(), but I assume that's safe
because it's switching to the task.)

> This means that the 1st walker = rcu_dereference(walker->real_parent) is fine,
> this simply reads the child->real_parent pointer, but on the second iteration
>
>         walker = rcu_dereference(walker->real_parent);
>
> reads the alredy freed memory.

What does rcu_read_lock() protect actually protect here? I thought
none of the task structs would be freed until after all
rcu_read_unlock() finished.

> OK. Lets ignore ptracer_exception_found() for the moment. Why do you think the
> patch below can't help?
>
> Oleg.
>
> --- x/security/yama/yama_lsm.c
> +++ x/security/yama/yama_lsm.c
> @@ -368,7 +368,8 @@ static int yama_ptrace_access_check(stru
>                         break;
>                 case YAMA_SCOPE_RELATIONAL:
>                         rcu_read_lock();
> -                       if (!task_is_descendant(current, child) &&
> +                       if (!pid_alive(child) ||
> +                           !task_is_descendant(current, child) &&
>                             !ptracer_exception_found(current, child) &&
>                             !ns_capable(__task_cred(child)->user_ns, CAP_SYS_PTRACE))
>                                 rc = -EPERM;
>

Hm, documentation there says:
 * If pid_alive fails, then pointers within the task structure
 * can be stale and must not be dereferenced.

What is the safe pattern for checking vs rcu?

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-25 11:13           ` Oleg Nesterov
  2018-10-25 11:36             ` Kees Cook
@ 2018-10-25 11:47             ` Tetsuo Handa
  2018-10-25 12:17               ` Oleg Nesterov
  1 sibling, 1 reply; 31+ messages in thread
From: Tetsuo Handa @ 2018-10-25 11:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

On 2018/10/25 20:13, Oleg Nesterov wrote:
> On 10/25, Tetsuo Handa wrote:
>>
>> Oleg Nesterov wrote:
>>> On 10/22, Tetsuo Handa wrote:
>>>>> And again, I do not know how/if yama ensures that child is rcu-protected, perhaps
>>>>> task_is_descendant() needs to check pid_alive(child) right after rcu_read_lock() ?
>>>>
>>>> Since the caller (ptrace() path) called get_task_struct(child), child itself can't be
>>>> released. Do we still need pid_alive(child) ?
>>>
>>> get_task_struct(child) can only ensure that this task_struct can't be freed.
>>
>> The report says that it is a use-after-free read at
>>
>>   walker = rcu_dereference(walker->real_parent);
>>
>> which means that walker was already released.
> 
> quite possibly I missed something, but I am not sure I understand your concerns...
> 
> So again, suppose that "child" is already dead. Its task_struct can't be freed,
> but child->real_parent can point to the already freed memory.

Yes.

But if child->real_parent is pointing to the already freed memory,
why does pid_alive(child) == true help?

> 
> This means that the 1st walker = rcu_dereference(walker->real_parent) is fine,
> this simply reads the child->real_parent pointer,

Yes.

>                                                   but on the second iteration
> 
> 	walker = rcu_dereference(walker->real_parent);
> 
> reads the alredy freed memory.

Yes.

> 
>> I wonder whether pid_alive() test helps.
>>
>> We can get
>>
>> [   40.620318] parent or walker is dead.
>> [   40.624146] tracee is dead.
>>
>> messages using below patch and reproducer.
> 
> again, I do not understand, this all looks correct...
> 
>> ----------
>> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
>> index 99cfddd..0d9d786 100644
>> --- a/kernel/ptrace.c
>> +++ b/kernel/ptrace.c
>> @@ -385,6 +385,7 @@ static int ptrace_attach(struct task_struct *task, long request,
>>  	if (mutex_lock_interruptible(&task->signal->cred_guard_mutex))
>>  		goto out;
>>
>> +	schedule_timeout_killable(HZ);
>>  	task_lock(task);
>>  	retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
>>  	task_unlock(task);
>> diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
>> index ffda91a..a231ec6 100644
>> --- a/security/yama/yama_lsm.c
>> +++ b/security/yama/yama_lsm.c
>> @@ -283,6 +283,11 @@ static int task_is_descendant(struct task_struct *parent,
>>  		return 0;
>>
>>  	rcu_read_lock();
>> +	if (!pid_alive(parent) || !pid_alive(walker)) {
>> +		rcu_read_unlock();
>> +		printk("parent or walker is dead.\n");
> 
> This is what we need to do, except I think we should change yama_ptrace_access_check().
> And iiuc parent == current, pid_alive(parent) looks unnecessary. Although we need to
> check ptracer_exception_found(), may be it needs some changes too.

There are two task_is_descendant() callers, and one of them is not passing current.

> 
> And yes, task_is_descendant() can hit the dead child, if nothing else it can
> be killed. This can explain the kasan report.

The kasan is reporting that child->real_parent (or maybe child->real_parent->real_parent
or child->real_parent->real_parent->real_parent ...) was pointing to already freed memory,
isn't it?

How can we check that that pointer is pointing to already freed memory? As soon as

  walker = rcu_dereference(walker->real_parent);

is executed, task_alive(walker) will try to read from already freed memory...


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-25  8:19     ` Kees Cook
@ 2018-10-25 11:52       ` Oleg Nesterov
  0 siblings, 0 replies; 31+ messages in thread
From: Oleg Nesterov @ 2018-10-25 11:52 UTC (permalink / raw)
  To: Kees Cook
  Cc: Tetsuo Handa, Serge E. Hallyn, syzbot, James Morris, LKML,
	linux-security-module, syzkaller-bugs

On 10/25, Kees Cook wrote:
>
> task_is_descendant() is called under rcu_read_lock() in both
> ptracer_exception_found() and yama_ptrace_access_check() so I don't
> understand how any of the tasks could get freed? This is walking
> group_leader and real_parent -- are these not stable under rcu_lock()?

group_leader/real_parent/etc are no longer rcu-protected after the exiting
child calls release_task() which in particular removes the child from
children/thread_group lists.

OK. Suppose you have an rcu-protected list, and each element also has a
reference counter so you can do something

	struct elt {
		atomic_t ctr;
		struct list_head list;
		int pid;
	};

	rcu_read_lock();
	list_for_each_entry(elt, &LIST, list) {
		if (elt->pid == 100) {
			atomic_inc(&elt->ctr);	// get_task_struct()
			break;
		}
	}
	rcu_read_unlock();

	do_something(elt);

This code is fine. This elt can't be freed, you have a reference. But once
you drop rcu lock you can't trust elt->list.next! So, for example, you can
not do

	rcu_read_lock();
	list_for_each_entry_continue_rcu(elt, &LIST, list) {
		...
	}
	rcu_read_unlock();

too late, elt.list.next can be already freed, or it can be freed while you
iterate the list.

Another simple example. Suppose you have a global PTR protected by rcu. So
ignoring the necessary rcu_dereference this code is fine:

	rcu_read_lock();
	if (ptr = PTR)
		do_something(ptr);
	rcu_read_unlcok();

But this is not:

	ptr = PTR;
	rcu_read_lock();
	if (ptr)
		do_something(ptr);
	rcu_read_unlock();

basically the same thing...

Oleg.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-25 11:36             ` Kees Cook
@ 2018-10-25 12:05               ` Oleg Nesterov
  0 siblings, 0 replies; 31+ messages in thread
From: Oleg Nesterov @ 2018-10-25 12:05 UTC (permalink / raw)
  To: Kees Cook
  Cc: Tetsuo Handa, Serge E. Hallyn, syzbot, James Morris, LKML,
	linux-security-module, syzkaller-bugs

On 10/25, Kees Cook wrote:
>
> On Thu, Oct 25, 2018 at 12:13 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> > So again, suppose that "child" is already dead. Its task_struct can't be freed,
> > but child->real_parent can point to the already freed memory.
>
> I can't find a path for "child" to be released. I see task_lock()
> always called on it before it ends up in Yama.

Are you saying that yama_ptrace_access_check() is always called under
task_lock(child) ? Yes, it seems so

So what? Say, ptrace_attach() can hit a dead task. It should notice this
and fail later, but security_ptrace_access_check() is called before that.


> > This means that the 1st walker = rcu_dereference(walker->real_parent) is fine,
> > this simply reads the child->real_parent pointer, but on the second iteration
> >
> >         walker = rcu_dereference(walker->real_parent);
> >
> > reads the alredy freed memory.
>
> What does rcu_read_lock() protect actually protect here? I thought
> none of the task structs would be freed until after all
> rcu_read_unlock() finished.

See another email I sent you a minute ago.

Oleg.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-25 11:47             ` Tetsuo Handa
@ 2018-10-25 12:17               ` Oleg Nesterov
  2018-10-25 13:01                 ` Oleg Nesterov
  2018-10-25 13:14                 ` KASAN: use-after-free Read in task_is_descendant Tetsuo Handa
  0 siblings, 2 replies; 31+ messages in thread
From: Oleg Nesterov @ 2018-10-25 12:17 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

On 10/25, Tetsuo Handa wrote:
>
> On 2018/10/25 20:13, Oleg Nesterov wrote:
> >
> > So again, suppose that "child" is already dead. Its task_struct can't be freed,
> > but child->real_parent can point to the already freed memory.
>
> Yes.
>
> But if child->real_parent is pointing to the already freed memory,
> why does pid_alive(child) == true help?

Hmm. Because pid_alive(child) == true && child->real_parent is freed must not
be possible? As long as we check pid_alive() under rcu_read_lock().

> >> @@ -283,6 +283,11 @@ static int task_is_descendant(struct task_struct *parent,
> >>  		return 0;
> >>
> >>  	rcu_read_lock();
> >> +	if (!pid_alive(parent) || !pid_alive(walker)) {
> >> +		rcu_read_unlock();
> >> +		printk("parent or walker is dead.\n");
> >
> > This is what we need to do, except I think we should change yama_ptrace_access_check().
> > And iiuc parent == current, pid_alive(parent) looks unnecessary. Although we need to
> > check ptracer_exception_found(), may be it needs some changes too.
>
> There are two task_is_descendant() callers, and one of them is not passing current.

As I said below, please ignore ptracer_exception_found(), another caller for now,
perhaps it needs some changes too. I even have a vague feeling that I have already
blamed this function some time ago...

> > And yes, task_is_descendant() can hit the dead child, if nothing else it can
> > be killed. This can explain the kasan report.
>
> The kasan is reporting that child->real_parent (or maybe child->real_parent->real_parent
> or child->real_parent->real_parent->real_parent ...) was pointing to already freed memory,
> isn't it?

Yes. and you know, I am all confused. I no longer can understand you :/

> How can we check that that pointer is pointing to already freed memory? As soon as
>
>   walker = rcu_dereference(walker->real_parent);
>
> is executed, task_alive(walker) will try to read from already freed memory...

Of course we should not do it this way. The patch I sent doesn't...

Oleg.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-25 12:17               ` Oleg Nesterov
@ 2018-10-25 13:01                 ` Oleg Nesterov
  2018-10-26 16:09                   ` Kees Cook
  2018-10-25 13:14                 ` KASAN: use-after-free Read in task_is_descendant Tetsuo Handa
  1 sibling, 1 reply; 31+ messages in thread
From: Oleg Nesterov @ 2018-10-25 13:01 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

On 10/25, Oleg Nesterov wrote:
>
> As I said below, please ignore ptracer_exception_found(), another caller for now,
> perhaps it needs some changes too. I even have a vague feeling that I have already
> blamed this function some time ago...

Heh, yes, 3 years ago ;)

https://lore.kernel.org/lkml/20150106184427.GA18153@redhat.com/

I can't understand my email today, but note that I tried to point out that
task_is_descendant() can dereference the freed mem.

And yes, task_is_descendant() is overcompicated for no reason, afaics.

Oleg.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-25 12:17               ` Oleg Nesterov
  2018-10-25 13:01                 ` Oleg Nesterov
@ 2018-10-25 13:14                 ` Tetsuo Handa
  2018-10-25 15:55                   ` Oleg Nesterov
  1 sibling, 1 reply; 31+ messages in thread
From: Tetsuo Handa @ 2018-10-25 13:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

On 2018/10/25 21:17, Oleg Nesterov wrote:
>>> And yes, task_is_descendant() can hit the dead child, if nothing else it can
>>> be killed. This can explain the kasan report.
>>
>> The kasan is reporting that child->real_parent (or maybe child->real_parent->real_parent
>> or child->real_parent->real_parent->real_parent ...) was pointing to already freed memory,
>> isn't it?
> 
> Yes. and you know, I am all confused. I no longer can understand you :/

Why don't we need to check every time like shown below?
Why checking only once is sufficient?

--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -285,7 +285,7 @@ static int task_is_descendant(struct task_struct *parent,
 	rcu_read_lock();
 	if (!thread_group_leader(parent))
 		parent = rcu_dereference(parent->group_leader);
-	while (walker->pid > 0) {
+	while (pid_alive(walker) && walker->pid > 0) {
 		if (!thread_group_leader(walker))
 			walker = rcu_dereference(walker->group_leader);
 		if (walker == parent) {

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-25 13:14                 ` KASAN: use-after-free Read in task_is_descendant Tetsuo Handa
@ 2018-10-25 15:55                   ` Oleg Nesterov
  2018-10-25 16:25                     ` Oleg Nesterov
  2018-10-26 12:23                     ` Tetsuo Handa
  0 siblings, 2 replies; 31+ messages in thread
From: Oleg Nesterov @ 2018-10-25 15:55 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

On 10/25, Tetsuo Handa wrote:
>
> On 2018/10/25 21:17, Oleg Nesterov wrote:
> >>> And yes, task_is_descendant() can hit the dead child, if nothing else it can
> >>> be killed. This can explain the kasan report.
> >>
> >> The kasan is reporting that child->real_parent (or maybe child->real_parent->real_parent
> >> or child->real_parent->real_parent->real_parent ...) was pointing to already freed memory,
> >> isn't it?
> >
> > Yes. and you know, I am all confused. I no longer can understand you :/
>
> Why don't we need to check every time like shown below?
> Why checking only once is sufficient?

Why do you think it is not sufficient?

Again, I can be easily wrong, rcu is not simple, but so far I think we need
a single check at the start.

> --- a/security/yama/yama_lsm.c
> +++ b/security/yama/yama_lsm.c
> @@ -285,7 +285,7 @@ static int task_is_descendant(struct task_struct *parent,
>  	rcu_read_lock();
>  	if (!thread_group_leader(parent))
>  		parent = rcu_dereference(parent->group_leader);
> -	while (walker->pid > 0) {
> +	while (pid_alive(walker) && walker->pid > 0) {

OK. To simplify, ets suppose that task_is_descendant() is called with tasklist
lock held. And lets suppose that all tasks are single-threaded.

Then we obviously need a single check at the start, we need to ensure that the
child was not removed from its ->real_parent->children list. The latter means
that if ->real_parent exits, the child will be re-parented and its ->real_parent
will be updated.

So we could do

	read_lock(tasklist);

	if (list_empty(child->sibling))
		// it is dead, removed from ->children list, we can't trust
		// child->real_parent
		return -EWHATEVER;

	task_is_descendant(current, child);

But note that we can safely use pid_alive(child) instead, detach_pid() and
list_del_init(&p->sibling) happen "at the same time" since we hold tasklist.

(And btw, I suggested several times to rename it, or add another helper with
 a better name. Note also that we could check, say, ->sighand != NULL with
 the same effect.)


Now. Why do you think rcu_read_lock() differs in that we need to check
pid_alive() at every step?

Suppose that one of the grand parents exits, and it is going to be freed. Again,
to (over)simplify the things, lets suppose that release_task() does

	synchronize_rcu();
	free_task(p);

at the end. Now, can

	rcu_read_lock();
	if (pid_alive(child)) {
		while (child->pid)
			child = child->real_parent;
	}
	rcu_read_unlock();

hit the already freed ->real_parent ? Say, the freed child->real_parent->real_parent.

Lets denote P1 = child->real_parent, P2 = P1->real_parent. Can P2 be already freed?

This is only possible if synchronize_rcu() above was called before rcu_read_lock(),
see the last sentence below.

If P1->real_parent is still P2, then P1 has already exited too. And we still observe
that child->real_parent == P1, this too is only possible if child has exited, so we
must see pid_alive() == F.

Why must we see pid_alive() == F without tasklist? It must be true, release_task()
is serialized by tasklist_lock, but why we can't get the stale value under
rcu_read_lock() ?

Because our rcu read-lock critical section extends beyond the return from
synchronize_rcu(), and thus we must have a full memory barrier _between_
that synchronize_rcu() and our rcu_read_lock(). We must see all memory updates,
including thread_pid = NULL which makes pid_alive() == F.

Do you see any hole?

Oleg.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-25 15:55                   ` Oleg Nesterov
@ 2018-10-25 16:25                     ` Oleg Nesterov
  2018-10-26 12:23                     ` Tetsuo Handa
  1 sibling, 0 replies; 31+ messages in thread
From: Oleg Nesterov @ 2018-10-25 16:25 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

On 10/25, Oleg Nesterov wrote:
>
> Because our rcu read-lock critical section extends beyond the return from
> synchronize_rcu(), and thus we must have a full memory barrier _between_
> that synchronize_rcu() and our rcu_read_lock(). We must see all memory updates,
> including thread_pid = NULL which makes pid_alive() == F.

In case I was not clear....

Suppose we have int X = 0. If some CPU does

	X = 1;
	synchronize_rcu();

and another CPU does

	rcu_read_lock();
	x = X;
	rcu_read_unlock();

then x should be == 1 in case when rcu_read_unlock() happens _after_ return
from synchronize_rcu().

Oleg.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-25 15:55                   ` Oleg Nesterov
  2018-10-25 16:25                     ` Oleg Nesterov
@ 2018-10-26 12:23                     ` Tetsuo Handa
  2018-10-26 13:04                       ` Oleg Nesterov
  1 sibling, 1 reply; 31+ messages in thread
From: Tetsuo Handa @ 2018-10-26 12:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

On 2018/10/26 0:55, Oleg Nesterov wrote:
> On 10/25, Tetsuo Handa wrote:
>>
>> On 2018/10/25 21:17, Oleg Nesterov wrote:
>>>>> And yes, task_is_descendant() can hit the dead child, if nothing else it can
>>>>> be killed. This can explain the kasan report.
>>>>
>>>> The kasan is reporting that child->real_parent (or maybe child->real_parent->real_parent
>>>> or child->real_parent->real_parent->real_parent ...) was pointing to already freed memory,
>>>> isn't it?
>>>
>>> Yes. and you know, I am all confused. I no longer can understand you :/
>>
>> Why don't we need to check every time like shown below?
>> Why checking only once is sufficient?
> 
> Why do you think it is not sufficient?
> 
> Again, I can be easily wrong, rcu is not simple, but so far I think we need
> a single check at the start.
> 

Hmm, this report is difficult to guess what happened.

Since the "child" passed to task_is_descendant() has at least one reference
count taken by find_get_task_by_vpid(), rcu_dereference(walker->real_parent)
in the first iteration

  while (child->pid > 0) {
    if (!thread_group_leader(child))
      walker = rcu_dereference(child->group_leader);
    if (walker == parent) {
      rc = 1;
      break;
    }
    walker = rcu_dereference(walker->real_parent);
  }

must not trigger use-after-free bug. Thus, when this use-after-free was
detected at rcu_dereference(walker->real_parent), the memory pointed by
"walker" must have been released between

  while (walker->pid > 0) {
    if (!thread_group_leader(walker))
      walker = rcu_dereference(walker->group_leader);

and

    walker = rcu_dereference(walker->real_parent);
  }

because otherwise use-after-free would have been reported at walker->pid
or thread_group_leader(walker) or rcu_dereference(walker->group_leader).

Is my understanding correct?



Then, what pid_alive(child) is testing? It is not memory pointed by "child" but
memory pointed by "walker" (i.e. parent of "child" or parent of parent of "child"
or ... ) which is triggering use-after-free.

Suppose p1 == p2->real_parent and p2 == p3->real_parent, and p1 exited
when p2 tried to attach on p1, p2->real_parent was pointing to already
(or about to be) freed p1.

Even if pid_alive(p2) test can guarantee that p1 won't be released,
how can pid_alive(p3) test guarantee that p1 won't be released?
p1 can be released any moment because it has already waited for RCU
grace period, can't it?


ptrace(PTRACE_ATTACH, vpid_of_p2) {
  p2 = find_get_task_by_vpid(vpid_of_p2);
  ptrace_attach(p2, PTRACE_ATTACH, addr, data) {
    mutex_lock_interruptible(&p2->signal->cred_guard_mutex);
    // p1 starts exit()ing here.
    task_lock(p2);
    __ptrace_may_access(p2) {
      // p2->real_parent starts pointing to already freed p1.
      security_ptrace_access_check(p2, PTRACE_MODE_ATTACH) {
        yama_ptrace_access_check() {
           task_is_descendant(current, p2) {
             walker = p2;
             rcu_read_lock();
             if (pid_alive(p2)) { // If true
               if (p2->pid > 0) { // will be true
                 p1 = rcu_dereference(p2->real_parent); // might be OK due to pid_alive(p2) == true?
               }
             }
             rcu_read_unlock();
           }
        }
      }
    }
    task_unlock(p2);
    mutex_unlock(&p2->signal->cred_guard_mutex);
  }
  put_task_struct(p2);
}

ptrace(PTRACE_ATTACH, vpid_of_p3) {
  p3 = find_get_task_by_vpid(vpid_of_p3);
  ptrace_attach(p3, PTRACE_ATTACH, addr, data) {
    mutex_lock_interruptible(&p3->signal->cred_guard_mutex);
    // p1 starts exit()ing here.
    task_lock(p3);
    __ptrace_may_access(p3) {
      // p2->real_parent starts pointing to already freed p1.
      security_ptrace_access_check(p3, PTRACE_MODE_ATTACH) {
        yama_ptrace_access_check() {
           task_is_descendant(current, p3) {
             walker = p3;
             rcu_read_lock();
             if (pid_alive(p3)) { // If true
               if (p3->pid > 0) { // will be true
                 p2 = rcu_dereference(p3->real_parent); // will be OK if above assumption is OK.
                 if (p2->pid > 0) { // will be true
                   p1 = rcu_dereference(p2->real_parent); // will read already (or about to be) freed p1 address
                   if (p1->pid > 0) { // Oops here or
                     if (!thread_group_leader(p1)) // oops here or
                       p1 = rcu_dereference(p1->group_leader); // oops here or
                     p0 = rcu_dereference(p1->real_parent); // oops here, or not oops because releasing after this
                   }
                 }
               }
             }
             rcu_read_unlock();
           }
        }
      }
    }
    task_unlock(p3);
    mutex_unlock(&p3->signal->cred_guard_mutex);
  }
  put_task_struct(p3);
}


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-26 12:23                     ` Tetsuo Handa
@ 2018-10-26 13:04                       ` Oleg Nesterov
  2018-10-26 13:51                         ` Tetsuo Handa
  0 siblings, 1 reply; 31+ messages in thread
From: Oleg Nesterov @ 2018-10-26 13:04 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

On 10/26, Tetsuo Handa wrote:
>
> Since the "child" passed to task_is_descendant() has at least one reference
> count taken by find_get_task_by_vpid(), rcu_dereference(walker->real_parent)
> in the first iteration
>
>   while (child->pid > 0) {
>     if (!thread_group_leader(child))
>       walker = rcu_dereference(child->group_leader);
>     if (walker == parent) {
>       rc = 1;
>       break;
>     }
>     walker = rcu_dereference(walker->real_parent);
>   }
>
> must not trigger use-after-free bug.

Yes,

> Thus, when this use-after-free was
> detected at rcu_dereference(walker->real_parent), the memory pointed by
> "walker" must have been released between
>
>   while (walker->pid > 0) {
>     if (!thread_group_leader(walker))
>       walker = rcu_dereference(walker->group_leader);
>
> and
>
>     walker = rcu_dereference(walker->real_parent);
>   }

this is possible too, rcu callback which frees the task_struct can run
int between

> because otherwise use-after-free would have been reported at walker->pid
> or thread_group_leader(walker) or rcu_dereference(walker->group_leader).

Well, I do not know how much kasan is "percise", but if it is "perfect"
then you are right,

> Then, what pid_alive(child) is testing?

I tried to explain this in my previous email. I even mentioned that we could
do another check, say, ->sighand != NULL, or list_empty(child->sibling) with
the same effect.

> Suppose p1 == p2->real_parent and p2 == p3->real_parent, and p1 exited
> when p2 tried to attach on p1, p2->real_parent was pointing to already
> (or about to be) freed p1.

No, p2->real_parent will be updated. If p1 exits it will re-parent its
children including p2.

Again, did you read my previous email?

Let me repeat, of course I can be wrong. But so far I am answering the
same questions again and again...

Oleg.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-26 13:04                       ` Oleg Nesterov
@ 2018-10-26 13:51                         ` Tetsuo Handa
  2018-10-26 14:39                           ` Oleg Nesterov
  0 siblings, 1 reply; 31+ messages in thread
From: Tetsuo Handa @ 2018-10-26 13:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

On 2018/10/26 22:04, Oleg Nesterov wrote:
>> Suppose p1 == p2->real_parent and p2 == p3->real_parent, and p1 exited
>> when p2 tried to attach on p1, p2->real_parent was pointing to already
>> (or about to be) freed p1.
> 
> No, p2->real_parent will be updated. If p1 exits it will re-parent its
> children including p2.

My error.

Suppose p1 == p2->real_parent and p2 == p3->real_parent, and p1 exited
when someone tried to attach on p2, p2->real_parent was pointing to already
(or about to be) freed p1.

So, the puzzle part is why p2->real_parent was still pointing p1 even after
p1 was freed...

> 
> Again, did you read my previous email?

Yes. But I still can't be convinced that pid_alive() test helps.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-26 13:51                         ` Tetsuo Handa
@ 2018-10-26 14:39                           ` Oleg Nesterov
  2018-10-26 15:04                             ` Tetsuo Handa
  0 siblings, 1 reply; 31+ messages in thread
From: Oleg Nesterov @ 2018-10-26 14:39 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

On 10/26, Tetsuo Handa wrote:
>
> On 2018/10/26 22:04, Oleg Nesterov wrote:
> >> Suppose p1 == p2->real_parent and p2 == p3->real_parent, and p1 exited
> >> when p2 tried to attach on p1, p2->real_parent was pointing to already
> >> (or about to be) freed p1.
> >
> > No, p2->real_parent will be updated. If p1 exits it will re-parent its
> > children including p2.
>
> My error.
>
> Suppose p1 == p2->real_parent and p2 == p3->real_parent, and p1 exited
> when someone tried to attach on p2, p2->real_parent was pointing to already
> (or about to be) freed p1.

I don't see a difference.

If p1 exits it will re-parent p2, p2->real_parent will be updated.

> So, the puzzle part is why p2->real_parent was still pointing p1 even after
> p1 was freed...

I don't understand the question.

Once again. TASK->real_parent can point to the freed mem only if a) TASK exits,
and b) _after_ that its parent TASK->real_parent exits too.

> > Again, did you read my previous email?
>
> Yes. But I still can't be convinced that pid_alive() test helps.

Well, I don't understand which part of my explanations is not clear to you.

OR. Perhaps I am wrong and do not understand your concerns.

Oleg.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-26 14:39                           ` Oleg Nesterov
@ 2018-10-26 15:04                             ` Tetsuo Handa
  2018-10-26 15:22                               ` Oleg Nesterov
  0 siblings, 1 reply; 31+ messages in thread
From: Tetsuo Handa @ 2018-10-26 15:04 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

On 2018/10/26 23:39, Oleg Nesterov wrote:
> On 10/26, Tetsuo Handa wrote:
>> Suppose p1 == p2->real_parent and p2 == p3->real_parent, and p1 exited
>> when someone tried to attach on p2, p2->real_parent was pointing to already
>> (or about to be) freed p1.
> 
> I don't see a difference.
> 
> If p1 exits it will re-parent p2, p2->real_parent will be updated.
> 
>> So, the puzzle part is why p2->real_parent was still pointing p1 even after
>> p1 was freed...
> 
> I don't understand the question.
> 
> Once again. TASK->real_parent can point to the freed mem only if a) TASK exits,
> and b) _after_ that its parent TASK->real_parent exits too.

Oh, p2 exited and then p1 also exited when someone tried to attach on p2.
Then, p2->real_parent can point to already (or about to be) freed p1.

> 
>>> Again, did you read my previous email?
>>
>> Yes. But I still can't be convinced that pid_alive() test helps.
> 
> Well, I don't understand which part of my explanations is not clear to you.

OK. Checking pid_alive() should help.

(By the way, if p->real_parent were updated to point to init_task when p exits,
we could omit pid_alive() check?)


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-26 15:04                             ` Tetsuo Handa
@ 2018-10-26 15:22                               ` Oleg Nesterov
  0 siblings, 0 replies; 31+ messages in thread
From: Oleg Nesterov @ 2018-10-26 15:22 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: serge, syzbot, jmorris, keescook, linux-kernel,
	linux-security-module, syzkaller-bugs

On 10/27, Tetsuo Handa wrote:
>
> On 2018/10/26 23:39, Oleg Nesterov wrote:
> > On 10/26, Tetsuo Handa wrote:
> >> Suppose p1 == p2->real_parent and p2 == p3->real_parent, and p1 exited
> >> when someone tried to attach on p2, p2->real_parent was pointing to already
> >> (or about to be) freed p1.
> >
> > I don't see a difference.
> >
> > If p1 exits it will re-parent p2, p2->real_parent will be updated.
> >
> >> So, the puzzle part is why p2->real_parent was still pointing p1 even after
> >> p1 was freed...
> >
> > I don't understand the question.
> >
> > Once again. TASK->real_parent can point to the freed mem only if a) TASK exits,
> > and b) _after_ that its parent TASK->real_parent exits too.
>
> Oh, p2 exited and then p1 also exited when someone tried to attach on p2.
> Then, p2->real_parent can point to already (or about to be) freed p1.

Then we must see pid_alive(p2) == F as I already explained you in my
yesterday's email.

Because if p1 exits _after_ p2, then pid_alive(p2) == F must be already
completed, p1 can't exit (I mean, release_task(p1) can't be called) until
release_task(p2) does detach_pid() and drops tasklist_lock.

> (By the way, if p->real_parent were updated to point to init_task when p exits,
> we could omit pid_alive() check?)

Sorry, I don't understand the question...

Ignoring the PR_SET_CHILD_SUBREAPER parents, ignoring the exiting sub-threads
which reparent to group leader, ->real_parent is alwasy updated to point to
init_task. But I don't see why we could omit pid_alive() check.

Oleg.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-25 13:01                 ` Oleg Nesterov
@ 2018-10-26 16:09                   ` Kees Cook
  2018-10-29 12:23                     ` Oleg Nesterov
  0 siblings, 1 reply; 31+ messages in thread
From: Kees Cook @ 2018-10-26 16:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tetsuo Handa, Serge E. Hallyn, syzbot, James Morris, LKML,
	linux-security-module, syzkaller-bugs

On Thu, Oct 25, 2018 at 2:01 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 10/25, Oleg Nesterov wrote:
>> perhaps it needs some changes too. I even have a vague feeling that I have already
>> blamed this function some time ago...
>
> Heh, yes, 3 years ago ;)
>
> https://lore.kernel.org/lkml/20150106184427.GA18153@redhat.com/
>
> I can't understand my email today, but note that I tried to point out that
> task_is_descendant() can dereference the freed mem.

Instead of:

        while (walker->pid > 0) {

should it simply be "while (pid_liave(walker)) {"? And add a
pid_alive(parent) after rcu_read_lock()?

> And yes, task_is_descendant() is overcompicated for no reason, afaics.

Yeah, agreed. I'll fix this up. Just to make sure I'm not crazy: the
real_parent of all tasks in a thread group are the same, yes? The
trouble I was trying to deal with for the complication was where a
non-leader thread would add an exception to the checking, and the
tasks wouldn't match. (As far as I can see, though, using
same_thread_group() should fix it.)

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-26 16:09                   ` Kees Cook
@ 2018-10-29 12:23                     ` Oleg Nesterov
  2018-10-29 15:05                       ` yama: unsafe usage of ptrace_relation->tracer Oleg Nesterov
  0 siblings, 1 reply; 31+ messages in thread
From: Oleg Nesterov @ 2018-10-29 12:23 UTC (permalink / raw)
  To: Kees Cook
  Cc: Tetsuo Handa, Serge E. Hallyn, syzbot, James Morris, LKML,
	linux-security-module, syzkaller-bugs

On 10/26, Kees Cook wrote:
>
> On Thu, Oct 25, 2018 at 2:01 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> > On 10/25, Oleg Nesterov wrote:
> >> perhaps it needs some changes too. I even have a vague feeling that I have already
> >> blamed this function some time ago...
> >
> > Heh, yes, 3 years ago ;)
> >
> > https://lore.kernel.org/lkml/20150106184427.GA18153@redhat.com/
> >
> > I can't understand my email today, but note that I tried to point out that
> > task_is_descendant() can dereference the freed mem.
>
> Instead of:
>
>         while (walker->pid > 0) {
>
> should it simply be "while (pid_liave(walker)) {"?

No, this would be wrong. Probably walker->pid > 0 is not the best check,
but we do not need to change it for correctness.

> And add a
> pid_alive(parent) after rcu_read_lock()?

So you too do not read my emails ;)

I still think we need a single pid_alive() check and I even sent the patch.
Attached again at the end.

To clarify, let me repeat that ptracer_exception_found() may need some fixes
too, right now I am only talking about task_is_descendant().

> > And yes, task_is_descendant() is overcompicated for no reason, afaics.
>
> Yeah, agreed. I'll fix this up.

I have already posted this code, this is what I think it should do.


	static int task_is_descendant(struct task_struct *parent,
				      struct task_struct *child)
	{
		struct task_struct *walker;

		for (walker = child; walker->pid; walker = rcu_dereference(walker->real_parent)) {
			if (same_thread_group(parent, walker))
				return 1;
		}

		return 0;
	}

This version differs in that I removed the parent/child != NULL at the start
and rcu_read_lock(), it should be held by the caller anyway.

> Just to make sure I'm not crazy: the
> real_parent of all tasks in a thread group are the same, yes?

Well, yes and no. So if

	same_thread_group(t1, t2) == T

then
	same_thread_group(t1->real_parent, t2->real_parent) == T

which means that real_parent of all tasks in a thread group is the same
_process_.

But t1->real_parent and t2->real_parent are not necessarily the same task.

Oleg.

--- x/security/yama/yama_lsm.c
+++ x/security/yama/yama_lsm.c
@@ -368,7 +368,8 @@ static int yama_ptrace_access_check(stru
 			break;
 		case YAMA_SCOPE_RELATIONAL:
 			rcu_read_lock();
-			if (!task_is_descendant(current, child) &&
+			if (!pid_alive(child) ||
+			    !task_is_descendant(current, child) &&
 			    !ptracer_exception_found(current, child) &&
 			    !ns_capable(__task_cred(child)->user_ns, CAP_SYS_PTRACE))
 				rc = -EPERM;


^ permalink raw reply	[flat|nested] 31+ messages in thread

* yama: unsafe usage of ptrace_relation->tracer
  2018-10-29 12:23                     ` Oleg Nesterov
@ 2018-10-29 15:05                       ` Oleg Nesterov
  2019-01-10 11:05                         ` Tetsuo Handa
  0 siblings, 1 reply; 31+ messages in thread
From: Oleg Nesterov @ 2018-10-29 15:05 UTC (permalink / raw)
  To: Kees Cook
  Cc: Tetsuo Handa, Serge E. Hallyn, syzbot, James Morris, LKML,
	linux-security-module, syzkaller-bugs

let me change the subject to avoid the confusion with the already confusing
disccussion about task_is_descendant().

On 10/29, Oleg Nesterov wrote:
>
> I still think we need a single pid_alive() check and I even sent the patch.
> Attached again at the end.
>
> To clarify, let me repeat that ptracer_exception_found() may need some fixes
> too, right now I am only talking about task_is_descendant().

so yes, the ptracer_relations code looks very broken to me, but perhaps I
misread this code, please correct me.

RCU can only protect the ptracer_relations list itself, you can do nothing
with (say) relation->tracer. relation->tracer can be already freed when
ptracer_exception_found() checks relation->tracee == tracee.

Not only pid_alive(parent) can not help in this case, pid_alive(parent) is
equally unsafe because, again, this memory can be freed.

security_task_free(tsk) is called right before free_task(tsk), there is no
a gp pass in between, and of course we can't rely on the ->invalid check.

_At first glance_ we can fix this if we simply turn both ->tracer/tracee
pointers into "signal_struct *", then we can turn all same_thread_group()'s
into walker->signal == parent which doesn't need to dereference the possibly-
freed parent. This also allows to remove all thread_group_leader() checks.
We need to ensure that false-positive is not possible (if, say, ->tracer
was already re-allocated and points to another task->signal), but this
doesn't look difficult.

Oleg.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-21  7:10 KASAN: use-after-free Read in task_is_descendant syzbot
  2018-10-21  7:12 ` Tetsuo Handa
@ 2018-11-10  3:25 ` syzbot
  2018-11-10 11:46 ` syzbot
  2 siblings, 0 replies; 31+ messages in thread
From: syzbot @ 2018-11-10  3:25 UTC (permalink / raw)
  To: jmorris, keescook, linux-kernel, linux-security-module, oleg,
	penguin-kernel, penguin-kernel, serge, syzkaller-bugs

syzbot has found a reproducer for the following crash on:

HEAD commit:    3541833fd1f2 Merge tag 's390-4.20-2' of git://git.kernel.o..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16d3cad5400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8f559fee2fc3375a
dashboard link: https://syzkaller.appspot.com/bug?extid=a9ac39bf55329e206219
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1494bb0b400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+a9ac39bf55329e206219@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
==================================================================
BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:182  
[inline]
BUG: KASAN: use-after-free in task_is_descendant.part.3+0x610/0x670  
security/yama/yama_lsm.c:295
Read of size 8 at addr ffff8801c46f24e0 by task syz-executor2/9973

CPU: 0 PID: 9973 Comm: syz-executor2 Not tainted 4.20.0-rc1+ #107
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x244/0x39d lib/dump_stack.c:113
  print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
  __read_once_size include/linux/compiler.h:182 [inline]
  task_is_descendant.part.3+0x610/0x670 security/yama/yama_lsm.c:295
  task_is_descendant security/yama/yama_lsm.c:282 [inline]
  yama_ptrace_access_check+0x215/0x10fc security/yama/yama_lsm.c:371
  security_ptrace_access_check+0x54/0xb0 security/security.c:271
  __ptrace_may_access+0x5c8/0x980 kernel/ptrace.c:336
  ptrace_attach+0x1fa/0x640 kernel/ptrace.c:389
  __do_sys_ptrace kernel/ptrace.c:1139 [inline]
  __se_sys_ptrace kernel/ptrace.c:1119 [inline]
  __x64_sys_ptrace+0x229/0x260 kernel/ptrace.c:1119
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457569
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f2ed4dbfc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000065
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000457569
RDX: 0000000000000000 RSI: 000000000000021d RDI: 0000000000004206
RBP: 000000000072bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f2ed4dc06d4
R13: 00000000004c33bd R14: 00000000004d50e0 R15: 00000000ffffffff

Allocated by task 5873:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
  kmem_cache_alloc_node+0x144/0x730 mm/slab.c:3644
  alloc_task_struct_node kernel/fork.c:158 [inline]
  dup_task_struct kernel/fork.c:843 [inline]
  copy_process+0x2026/0x87a0 kernel/fork.c:1751
  _do_fork+0x1cb/0x11d0 kernel/fork.c:2216
  __do_sys_clone kernel/fork.c:2323 [inline]
  __se_sys_clone kernel/fork.c:2317 [inline]
  __x64_sys_clone+0xbf/0x150 kernel/fork.c:2317
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 9:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kmem_cache_free+0x83/0x290 mm/slab.c:3760
  free_task_struct kernel/fork.c:163 [inline]
  free_task+0x16e/0x1f0 kernel/fork.c:457
  __put_task_struct+0x2e6/0x620 kernel/fork.c:730
  put_task_struct include/linux/sched/task.h:96 [inline]
  delayed_put_task_struct+0x2ff/0x4c0 kernel/exit.c:181
  __rcu_reclaim kernel/rcu/rcu.h:240 [inline]
  rcu_do_batch kernel/rcu/tree.c:2437 [inline]
  invoke_rcu_callbacks kernel/rcu/tree.c:2716 [inline]
  rcu_process_callbacks+0x100a/0x1ac0 kernel/rcu/tree.c:2697
  __do_softirq+0x308/0xb7e kernel/softirq.c:292

The buggy address belongs to the object at ffff8801c46f2000
  which belongs to the cache task_struct(65:syz2) of size 6080
The buggy address is located 1248 bytes inside of
  6080-byte region [ffff8801c46f2000, ffff8801c46f37c0)
The buggy address belongs to the page:
page:ffffea000711bc80 count:1 mapcount:0 mapping:ffff8801c204fc80 index:0x0  
compound_mapcount: 0
flags: 0x2fffc0000010200(slab|head)
raw: 02fffc0000010200 ffffea00073cf808 ffffea000697c908 ffff8801c204fc80
raw: 0000000000000000 ffff8801c46f2000 0000000100000001 ffff8801d8404a80
page dumped because: kasan: bad access detected
page->mem_cgroup:ffff8801d8404a80

Memory state around the buggy address:
  ffff8801c46f2380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801c46f2400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8801c46f2480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                        ^
  ffff8801c46f2500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801c46f2580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KASAN: use-after-free Read in task_is_descendant
  2018-10-21  7:10 KASAN: use-after-free Read in task_is_descendant syzbot
  2018-10-21  7:12 ` Tetsuo Handa
  2018-11-10  3:25 ` syzbot
@ 2018-11-10 11:46 ` syzbot
  2 siblings, 0 replies; 31+ messages in thread
From: syzbot @ 2018-11-10 11:46 UTC (permalink / raw)
  To: jmorris, keescook, linux-kernel, linux-security-module, oleg,
	penguin-kernel, penguin-kernel, serge, syzkaller-bugs

syzbot has found a reproducer for the following crash on:

HEAD commit:    aa4330e15c26 Merge tag 'devicetree-fixes-for-4.20-2' of gi..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=17abaa47400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8f559fee2fc3375a
dashboard link: https://syzkaller.appspot.com/bug?extid=a9ac39bf55329e206219
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
userspace arch: i386
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=108ae6d5400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17bd86a3400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+a9ac39bf55329e206219@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:182  
[inline]
BUG: KASAN: use-after-free in task_is_descendant.part.3+0x610/0x670  
security/yama/yama_lsm.c:295
Read of size 8 at addr ffff8801ae718560 by task syz-executor198/4607

CPU: 1 PID: 4607 Comm: syz-executor198 Not tainted 4.20.0-rc1+ #232
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x244/0x39d lib/dump_stack.c:113
  print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
  __read_once_size include/linux/compiler.h:182 [inline]
  task_is_descendant.part.3+0x610/0x670 security/yama/yama_lsm.c:295
  task_is_descendant security/yama/yama_lsm.c:282 [inline]
  yama_ptrace_access_check+0x215/0x10fc security/yama/yama_lsm.c:371
  security_ptrace_access_check+0x54/0xb0 security/security.c:271
  __ptrace_may_access+0x5c8/0x980 kernel/ptrace.c:336
  ptrace_attach+0x1fa/0x640 kernel/ptrace.c:389
  __do_compat_sys_ptrace kernel/ptrace.c:1284 [inline]
  __se_compat_sys_ptrace kernel/ptrace.c:1266 [inline]
  __ia32_compat_sys_ptrace+0x1d2/0x260 kernel/ptrace.c:1266
  do_syscall_32_irqs_on arch/x86/entry/common.c:326 [inline]
  do_fast_syscall_32+0x34d/0xfb2 arch/x86/entry/common.c:397
  entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7feba29
Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 90 90  
90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90  
90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 002b:00000000f7fe71ec EFLAGS: 00000296 ORIG_RAX: 000000000000001a
RAX: ffffffffffffffda RBX: 0000000000004206 RCX: 00000000000014b5
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Allocated by task 5668:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
  kmem_cache_alloc_node+0x144/0x730 mm/slab.c:3644
  alloc_task_struct_node kernel/fork.c:158 [inline]
  dup_task_struct kernel/fork.c:843 [inline]
  copy_process+0x2026/0x87a0 kernel/fork.c:1751
  _do_fork+0x1cb/0x11d0 kernel/fork.c:2216
  __do_compat_sys_x86_clone arch/x86/ia32/sys_ia32.c:240 [inline]
  __se_compat_sys_x86_clone arch/x86/ia32/sys_ia32.c:236 [inline]
  __ia32_compat_sys_x86_clone+0xbc/0x140 arch/x86/ia32/sys_ia32.c:236
  do_syscall_32_irqs_on arch/x86/entry/common.c:326 [inline]
  do_fast_syscall_32+0x34d/0xfb2 arch/x86/entry/common.c:397
  entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139

Freed by task 16:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kmem_cache_free+0x83/0x290 mm/slab.c:3760
  free_task_struct kernel/fork.c:163 [inline]
  free_task+0x16e/0x1f0 kernel/fork.c:457
  __put_task_struct+0x2e6/0x620 kernel/fork.c:730
  put_task_struct include/linux/sched/task.h:96 [inline]
  delayed_put_task_struct+0x2ff/0x4c0 kernel/exit.c:181
  __rcu_reclaim kernel/rcu/rcu.h:240 [inline]
  rcu_do_batch kernel/rcu/tree.c:2437 [inline]
  invoke_rcu_callbacks kernel/rcu/tree.c:2716 [inline]
  rcu_process_callbacks+0x100a/0x1ac0 kernel/rcu/tree.c:2697
  __do_softirq+0x308/0xb7e kernel/softirq.c:292

The buggy address belongs to the object at ffff8801ae718080
  which belongs to the cache task_struct of size 6080
The buggy address is located 1248 bytes inside of
  6080-byte region [ffff8801ae718080, ffff8801ae719840)
The buggy address belongs to the page:
page:ffffea0006b9c600 count:1 mapcount:0 mapping:ffff8801da970180 index:0x0  
compound_mapcount: 0
flags: 0x2fffc0000010200(slab|head)
raw: 02fffc0000010200 ffffea0006b99488 ffffea0006b9c688 ffff8801da970180
raw: 0000000000000000 ffff8801ae718080 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801ae718400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801ae718480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8801ae718500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                        ^
  ffff8801ae718580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801ae718600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: yama: unsafe usage of ptrace_relation->tracer
  2018-10-29 15:05                       ` yama: unsafe usage of ptrace_relation->tracer Oleg Nesterov
@ 2019-01-10 11:05                         ` Tetsuo Handa
  2019-01-10 18:47                           ` Kees Cook
  2019-01-16 17:40                           ` Oleg Nesterov
  0 siblings, 2 replies; 31+ messages in thread
From: Tetsuo Handa @ 2019-01-10 11:05 UTC (permalink / raw)
  To: Oleg Nesterov, Kees Cook
  Cc: Serge E. Hallyn, syzbot, James Morris, LKML,
	linux-security-module, syzkaller-bugs

Hello, Kees.

syzbot is hitting this problem as of linux-next-20190110.
When a patch will be proposed?

On 2018/10/30 0:05, Oleg Nesterov wrote:
> let me change the subject to avoid the confusion with the already confusing
> disccussion about task_is_descendant().
> 
> On 10/29, Oleg Nesterov wrote:
>>
>> I still think we need a single pid_alive() check and I even sent the patch.
>> Attached again at the end.
>>
>> To clarify, let me repeat that ptracer_exception_found() may need some fixes
>> too, right now I am only talking about task_is_descendant().
> 
> so yes, the ptracer_relations code looks very broken to me, but perhaps I
> misread this code, please correct me.
> 
> RCU can only protect the ptracer_relations list itself, you can do nothing
> with (say) relation->tracer. relation->tracer can be already freed when
> ptracer_exception_found() checks relation->tracee == tracee.
> 
> Not only pid_alive(parent) can not help in this case, pid_alive(parent) is
> equally unsafe because, again, this memory can be freed.
> 
> security_task_free(tsk) is called right before free_task(tsk), there is no
> a gp pass in between, and of course we can't rely on the ->invalid check.
> 
> _At first glance_ we can fix this if we simply turn both ->tracer/tracee
> pointers into "signal_struct *", then we can turn all same_thread_group()'s
> into walker->signal == parent which doesn't need to dereference the possibly-
> freed parent. This also allows to remove all thread_group_leader() checks.
> We need to ensure that false-positive is not possible (if, say, ->tracer
> was already re-allocated and points to another task->signal), but this
> doesn't look difficult.
> 
> Oleg.
> 
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: yama: unsafe usage of ptrace_relation->tracer
  2019-01-10 11:05                         ` Tetsuo Handa
@ 2019-01-10 18:47                           ` Kees Cook
  2019-01-16 17:40                           ` Oleg Nesterov
  1 sibling, 0 replies; 31+ messages in thread
From: Kees Cook @ 2019-01-10 18:47 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Oleg Nesterov, Serge E. Hallyn, syzbot, James Morris, LKML,
	linux-security-module, syzkaller-bugs

On Thu, Jan 10, 2019 at 3:06 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> syzbot is hitting this problem as of linux-next-20190110.
> When a patch will be proposed?

Hi! Sorry, this got delayed over the holidays. Let me finish the patch
I was working on and get it published.

Thanks!

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: yama: unsafe usage of ptrace_relation->tracer
  2019-01-10 11:05                         ` Tetsuo Handa
  2019-01-10 18:47                           ` Kees Cook
@ 2019-01-16 17:40                           ` Oleg Nesterov
  1 sibling, 0 replies; 31+ messages in thread
From: Oleg Nesterov @ 2019-01-16 17:40 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Kees Cook, Serge E. Hallyn, syzbot, James Morris, LKML,
	linux-security-module, syzkaller-bugs

On 01/10, Tetsuo Handa wrote:
>
> syzbot is hitting this problem as of linux-next-20190110.
> When a patch will be proposed?

Well. I have already suggested the patch below several times. It won't fix all
problems in this code (I forgot the details but iirc ptracer_exception_found()
is broken too, task_is_descendant() should be cleanuped, may be something else),
but probably this trivial fix makes sense for the start?

Oleg.


--- x/security/yama/yama_lsm.c
+++ x/security/yama/yama_lsm.c
@@ -368,7 +368,8 @@ static int yama_ptrace_access_check(stru
 			break;
 		case YAMA_SCOPE_RELATIONAL:
 			rcu_read_lock();
-			if (!task_is_descendant(current, child) &&
+			if (!pid_alive(child) ||
+			    !task_is_descendant(current, child) &&
 			    !ptracer_exception_found(current, child) &&
 			    !ns_capable(__task_cred(child)->user_ns, CAP_SYS_PTRACE))
 				rc = -EPERM;


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, back to index

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-21  7:10 KASAN: use-after-free Read in task_is_descendant syzbot
2018-10-21  7:12 ` Tetsuo Handa
2018-10-22  9:54   ` Oleg Nesterov
2018-10-22 10:06     ` Tetsuo Handa
2018-10-22 13:46       ` Oleg Nesterov
2018-10-25  2:15         ` Tetsuo Handa
2018-10-25 11:13           ` Oleg Nesterov
2018-10-25 11:36             ` Kees Cook
2018-10-25 12:05               ` Oleg Nesterov
2018-10-25 11:47             ` Tetsuo Handa
2018-10-25 12:17               ` Oleg Nesterov
2018-10-25 13:01                 ` Oleg Nesterov
2018-10-26 16:09                   ` Kees Cook
2018-10-29 12:23                     ` Oleg Nesterov
2018-10-29 15:05                       ` yama: unsafe usage of ptrace_relation->tracer Oleg Nesterov
2019-01-10 11:05                         ` Tetsuo Handa
2019-01-10 18:47                           ` Kees Cook
2019-01-16 17:40                           ` Oleg Nesterov
2018-10-25 13:14                 ` KASAN: use-after-free Read in task_is_descendant Tetsuo Handa
2018-10-25 15:55                   ` Oleg Nesterov
2018-10-25 16:25                     ` Oleg Nesterov
2018-10-26 12:23                     ` Tetsuo Handa
2018-10-26 13:04                       ` Oleg Nesterov
2018-10-26 13:51                         ` Tetsuo Handa
2018-10-26 14:39                           ` Oleg Nesterov
2018-10-26 15:04                             ` Tetsuo Handa
2018-10-26 15:22                               ` Oleg Nesterov
2018-10-25  8:19     ` Kees Cook
2018-10-25 11:52       ` Oleg Nesterov
2018-11-10  3:25 ` syzbot
2018-11-10 11:46 ` syzbot

Linux-Security-Module Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-security-module/0 linux-security-module/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-security-module linux-security-module/ https://lore.kernel.org/linux-security-module \
		linux-security-module@vger.kernel.org linux-security-module@archiver.kernel.org
	public-inbox-index linux-security-module


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-security-module


AGPL code for this site: git clone https://public-inbox.org/ public-inbox