linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: KASAN: use-after-free Read in sock_release
       [not found] <94eb2c19e756c0119b055f1afbd0@google.com>
@ 2017-11-29 19:37 ` Cong Wang
  2017-11-29 20:24   ` Linus Torvalds
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Cong Wang @ 2017-11-29 19:37 UTC (permalink / raw)
  To: syzbot
  Cc: David Miller, LKML, Linux Kernel Network Developers,
	syzkaller-bugs, linux-fsdevel, Linus Torvalds, Al Viro

(Cc'ing fs people...)

On Wed, Nov 29, 2017 at 12:33 AM, syzbot
<bot+9abea25706ae35022385a41f61e579ed66e88a3f@syzkaller.appspotmail.com>
wrote:
> Hello,
>
> syzkaller hit the following crash on
> 1d3b78bbc6e983fabb3fbf91b76339bf66e4a12c
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
>
> Unfortunately, I don't have any reproducer for this bug yet.
>
>
> device syz3 left promiscuous mode
> device syz3 entered promiscuous mode
> ==================================================================
> BUG: KASAN: use-after-free in sock_release+0x1c6/0x1e0 net/socket.c:601
> Read of size 8 at addr ffff8801c8dd1d10 by task syz-executor4/31085
>
> CPU: 0 PID: 31085 Comm: syz-executor4 Not tainted 4.14.0+ #129
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  print_address_description+0x73/0x250 mm/kasan/report.c:252
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
>  sock_release+0x1c6/0x1e0 net/socket.c:601
>  sock_close+0x16/0x20 net/socket.c:1125
>  __fput+0x333/0x7f0 fs/file_table.c:210
>  ____fput+0x15/0x20 fs/file_table.c:244
>  task_work_run+0x199/0x270 kernel/task_work.c:113
>  exit_task_work include/linux/task_work.h:22 [inline]
>  do_exit+0x9bb/0x1ae0 kernel/exit.c:865
>  do_group_exit+0x149/0x400 kernel/exit.c:968
>  get_signal+0x73f/0x16c0 kernel/signal.c:2335
>  do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:809
>  exit_to_usermode_loop+0x214/0x310 arch/x86/entry/common.c:158
>  prepare_exit_to_usermode arch/x86/entry/common.c:195 [inline]
>  syscall_return_slowpath+0x490/0x550 arch/x86/entry/common.c:264
>  entry_SYSCALL_64_fastpath+0x94/0x96
> RIP: 0033:0x452879
> RSP: 002b:00007fb1c2435ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: fffffffffffffe00 RBX: 0000000000758100 RCX: 0000000000452879
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000758100
> RBP: 0000000000758100 R08: 0000000000000304 R09: 00000000007580d8
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000a6f7ff R14: 00007fb1c24369c0 R15: 000000000000000e
>
> Allocated by task 31066:
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
>  kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3613
>  kmalloc include/linux/slab.h:499 [inline]
>  sock_alloc_inode+0xb4/0x300 net/socket.c:253
>  alloc_inode+0x65/0x180 fs/inode.c:208
>  new_inode_pseudo+0x69/0x190 fs/inode.c:890
>  sock_alloc+0x41/0x270 net/socket.c:565
>  __sock_create+0x148/0x850 net/socket.c:1225
>  sock_create net/socket.c:1301 [inline]
>  SYSC_socket net/socket.c:1331 [inline]
>  SyS_socket+0xeb/0x200 net/socket.c:1311
>  entry_SYSCALL_64_fastpath+0x1f/0x96
>
> Freed by task 3039:
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
>  __cache_free mm/slab.c:3491 [inline]
>  kfree+0xca/0x250 mm/slab.c:3806
>  __rcu_reclaim kernel/rcu/rcu.h:190 [inline]
>  rcu_do_batch kernel/rcu/tree.c:2758 [inline]
>  invoke_rcu_callbacks kernel/rcu/tree.c:3012 [inline]
>  __rcu_process_callbacks kernel/rcu/tree.c:2979 [inline]
>  rcu_process_callbacks+0xe79/0x17d0 kernel/rcu/tree.c:2996
>  __do_softirq+0x29d/0xbb2 kernel/softirq.c:285

This looks more like a fs issue than network, my fs knowledge
is not good enough to justify why the hell the inode could be
destroyed before we release the fd.

My _guess_ is that it is because we defer the ____fput() to a
task work. If this is the case, then fs layer is not guilty for this.

On the other hand, if we have to blame net layer, it does look
suspicious on the RCU usage in sock_release() where we
claim RCU protection but I don't see we hold any RCU lock
there. Also, the code that deferences sock->wq is pretty much
useless now, at least I don't see it catches any bug though.


diff --git a/net/socket.c b/net/socket.c
index 42d8e9c9ccd5..b2390b5591a9 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -598,9 +598,6 @@ void sock_release(struct socket *sock)
                module_put(owner);
        }

-       if (rcu_dereference_protected(sock->wq, 1)->fasync_list)
-               pr_err("%s: fasync list not empty!\n", __func__);
-
        this_cpu_sub(sockets_in_use, 1);
        if (!sock->file) {
                iput(SOCK_INODE(sock));

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: KASAN: use-after-free Read in sock_release
  2017-11-29 19:37 ` KASAN: use-after-free Read in sock_release Cong Wang
@ 2017-11-29 20:24   ` Linus Torvalds
  2017-11-30  2:07     ` Al Viro
  2017-11-29 20:49   ` Eric Dumazet
  2017-11-30  2:24   ` Al Viro
  2 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2017-11-29 20:24 UTC (permalink / raw)
  To: Cong Wang, Al Viro
  Cc: syzbot, David Miller, LKML, Linux Kernel Network Developers,
	syzkaller-bugs, linux-fsdevel

On Wed, Nov 29, 2017 at 11:37 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> (Cc'ing fs people...)
>
> On Wed, Nov 29, 2017 at 12:33 AM, syzbot wrote:
>> BUG: KASAN: use-after-free in sock_release+0x1c6/0x1e0 net/socket.c:601

Lovely.

Yeah, that is:

   601          if (rcu_dereference_protected(sock->wq, 1)->fasync_list)

and as you say, that "rcu_dereference_protected()" is confusing, but
that should be ok because we have a ref to the inode, and we're really
just testing that the pointer is zero.

The call trace here is:

>>  sock_release+0x1c6/0x1e0 net/socket.c:601
>>  sock_close+0x16/0x20 net/socket.c:1125
>>  __fput+0x333/0x7f0 fs/file_table.c:210
>>  ____fput+0x15/0x20 fs/file_table.c:244
>>  task_work_run+0x199/0x270 kernel/task_work.c:113

and there is no RCU protection anywhere, but it's really just a sanity
check, and the access _should_ be ok.

The stale access does seem to be because 'sock' (embedded in the
inode) itself that has been free'd:

>> Allocated by task 31066:
>>  kmalloc include/linux/slab.h:499 [inline]
>>  sock_alloc_inode+0xb4/0x300 net/socket.c:253
>>  alloc_inode+0x65/0x180 fs/inode.c:208
>>  new_inode_pseudo+0x69/0x190 fs/inode.c:890
>>  sock_alloc+0x41/0x270 net/socket.c:565
>>  __sock_create+0x148/0x850 net/socket.c:1225
>>  sock_create net/socket.c:1301 [inline]
>>  SYSC_socket net/socket.c:1331 [inline]
>>  SyS_socket+0xeb/0x200 net/socket.c:1311
>
> This looks more like a fs issue than network, my fs knowledge
> is not good enough to justify why the hell the inode could be
> destroyed before we release the fd.

Ugh. The inode freeing really is confusing and fairly involved, but
the last free *should* happen as part of the final dput() that is done
at the end of __fput().

So in __fput() calls into the

        if (file->f_op->release)
                file->f_op->release(inode, file);

then the inode should still be around, because the final ref won't be
done until later. And RCU simply shouldn't be an issue, because of
that reference count on the inode.

So it smells like some reference counting went wrong. The socket inode
creation is a bit confusing, and then in "sock_release()" we do have
that

        if (!sock->file) {
                iput(SOCK_INODE(sock));
                return;
        }
        sock->file = NULL;

which *also* tries to free the inode. I'm not sure what the logic (and
what the locking) behind that code all is.

What *is* the locking for "sock->file" anyway?

Al, can you take a look on the vfs side? But I'm inclined to blame the
socket code, because if we really had a "inode free'd early" issue at
a vfs level, I'd have expected us to see infinite chaos.

             Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: KASAN: use-after-free Read in sock_release
  2017-11-29 19:37 ` KASAN: use-after-free Read in sock_release Cong Wang
  2017-11-29 20:24   ` Linus Torvalds
@ 2017-11-29 20:49   ` Eric Dumazet
  2017-11-30  2:24   ` Al Viro
  2 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2017-11-29 20:49 UTC (permalink / raw)
  To: Cong Wang, syzbot
  Cc: David Miller, LKML, Linux Kernel Network Developers,
	syzkaller-bugs, linux-fsdevel, Linus Torvalds, Al Viro

On Wed, 2017-11-29 at 11:37 -0800, Cong Wang wrote:
> (Cc'ing fs people...)
> 
> On Wed, Nov 29, 2017 at 12:33 AM, syzbot
> <bot+9abea25706ae35022385a41f61e579ed66e88a3f@syzkaller.appspotmail.c
> om>
> wrote:
> > Hello,
> > 
> > syzkaller hit the following crash on
> > 1d3b78bbc6e983fabb3fbf91b76339bf66e4a12c
> > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-
> > next.git/master
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached
> > Raw console output is attached.
> > 
> > Unfortunately, I don't have any reproducer for this bug yet.
> > 
> > 
> > device syz3 left promiscuous mode
> > device syz3 entered promiscuous mode
> > ==================================================================
> > BUG: KASAN: use-after-free in sock_release+0x1c6/0x1e0
> > net/socket.c:601
> > Read of size 8 at addr ffff8801c8dd1d10 by task syz-executor4/31085
> > 
> > CPU: 0 PID: 31085 Comm: syz-executor4 Not tainted 4.14.0+ #129
> > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > BIOS
> > Google 01/01/2011
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:17 [inline]
> >  dump_stack+0x194/0x257 lib/dump_stack.c:53
> >  print_address_description+0x73/0x250 mm/kasan/report.c:252
> >  kasan_report_error mm/kasan/report.c:351 [inline]
> >  kasan_report+0x25b/0x340 mm/kasan/report.c:409
> >  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
> >  sock_release+0x1c6/0x1e0 net/socket.c:601
> >  sock_close+0x16/0x20 net/socket.c:1125
> >  __fput+0x333/0x7f0 fs/file_table.c:210
> >  ____fput+0x15/0x20 fs/file_table.c:244
> >  task_work_run+0x199/0x270 kernel/task_work.c:113
> >  exit_task_work include/linux/task_work.h:22 [inline]
> >  do_exit+0x9bb/0x1ae0 kernel/exit.c:865
> >  do_group_exit+0x149/0x400 kernel/exit.c:968
> >  get_signal+0x73f/0x16c0 kernel/signal.c:2335
> >  do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:809
> >  exit_to_usermode_loop+0x214/0x310 arch/x86/entry/common.c:158
> >  prepare_exit_to_usermode arch/x86/entry/common.c:195 [inline]
> >  syscall_return_slowpath+0x490/0x550 arch/x86/entry/common.c:264
> >  entry_SYSCALL_64_fastpath+0x94/0x96
> > RIP: 0033:0x452879
> > RSP: 002b:00007fb1c2435ce8 EFLAGS: 00000246 ORIG_RAX:
> > 00000000000000ca
> > RAX: fffffffffffffe00 RBX: 0000000000758100 RCX: 0000000000452879
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000758100
> > RBP: 0000000000758100 R08: 0000000000000304 R09: 00000000007580d8
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > R13: 0000000000a6f7ff R14: 00007fb1c24369c0 R15: 000000000000000e
> > 
> > Allocated by task 31066:
> >  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
> >  set_track mm/kasan/kasan.c:459 [inline]
> >  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
> >  kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3613
> >  kmalloc include/linux/slab.h:499 [inline]
> >  sock_alloc_inode+0xb4/0x300 net/socket.c:253
> >  alloc_inode+0x65/0x180 fs/inode.c:208
> >  new_inode_pseudo+0x69/0x190 fs/inode.c:890
> >  sock_alloc+0x41/0x270 net/socket.c:565
> >  __sock_create+0x148/0x850 net/socket.c:1225
> >  sock_create net/socket.c:1301 [inline]
> >  SYSC_socket net/socket.c:1331 [inline]
> >  SyS_socket+0xeb/0x200 net/socket.c:1311
> >  entry_SYSCALL_64_fastpath+0x1f/0x96
> > 
> > Freed by task 3039:
> >  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
> >  set_track mm/kasan/kasan.c:459 [inline]
> >  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
> >  __cache_free mm/slab.c:3491 [inline]
> >  kfree+0xca/0x250 mm/slab.c:3806
> >  __rcu_reclaim kernel/rcu/rcu.h:190 [inline]
> >  rcu_do_batch kernel/rcu/tree.c:2758 [inline]
> >  invoke_rcu_callbacks kernel/rcu/tree.c:3012 [inline]
> >  __rcu_process_callbacks kernel/rcu/tree.c:2979 [inline]
> >  rcu_process_callbacks+0xe79/0x17d0 kernel/rcu/tree.c:2996
> >  __do_softirq+0x29d/0xbb2 kernel/softirq.c:285
> 
> This looks more like a fs issue than network, my fs knowledge
> is not good enough to justify why the hell the inode could be
> destroyed before we release the fd.
> 
> My _guess_ is that it is because we defer the ____fput() to a
> task work. If this is the case, then fs layer is not guilty for this.
> 
> On the other hand, if we have to blame net layer, it does look
> suspicious on the RCU usage in sock_release() where we
> claim RCU protection but I don't see we hold any RCU lock
> there.

There is rcu protection for sock->wq, and the 1 in 
rcu_dereference_protected(sock->wq, 1) is because we do not have a
lockdep convenient way to express that we are the last user of sock,
and about to free it.


>  Also, the code that deferences sock->wq is pretty much
> useless now, at least I don't see it catches any bug though.
> 
> 
> diff --git a/net/socket.c b/net/socket.c
> index 42d8e9c9ccd5..b2390b5591a9 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -598,9 +598,6 @@ void sock_release(struct socket *sock)
>                 module_put(owner);
>         }
> 
> -       if (rcu_dereference_protected(sock->wq, 1)->fasync_list)
> -               pr_err("%s: fasync list not empty!\n", __func__);
> -
> 

At this point, sock->wq must be valid, and freed later (by us)

This really looks like some other bug, and a late effect.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: KASAN: use-after-free Read in sock_release
  2017-11-29 20:24   ` Linus Torvalds
@ 2017-11-30  2:07     ` Al Viro
  2017-11-30  4:16       ` Al Viro
  2017-11-30 13:18       ` Christoph Hellwig
  0 siblings, 2 replies; 9+ messages in thread
From: Al Viro @ 2017-11-30  2:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Cong Wang, syzbot, David Miller, LKML,
	Linux Kernel Network Developers, syzkaller-bugs, linux-fsdevel

On Wed, Nov 29, 2017 at 12:24:55PM -0800, Linus Torvalds wrote:
> Ugh. The inode freeing really is confusing and fairly involved, but
> the last free *should* happen as part of the final dput() that is done
> at the end of __fput().

Note that struct socket is coallocated with its inode.  _Normally_
from sock_alloc() (and that's the case here, apparently), but in several
cases it's embedded into another object.  TUN and TAP - definitely,
might have been other added.  Those should never be passed to sock_release()
at all.

> So in __fput() calls into the
> 
>         if (file->f_op->release)
>                 file->f_op->release(inode, file);
> 
> then the inode should still be around, because the final ref won't be
> done until later. And RCU simply shouldn't be an issue, because of
> that reference count on the inode.
> 
> So it smells like some reference counting went wrong. The socket inode
> creation is a bit confusing, and then in "sock_release()" we do have
> that
> 
>         if (!sock->file) {
>                 iput(SOCK_INODE(sock));
>                 return;
>         }
>         sock->file = NULL;
> 
> which *also* tries to free the inode. I'm not sure what the logic (and
> what the locking) behind that code all is.

If socket has never gone through sock_alloc_file(), sock_release() on it
is called explicitly and frees the sucker.  If it has been through
sock_alloc_file(), we must not call sock_release() directly and freeing
is done by iput() from final fput().

> What *is* the locking for "sock->file" anyway?

Pretty much assign-once - zeroing it in the end of sock_release() is
pure cosmetics (we'd damn better have no other references to that
sucker left anywhere; there's still a reference to embedded inode,
but that's it).

FWIW, looking through the callers of sock_alloc_file()... we might be
better off if it did sock_release() on failure.  Then the calling
conventions become "sock_alloc_file() means not calling sock_release()
directly - either it'll be done by the final fput() on resulting file,
or by sock_alloc_file() itself".

Look:
1) in lustre:
        sock_filp = sock_alloc_file(sock, 0, NULL);
        if (IS_ERR(sock_filp)) {
                sock_release(sock);
                rc = PTR_ERR(sock_filp);
                goto out;
        }
2) in net/9p:
        file = sock_alloc_file(csocket, 0, NULL);
        if (IS_ERR(file)) {
                pr_err("%s (%d): failed to map fd\n",
                       __func__, task_pid_nr(current));
                sock_release(csocket);
                kfree(p);
                return PTR_ERR(file);
        }
3) in sctp:
        *newfile = sock_alloc_file(newsock, 0, NULL);
        if (IS_ERR(*newfile)) {
                put_unused_fd(retval); 
                sock_release(newsock);
                retval = PTR_ERR(*newfile);
                *newfile = NULL;
                return retval;
        }
4) in accept4():
        newfile = sock_alloc_file(newsock, flags, sock->sk->sk_prot_creator->name);
        if (IS_ERR(newfile)) {
                err = PTR_ERR(newfile);
                put_unused_fd(newfd);
                sock_release(newsock);
                goto out_put;
        }

5) called in sock_map_fd(), and the sole caller is
        retval = sock_map_fd(sock, flags & (O_CLOEXEC | O_NONBLOCK));
        if (retval < 0)
                goto out_release;
...
out_release:
        sock_release(sock);
        return retval;
(with no fallthrough or other goto into out_release)

6) the second caller in socketpair():
        newfile2 = sock_alloc_file(sock2, flags, NULL);
        if (IS_ERR(newfile2)) {
                err = PTR_ERR(newfile2);
                goto out_fput_1;
        }
...
out_fput_1:
        fput(newfile1);
        put_unused_fd(fd2);
        put_unused_fd(fd1);
        sock_release(sock2);
        goto out;
(again, no fallthrough or other goto into out_fput_1)

7) the first caller in socketpair():
        newfile1 = sock_alloc_file(sock1, flags, NULL);
        if (IS_ERR(newfile1)) {
                err = PTR_ERR(newfile1);
                goto out_put_unused_both;
        }
...
out_put_unused_both:
        put_unused_fd(fd2);
out_put_unused_1:
        put_unused_fd(fd1);
out_release_both:
        sock_release(sock2);
out_release_1:
        sock_release(sock1);
out:
        return err;
No fallthrough or goto either.  Sure, we get a failure exit unshared,
but AFAICS some reordering can simplify things quite a bit there.

8) kcm_clone().  Fucked in head - we allocate socket, then file, *THEN*
sock, then attach sock to socket (already attached to file), then finally
deign to initialize sock (already attached to socket, which is attached
to file).  And, surprise, surprise, failure exits are all wrong.
Moreover, calling conventions are broken by design - after we'd put
the damn file into descriptor table we return the pointer to sock
to the caller.  By that time it might have bloody well been destroyed
by close(2) from another thread; good thing the caller doesn't use
the damn thing.  Unfortunately, it is doing this:
                if (!err) {
                        if (copy_to_user((void __user *)arg, &info,
                                         sizeof(info))) {
                                err = -EFAULT;
                                sys_close(info.fd);
                        }
                }
which is also bogus - again, the descriptor might have been already
closed by another thread, with another one put in its place, etc.
The right way to do that is to do fd_install() _last_.  After the
last failure exit, i.e. after copy_to_user() in that case.

KCM would've looked as a likely cause of that shit, if not for the
fact that object had been created by socket(2).  It definitely needs
fixing, but that's not the cause of PITA here.

Incidentally, grepping for sys_close() shows another piece of fun in
net/netfilter/xt_bpf.c.  Folks, ONCE DESCRIPTOR IS INSTALLED, THAT'S
IT; THERE'S NO REMOVING IT ON FAILURE EXITS.  sys_close() should
never, ever be used that way.  Sigh...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: KASAN: use-after-free Read in sock_release
  2017-11-29 19:37 ` KASAN: use-after-free Read in sock_release Cong Wang
  2017-11-29 20:24   ` Linus Torvalds
  2017-11-29 20:49   ` Eric Dumazet
@ 2017-11-30  2:24   ` Al Viro
  2 siblings, 0 replies; 9+ messages in thread
From: Al Viro @ 2017-11-30  2:24 UTC (permalink / raw)
  To: Cong Wang
  Cc: syzbot, David Miller, LKML, Linux Kernel Network Developers,
	syzkaller-bugs, linux-fsdevel, Linus Torvalds

On Wed, Nov 29, 2017 at 11:37:04AM -0800, Cong Wang wrote:

> > Allocated by task 31066:
> >  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
> >  set_track mm/kasan/kasan.c:459 [inline]
> >  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
> >  kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3613
> >  kmalloc include/linux/slab.h:499 [inline]
> >  sock_alloc_inode+0xb4/0x300 net/socket.c:253
> >  alloc_inode+0x65/0x180 fs/inode.c:208
> >  new_inode_pseudo+0x69/0x190 fs/inode.c:890
> >  sock_alloc+0x41/0x270 net/socket.c:565
> >  __sock_create+0x148/0x850 net/socket.c:1225
> >  sock_create net/socket.c:1301 [inline]
> >  SYSC_socket net/socket.c:1331 [inline]
> >  SyS_socket+0xeb/0x200 net/socket.c:1311
> >  entry_SYSCALL_64_fastpath+0x1f/0x96
> >
> > Freed by task 3039:
> >  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
> >  set_track mm/kasan/kasan.c:459 [inline]
> >  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
> >  __cache_free mm/slab.c:3491 [inline]
> >  kfree+0xca/0x250 mm/slab.c:3806
> >  __rcu_reclaim kernel/rcu/rcu.h:190 [inline]
> >  rcu_do_batch kernel/rcu/tree.c:2758 [inline]
> >  invoke_rcu_callbacks kernel/rcu/tree.c:3012 [inline]
> >  __rcu_process_callbacks kernel/rcu/tree.c:2979 [inline]
> >  rcu_process_callbacks+0xe79/0x17d0 kernel/rcu/tree.c:2996
> >  __do_softirq+0x29d/0xbb2 kernel/softirq.c:285

IDGI.  We are running into the object pointed to by sock->wq
already freed, right?  So how the hell had we managed to _fetch_
the pointer in the first place?  Freeing had been scheduled
by
        wq = rcu_dereference_protected(ei->socket.wq, 1);  
        kfree_rcu(wq, rcu);
        kmem_cache_free(sock_inode_cachep, ei);
so we should have
	* sock_destroy_inode() run on another CPU while we are
in the middle of sock_release(), sock->wq fetched by sock_release(),
sock->wq fed to kfree_rcu() by sock_destroy_inode() *AND* freed
before sock_release() got around to dereferencing it.

	Not impossible to hit, but... why hadn't we run into
much wider window?  If that sock_destroy_inode() on another
CPU had gotten to the call right after that kfree_rcu(), we
would've seen use-after-free on attempt to fetch ->wq...

	And it goes without saying that sock_destroy_inode() should
not have happened in parallel to sock_release(), or, for that matter,
to anything else done to struct socket instance...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: KASAN: use-after-free Read in sock_release
  2017-11-30  2:07     ` Al Viro
@ 2017-11-30  4:16       ` Al Viro
  2017-11-30 13:18       ` Christoph Hellwig
  1 sibling, 0 replies; 9+ messages in thread
From: Al Viro @ 2017-11-30  4:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Cong Wang, syzbot, David Miller, LKML,
	Linux Kernel Network Developers, syzkaller-bugs, linux-fsdevel

On Thu, Nov 30, 2017 at 02:07:19AM +0000, Al Viro wrote:

> FWIW, looking through the callers of sock_alloc_file()... we might be
> better off if it did sock_release() on failure.  Then the calling
> conventions become "sock_alloc_file() means not calling sock_release()
> directly - either it'll be done by the final fput() on resulting file,
> or by sock_alloc_file() itself".

FWIW^2: vfs.git#work.net is (completely untested) implementation of
that.  KCM fixes + sock_alloc_file() calling conventions change.

That's _not_ a pull request, it obviously needs testing and review on
netdev.  I like the way it looks, though...

Al Viro (3):
      socketpair(): allocate descriptors first
      fix kcm_clone()
      make sock_alloc_file() do sock_release() on failures
Diffstat:
 drivers/staging/lustre/lnet/lnet/lib-socket.c |   8 ++---
 net/9p/trans_fd.c                             |   1 -
 net/kcm/kcmsock.c                             |  68 ++++++++++++++---------------------------
 net/sctp/socket.c                             |   1 -
 net/socket.c                                  | 110 +++++++++++++++++++++++++++----------------------------------------
 5 files changed, 69 insertions(+), 119 deletions(-)

Cumulative diff:

diff --git a/drivers/staging/lustre/lnet/lnet/lib-socket.c b/drivers/staging/lustre/lnet/lnet/lib-socket.c
index 539a26444f31..7d49d4865298 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-socket.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-socket.c
@@ -71,16 +71,12 @@ lnet_sock_ioctl(int cmd, unsigned long arg)
 	}
 
 	sock_filp = sock_alloc_file(sock, 0, NULL);
-	if (IS_ERR(sock_filp)) {
-		sock_release(sock);
-		rc = PTR_ERR(sock_filp);
-		goto out;
-	}
+	if (IS_ERR(sock_filp))
+		return PTR_ERR(sock_filp);
 
 	rc = kernel_sock_unlocked_ioctl(sock_filp, cmd, arg);
 
 	fput(sock_filp);
-out:
 	return rc;
 }
 
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 985046ae4231..80f5c79053a4 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -839,7 +839,6 @@ static int p9_socket_open(struct p9_client *client, struct socket *csocket)
 	if (IS_ERR(file)) {
 		pr_err("%s (%d): failed to map fd\n",
 		       __func__, task_pid_nr(current));
-		sock_release(csocket);
 		kfree(p);
 		return PTR_ERR(file);
 	}
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 0b750a22c4b9..d4e98f20fc2a 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1625,60 +1625,30 @@ static struct proto kcm_proto = {
 };
 
 /* Clone a kcm socket. */
-static int kcm_clone(struct socket *osock, struct kcm_clone *info,
-		     struct socket **newsockp)
+static struct file *kcm_clone(struct socket *osock)
 {
 	struct socket *newsock;
 	struct sock *newsk;
-	struct file *newfile;
-	int err, newfd;
 
-	err = -ENFILE;
 	newsock = sock_alloc();
 	if (!newsock)
-		goto out;
+		return ERR_PTR(-ENFILE);
 
 	newsock->type = osock->type;
 	newsock->ops = osock->ops;
 
 	__module_get(newsock->ops->owner);
 
-	newfd = get_unused_fd_flags(0);
-	if (unlikely(newfd < 0)) {
-		err = newfd;
-		goto out_fd_fail;
-	}
-
-	newfile = sock_alloc_file(newsock, 0, osock->sk->sk_prot_creator->name);
-	if (IS_ERR(newfile)) {
-		err = PTR_ERR(newfile);
-		goto out_sock_alloc_fail;
-	}
-
 	newsk = sk_alloc(sock_net(osock->sk), PF_KCM, GFP_KERNEL,
 			 &kcm_proto, true);
 	if (!newsk) {
-		err = -ENOMEM;
-		goto out_sk_alloc_fail;
+		sock_release(newsock);
+		return ERR_PTR(-ENOMEM);
 	}
-
 	sock_init_data(newsock, newsk);
 	init_kcm_sock(kcm_sk(newsk), kcm_sk(osock->sk)->mux);
 
-	fd_install(newfd, newfile);
-	*newsockp = newsock;
-	info->fd = newfd;
-
-	return 0;
-
-out_sk_alloc_fail:
-	fput(newfile);
-out_sock_alloc_fail:
-	put_unused_fd(newfd);
-out_fd_fail:
-	sock_release(newsock);
-out:
-	return err;
+	return sock_alloc_file(newsock, 0, osock->sk->sk_prot_creator->name);
 }
 
 static int kcm_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
@@ -1708,17 +1678,25 @@ static int kcm_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 	}
 	case SIOCKCMCLONE: {
 		struct kcm_clone info;
-		struct socket *newsock = NULL;
-
-		err = kcm_clone(sock, &info, &newsock);
-		if (!err) {
-			if (copy_to_user((void __user *)arg, &info,
-					 sizeof(info))) {
-				err = -EFAULT;
-				sys_close(info.fd);
-			}
-		}
+		struct file *file;
+
+		info.fd = get_unused_fd_flags(0);
+		if (unlikely(info.fd < 0))
+			return info.fd;
 
+		file = kcm_clone(sock);
+		if (IS_ERR(file)) {
+			put_unused_fd(info.fd);
+			return PTR_ERR(file);
+		}
+		if (copy_to_user((void __user *)arg, &info,
+				 sizeof(info))) {
+			put_unused_fd(info.fd);
+			fput(file);
+			return -EFAULT;
+		}
+		fd_install(info.fd, file);
+		err = 0;
 		break;
 	}
 	default:
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 3204a9b29407..8bb5163d6331 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -5080,7 +5080,6 @@ static int sctp_getsockopt_peeloff_common(struct sock *sk, sctp_peeloff_arg_t *p
 	*newfile = sock_alloc_file(newsock, 0, NULL);
 	if (IS_ERR(*newfile)) {
 		put_unused_fd(retval);
-		sock_release(newsock);
 		retval = PTR_ERR(*newfile);
 		*newfile = NULL;
 		return retval;
diff --git a/net/socket.c b/net/socket.c
index 42d8e9c9ccd5..05f361faec45 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -406,8 +406,10 @@ struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname)
 		name.len = strlen(name.name);
 	}
 	path.dentry = d_alloc_pseudo(sock_mnt->mnt_sb, &name);
-	if (unlikely(!path.dentry))
+	if (unlikely(!path.dentry)) {
+		sock_release(sock);
 		return ERR_PTR(-ENOMEM);
+	}
 	path.mnt = mntget(sock_mnt);
 
 	d_instantiate(path.dentry, SOCK_INODE(sock));
@@ -415,9 +417,11 @@ struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname)
 	file = alloc_file(&path, FMODE_READ | FMODE_WRITE,
 		  &socket_file_ops);
 	if (IS_ERR(file)) {
-		/* drop dentry, keep inode */
+		/* drop dentry, keep inode for a bit */
 		ihold(d_inode(path.dentry));
 		path_put(&path);
+		/* ... and now kill it properly */
+		sock_release(sock);
 		return file;
 	}
 
@@ -1330,19 +1334,9 @@ SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol)
 
 	retval = sock_create(family, type, protocol, &sock);
 	if (retval < 0)
-		goto out;
-
-	retval = sock_map_fd(sock, flags & (O_CLOEXEC | O_NONBLOCK));
-	if (retval < 0)
-		goto out_release;
-
-out:
-	/* It may be already another descriptor 8) Not kernel problem. */
-	return retval;
+		return retval;
 
-out_release:
-	sock_release(sock);
-	return retval;
+	return sock_map_fd(sock, flags & (O_CLOEXEC | O_NONBLOCK));
 }
 
 /*
@@ -1366,87 +1360,72 @@ SYSCALL_DEFINE4(socketpair, int, family, int, type, int, protocol,
 		flags = (flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
 
 	/*
+	 * reserve descriptors and make sure we won't fail
+	 * to return them to userland.
+	 */
+	fd1 = get_unused_fd_flags(flags);
+	if (unlikely(fd1 < 0))
+		return fd1;
+
+	fd2 = get_unused_fd_flags(flags);
+	if (unlikely(fd2 < 0)) {
+		put_unused_fd(fd1);
+		return fd2;
+	}
+
+	err = put_user(fd1, &usockvec[0]);
+	if (err)
+		goto out;
+
+	err = put_user(fd2, &usockvec[1]);
+	if (err)
+		goto out;
+
+	/*
 	 * Obtain the first socket and check if the underlying protocol
 	 * supports the socketpair call.
 	 */
 
 	err = sock_create(family, type, protocol, &sock1);
-	if (err < 0)
+	if (unlikely(err < 0))
 		goto out;
 
 	err = sock_create(family, type, protocol, &sock2);
-	if (err < 0)
-		goto out_release_1;
-
-	err = sock1->ops->socketpair(sock1, sock2);
-	if (err < 0)
-		goto out_release_both;
-
-	fd1 = get_unused_fd_flags(flags);
-	if (unlikely(fd1 < 0)) {
-		err = fd1;
-		goto out_release_both;
+	if (unlikely(err < 0)) {
+		sock_release(sock1);
+		goto out;
 	}
 
-	fd2 = get_unused_fd_flags(flags);
-	if (unlikely(fd2 < 0)) {
-		err = fd2;
-		goto out_put_unused_1;
+	err = sock1->ops->socketpair(sock1, sock2);
+	if (unlikely(err < 0)) {
+		sock_release(sock2);
+		sock_release(sock1);
+		goto out;
 	}
 
 	newfile1 = sock_alloc_file(sock1, flags, NULL);
 	if (IS_ERR(newfile1)) {
 		err = PTR_ERR(newfile1);
-		goto out_put_unused_both;
+		sock_release(sock2);
+		goto out;
 	}
 
 	newfile2 = sock_alloc_file(sock2, flags, NULL);
 	if (IS_ERR(newfile2)) {
 		err = PTR_ERR(newfile2);
-		goto out_fput_1;
+		fput(newfile1);
+		goto out;
 	}
 
-	err = put_user(fd1, &usockvec[0]);
-	if (err)
-		goto out_fput_both;
-
-	err = put_user(fd2, &usockvec[1]);
-	if (err)
-		goto out_fput_both;
-
 	audit_fd_pair(fd1, fd2);
 
 	fd_install(fd1, newfile1);
 	fd_install(fd2, newfile2);
-	/* fd1 and fd2 may be already another descriptors.
-	 * Not kernel problem.
-	 */
-
 	return 0;
 
-out_fput_both:
-	fput(newfile2);
-	fput(newfile1);
-	put_unused_fd(fd2);
-	put_unused_fd(fd1);
-	goto out;
-
-out_fput_1:
-	fput(newfile1);
-	put_unused_fd(fd2);
-	put_unused_fd(fd1);
-	sock_release(sock2);
-	goto out;
-
-out_put_unused_both:
+out:
 	put_unused_fd(fd2);
-out_put_unused_1:
 	put_unused_fd(fd1);
-out_release_both:
-	sock_release(sock2);
-out_release_1:
-	sock_release(sock1);
-out:
 	return err;
 }
 
@@ -1562,7 +1541,6 @@ SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr,
 	if (IS_ERR(newfile)) {
 		err = PTR_ERR(newfile);
 		put_unused_fd(newfd);
-		sock_release(newsock);
 		goto out_put;
 	}
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: KASAN: use-after-free Read in sock_release
  2017-11-30  2:07     ` Al Viro
  2017-11-30  4:16       ` Al Viro
@ 2017-11-30 13:18       ` Christoph Hellwig
  2017-11-30 15:46         ` Al Viro
  1 sibling, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2017-11-30 13:18 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Cong Wang, syzbot, David Miller, LKML,
	Linux Kernel Network Developers, syzkaller-bugs, linux-fsdevel

On Thu, Nov 30, 2017 at 02:07:19AM +0000, Al Viro wrote:
> Incidentally, grepping for sys_close() shows another piece of fun in
> net/netfilter/xt_bpf.c.  Folks, ONCE DESCRIPTOR IS INSTALLED, THAT'S
> IT; THERE'S NO REMOVING IT ON FAILURE EXITS.  sys_close() should
> never, ever be used that way.  Sigh...

Would be great do unexport the thing.  Except that we also have
binfmt_misc (which looks legit) and autofs4, which on crack decided
that close() isn't a fun syscall, they'd much rather have an ioctl
that does exactly the same..

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: KASAN: use-after-free Read in sock_release
  2017-11-30 13:18       ` Christoph Hellwig
@ 2017-11-30 15:46         ` Al Viro
  2018-02-13 19:16           ` Dmitry Vyukov
  0 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2017-11-30 15:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Cong Wang, syzbot, David Miller, LKML,
	Linux Kernel Network Developers, syzkaller-bugs, linux-fsdevel

On Thu, Nov 30, 2017 at 05:18:33AM -0800, Christoph Hellwig wrote:
> On Thu, Nov 30, 2017 at 02:07:19AM +0000, Al Viro wrote:
> > Incidentally, grepping for sys_close() shows another piece of fun in
> > net/netfilter/xt_bpf.c.  Folks, ONCE DESCRIPTOR IS INSTALLED, THAT'S
> > IT; THERE'S NO REMOVING IT ON FAILURE EXITS.  sys_close() should
> > never, ever be used that way.  Sigh...
> 
> Would be great do unexport the thing.  Except that we also have
> binfmt_misc (which looks legit) and autofs4, which on crack decided
> that close() isn't a fun syscall, they'd much rather have an ioctl
> that does exactly the same..

Yes, since binfmt_misc one is guaranteed that its descriptor table is
not shared - all callchains go through do_execveat_common(), where we'd
use unshare_files().  autofs one is... not in good taste, but still
safe; there the descriptor is preexisting and it's essentially a weird
way of spelling close(2).  References from syscall tables are, of course,
OK.  init/*.c uses are done pretty much from userland - they could have
been straight syscalls, if not for the lack of klibc in kernel tree.
Everything else, though...

IMO we need a whack-a-mole list somewhere; "new callers of sys_close()
anywhere outside of init/* and syscall tables" definitely should be
on it...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: KASAN: use-after-free Read in sock_release
  2017-11-30 15:46         ` Al Viro
@ 2018-02-13 19:16           ` Dmitry Vyukov
  0 siblings, 0 replies; 9+ messages in thread
From: Dmitry Vyukov @ 2018-02-13 19:16 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Linus Torvalds, Cong Wang, syzbot,
	David Miller, LKML, Linux Kernel Network Developers,
	syzkaller-bugs, linux-fsdevel

On Thu, Nov 30, 2017 at 4:46 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Nov 30, 2017 at 05:18:33AM -0800, Christoph Hellwig wrote:
>> On Thu, Nov 30, 2017 at 02:07:19AM +0000, Al Viro wrote:
>> > Incidentally, grepping for sys_close() shows another piece of fun in
>> > net/netfilter/xt_bpf.c.  Folks, ONCE DESCRIPTOR IS INSTALLED, THAT'S
>> > IT; THERE'S NO REMOVING IT ON FAILURE EXITS.  sys_close() should
>> > never, ever be used that way.  Sigh...
>>
>> Would be great do unexport the thing.  Except that we also have
>> binfmt_misc (which looks legit) and autofs4, which on crack decided
>> that close() isn't a fun syscall, they'd much rather have an ioctl
>> that does exactly the same..
>
> Yes, since binfmt_misc one is guaranteed that its descriptor table is
> not shared - all callchains go through do_execveat_common(), where we'd
> use unshare_files().  autofs one is... not in good taste, but still
> safe; there the descriptor is preexisting and it's essentially a weird
> way of spelling close(2).  References from syscall tables are, of course,
> OK.  init/*.c uses are done pretty much from userland - they could have
> been straight syscalls, if not for the lack of klibc in kernel tree.
> Everything else, though...
>
> IMO we need a whack-a-mole list somewhere; "new callers of sys_close()
> anywhere outside of init/* and syscall tables" definitely should be
> on it...


#syz fix: fix kcm_clone()

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-02-13 19:17 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <94eb2c19e756c0119b055f1afbd0@google.com>
2017-11-29 19:37 ` KASAN: use-after-free Read in sock_release Cong Wang
2017-11-29 20:24   ` Linus Torvalds
2017-11-30  2:07     ` Al Viro
2017-11-30  4:16       ` Al Viro
2017-11-30 13:18       ` Christoph Hellwig
2017-11-30 15:46         ` Al Viro
2018-02-13 19:16           ` Dmitry Vyukov
2017-11-29 20:49   ` Eric Dumazet
2017-11-30  2:24   ` Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).