* Use-after-free in ep_remove_wait_queue
@ 2015-10-12 11:07 Dmitry Vyukov
2015-10-12 12:02 ` Michal Kubecek
0 siblings, 1 reply; 43+ messages in thread
From: Dmitry Vyukov @ 2015-10-12 11:07 UTC (permalink / raw)
To: Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa,
dhowells, paul, salyzyn, sds, ying.xue, netdev
Cc: syzkaller, Kostya Serebryany, Alexander Potapenko,
Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook
Hello,
The following program causes a use-after-free in the kernel:
// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include <sys/syscall.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
int main()
{
long r0 = syscall(SYS_mmap, 0x20001000ul, 0x1000ul, 0x3ul,
0x32ul, 0xfffffffffffffffful, 0x0ul);
long r1 = syscall(SYS_mmap, 0x20000000ul, 0x1000ul, 0x3ul,
0x32ul, 0xfffffffffffffffful, 0x0ul);
long r2 = syscall(SYS_socketpair, 0x1ul, 0x3ul, 0x1ul, 0x20000ffcul);
long r3 = -1;
if (r2 != -1)
r3 = *(uint32_t*)0x20000ffc;
long r4 = -1;
if (r2 != -1)
r4 = *(uint32_t*)0x20001000;
long r5 = syscall(SYS_epoll_create, 0x1ul);
long r6 = syscall(SYS_mmap, 0x20003000ul, 0x1000ul, 0x3ul,
0x32ul, 0xfffffffffffffffful, 0x0ul);
long r7 = syscall(SYS_dup3, r4, r3, 0x80000ul);
*(uint32_t*)0x20003000 = 0x6;
*(uint32_t*)0x20003004 = 0x2;
*(uint64_t*)0x20003008 = 0x6;
long r11 = syscall(SYS_epoll_ctl, r5, 0x1ul, r3, 0x20003000ul);
long r12 = syscall(SYS_mmap, 0x20002000ul, 0x1000ul, 0x3ul,
0x32ul, 0xfffffffffffffffful, 0x0ul);
memcpy((void*)0x20002000, "\x00", 1);
long r14 = syscall(SYS_write, r7, 0x20002000ul, 0x1ul);
*(uint64_t*)0x20001a4d = 0x5;
long r16 = syscall(SYS_epoll_pwait, r5, 0x20004000ul, 0x1ul,
0x1ul, 0x20001a4dul, 0x8ul);
return 0;
}
on commit bcee19f424a0d8c26ecf2607b73c690802658b29
(git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git)
BUG: KASan: use after free in remove_wait_queue+0xfb/0x120 at addr
ffff88002db3cf50
Write of size 8 by task syzkaller_execu/10568
================================================================
BUG UNIX (Not tainted): kasan: bad access detected
-----------------------------------------------------------------------------
Call Trace:
[< inline >] __list_del include/linux/list.h:89
[< inline >] list_del include/linux/list.h:107
[< inline >] __remove_wait_queue include/linux/wait.h:145
[<ffffffff811bfbab>] remove_wait_queue+0xfb/0x120 kernel/sched/wait.c:50
[< inline >] ep_remove_wait_queue fs/eventpoll.c:524
[<ffffffff8154094b>] ep_unregister_pollwait.isra.7+0x10b/0x1c0
fs/eventpoll.c:542
[<ffffffff81541417>] ep_free+0x97/0x190 fs/eventpoll.c:759 (discriminator 3)
[<ffffffff81541554>] ep_eventpoll_release+0x44/0x60 fs/eventpoll.c:791
[<ffffffff814772ad>] __fput+0x21d/0x6e0 fs/file_table.c:208
[<ffffffff814777f5>] ____fput+0x15/0x20 fs/file_table.c:244
[<ffffffff81150094>] task_work_run+0x164/0x1f0 kernel/task_work.c:115
(discriminator 1)
[< inline >] exit_task_work include/linux/task_work.h:21
[<ffffffff810fe7ae>] do_exit+0xa4e/0x2d40 kernel/exit.c:746
[<ffffffff81104426>] do_group_exit+0xf6/0x340 kernel/exit.c:874
[< inline >] SYSC_exit_group kernel/exit.c:885
[<ffffffff8110468d>] SyS_exit_group+0x1d/0x20 kernel/exit.c:883
[<ffffffff82e424d1>] entry_SYSCALL_64_fastpath+0x31/0x95
arch/x86/entry/entry_64.S:187
INFO: Allocated in sk_prot_alloc+0x69/0x340 age=6 cpu=1 pid=10568
[< none >] __slab_alloc+0x426/0x470 mm/slub.c:2402
[< inline >] slab_alloc_node mm/slub.c:2470
[< inline >] slab_alloc mm/slub.c:2512
[< none >] kmem_cache_alloc+0x10d/0x140 mm/slub.c:2517
[< none >] sk_prot_alloc+0x69/0x340 net/core/sock.c:1329
[< none >] sk_alloc+0x33/0x280 net/core/sock.c:1404
[< none >] unix_create1+0x5e/0x3d0 net/unix/af_unix.c:648
[< none >] unix_create+0x14d/0x1f0 net/unix/af_unix.c:707
[< none >] __sock_create+0x1f1/0x4c0 net/socket.c:1169
[< inline >] sock_create net/socket.c:1209
[< inline >] SYSC_socketpair net/socket.c:1281
[< none >] SyS_socketpair+0x10f/0x4d0 net/socket.c:1260
[< none >] entry_SYSCALL_64_fastpath+0x31/0x95
arch/x86/entry/entry_64.S:187
INFO: Freed in sk_destruct+0x2e9/0x400 age=9 cpu=1 pid=10568
[< none >] __slab_free+0x12d/0x280 mm/slub.c:2587 (discriminator 1)
[< inline >] slab_free mm/slub.c:2736
[< none >] kmem_cache_free+0x161/0x180 mm/slub.c:2745
[< inline >] sk_prot_free net/core/sock.c:1374
[< none >] sk_destruct+0x2e9/0x400 net/core/sock.c:1452
[< none >] __sk_free+0x57/0x200 net/core/sock.c:1460
[< none >] sk_free+0x30/0x40 net/core/sock.c:1471
[< inline >] sock_put include/net/sock.h:1593
[< none >] unix_dgram_sendmsg+0xeaf/0x1100 net/unix/af_unix.c:1571
[< inline >] sock_sendmsg_nosec net/socket.c:610
[< none >] sock_sendmsg+0xca/0x110 net/socket.c:620
[< none >] sock_write_iter+0x216/0x3a0 net/socket.c:819
[< inline >] new_sync_write fs/read_write.c:478
[< none >] __vfs_write+0x2ed/0x3d0 fs/read_write.c:491
[< none >] vfs_write+0x173/0x4a0 fs/read_write.c:538
[< inline >] SYSC_write fs/read_write.c:585
[< none >] SyS_write+0x108/0x220 fs/read_write.c:577
[< none >] entry_SYSCALL_64_fastpath+0x31/0x95
arch/x86/entry/entry_64.S:187
INFO: Slab 0xffffea0000b6ce00 objects=26 used=4 fp=0xffff88002db3cc00
flags=0x100000000004080
INFO: Object 0xffff88002db3cc00 @offset=19456 fp=0xffff88002db3c280
CPU: 1 PID: 10568 Comm: syzkaller_execu Tainted: G B 4.3.0-rc2+ #12
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
00000000ffffffff ffff880026647a80 ffffffff81a425a0 ffff88002e17c000
ffff88002db3cc00 ffff88002db38000 ffff880026647ab0 ffffffff8145a554
ffff88002e17c000 ffffea0000b6ce00 ffff88002db3cc00 ffff88002db3cf40
Memory state around the buggy address:
ffff88002db3ce00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88002db3ce80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88002db3cf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88002db3cf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88002db3d000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
=========================================================
Found with syzkaller fuzzer.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Use-after-free in ep_remove_wait_queue
2015-10-12 11:07 Use-after-free in ep_remove_wait_queue Dmitry Vyukov
@ 2015-10-12 12:02 ` Michal Kubecek
2015-10-12 12:14 ` Eric Dumazet
0 siblings, 1 reply; 43+ messages in thread
From: Michal Kubecek @ 2015-10-12 12:02 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa,
dhowells, paul, salyzyn, sds, ying.xue, netdev, syzkaller,
Kostya Serebryany, Alexander Potapenko, Andrey Konovalov,
Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause, Jason Baron,
Rainer Weikusat

On Mon, Oct 12, 2015 at 01:07:55PM +0200, Dmitry Vyukov wrote:
> Hello,
>
> The following program causes a use-after-free in the kernel:
> [...]

Probably the issue discussed in

http://thread.gmane.org/gmane.linux.kernel/2057497/

and previous related threads.

Michal Kubecek
* Re: Use-after-free in ep_remove_wait_queue
2015-10-12 12:02 ` Michal Kubecek
@ 2015-10-12 12:14 ` Eric Dumazet
2015-10-12 12:17 ` Dmitry Vyukov
0 siblings, 1 reply; 43+ messages in thread
From: Eric Dumazet @ 2015-10-12 12:14 UTC (permalink / raw)
To: Michal Kubecek
Cc: Dmitry Vyukov, Al Viro, linux-fsdevel, LKML, David Miller,
Hannes Frederic Sowa, dhowells, paul, salyzyn, sds, ying.xue, netdev,
syzkaller, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov,
Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause, Jason Baron,
Rainer Weikusat

On Mon, 2015-10-12 at 14:02 +0200, Michal Kubecek wrote:
> Probably the issue discussed in
>
> http://thread.gmane.org/gmane.linux.kernel/2057497/
>
> and previous related threads.
>

Same issue, but Dmitry apparently did not trust me.
* Re: Use-after-free in ep_remove_wait_queue
2015-10-12 12:14 ` Eric Dumazet
@ 2015-10-12 12:17 ` Dmitry Vyukov
2015-11-06 13:06 ` Dmitry Vyukov
0 siblings, 1 reply; 43+ messages in thread
From: Dmitry Vyukov @ 2015-10-12 12:17 UTC (permalink / raw)
To: syzkaller
Cc: Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller,
Hannes Frederic Sowa, dhowells, paul, salyzyn, sds, ying.xue, netdev,
Kostya Serebryany, Alexander Potapenko, Andrey Konovalov,
Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause, Jason Baron,
Rainer Weikusat

On Mon, Oct 12, 2015 at 2:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2015-10-12 at 14:02 +0200, Michal Kubecek wrote:
>
>> Probably the issue discussed in
>>
>> http://thread.gmane.org/gmane.linux.kernel/2057497/
>>
>> and previous related threads.
>>
>
> Same issue, but Dmitry apparently did not trust me.

Just wanted to make sure. Let's consider this as fixed.
* Re: Use-after-free in ep_remove_wait_queue
2015-10-12 12:17 ` Dmitry Vyukov
@ 2015-11-06 13:06 ` Dmitry Vyukov
2015-11-06 14:58 ` Jason Baron
0 siblings, 1 reply; 43+ messages in thread
From: Dmitry Vyukov @ 2015-11-06 13:06 UTC (permalink / raw)
To: syzkaller
Cc: Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller,
Hannes Frederic Sowa, David Howells, Paul Moore, salyzyn, sds,
ying.xue, netdev, Kostya Serebryany, Alexander Potapenko,
Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook,
Mathias Krause, Jason Baron, Rainer Weikusat

On Mon, Oct 12, 2015 at 2:17 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Mon, Oct 12, 2015 at 2:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Mon, 2015-10-12 at 14:02 +0200, Michal Kubecek wrote:
>>
>>> Probably the issue discussed in
>>>
>>> http://thread.gmane.org/gmane.linux.kernel/2057497/
>>>
>>> and previous related threads.
>>>
>>
>> Same issue, but Dmitry apparently did not trust me.
>
> Just wanted to make sure. Let's consider this as fixed.

Is it meant to be fixed now?

I am still seeing use-after-free reports in ep_unregister_pollwait on
d1e41ff11941784f469f17795a4d9425c2eb4b7a (Nov 5).
* Re: Use-after-free in ep_remove_wait_queue
2015-11-06 13:06 ` Dmitry Vyukov
@ 2015-11-06 14:58 ` Jason Baron
2015-11-06 15:15 ` Rainer Weikusat
0 siblings, 1 reply; 43+ messages in thread
From: Jason Baron @ 2015-11-06 14:58 UTC (permalink / raw)
To: Dmitry Vyukov, syzkaller, Rainer Weikusat
Cc: Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller,
Hannes Frederic Sowa, David Howells, Paul Moore, salyzyn, sds,
ying.xue, netdev, Kostya Serebryany, Alexander Potapenko,
Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook,
Mathias Krause

On 11/06/2015 08:06 AM, Dmitry Vyukov wrote:
> On Mon, Oct 12, 2015 at 2:17 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> [...]
>> Just wanted to make sure. Let's consider this as fixed.
>
> Is it meant to be fixed now?
>
> I am still seeing use-after-free reports in ep_unregister_pollwait on
> d1e41ff11941784f469f17795a4d9425c2eb4b7a (Nov 5).
>

Hi,

Indeed - No formal patch has been posted for this yet - we've agreed
that Rainer is going to post the patch for this issue.

Thanks,

-Jason
* Re: Use-after-free in ep_remove_wait_queue
2015-11-06 14:58 ` Jason Baron
@ 2015-11-06 15:15 ` Rainer Weikusat
2015-11-09 14:40 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue Rainer Weikusat
0 siblings, 1 reply; 43+ messages in thread
From: Rainer Weikusat @ 2015-11-06 15:15 UTC (permalink / raw)
To: Jason Baron
Cc: Dmitry Vyukov, syzkaller, Rainer Weikusat, Michal Kubecek,
Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa,
David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev,
Kostya Serebryany, Alexander Potapenko, Andrey Konovalov,
Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause

Jason Baron <jbaron@akamai.com> writes:
> On 11/06/2015 08:06 AM, Dmitry Vyukov wrote:
> [...]
>> Is it meant to be fixed now?
>>
>> I am still seeing use-after-free reports in ep_unregister_pollwait on
>> d1e41ff11941784f469f17795a4d9425c2eb4b7a (Nov 5).
>
> Indeed - No formal patch has been posted for this yet - we've agreed
> that Rainer is going to post the patch for this issue.

I should be able to put some more work into this on Sunday or possibly
(but unlikely) later today (supposed to mean that I'll also actually
do that).
* [PATCH] unix: avoid use-after-free in ep_remove_wait_queue
2015-11-06 15:15 ` Rainer Weikusat
@ 2015-11-09 14:40 ` Rainer Weikusat
2015-11-09 18:25 ` David Miller (2 more replies)
0 siblings, 3 replies; 43+ messages in thread
From: Rainer Weikusat @ 2015-11-09 14:40 UTC (permalink / raw)
To: Rainer Weikusat
Cc: Jason Baron, Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro,
linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa,
David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev,
Kostya Serebryany, Alexander Potapenko, Andrey Konovalov,
Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause

An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep even though none of the messages presently enqueued on the
server receive queue were sent by them. In order to ensure that these
will be woken up once space becomes again available, the present
unix_dgram_poll routine does a second sock_poll_wait call with the
peer_wait wait queue of the server socket as queue argument
(unix_dgram_recvmsg does a wake up on this queue after a datagram was
received). This is inherently problematic because the server socket is
only guaranteed to remain alive for as long as the client still holds a
reference to it. In case the connection is dissolved via connect or by
the dead peer detection logic in unix_dgram_sendmsg, the server socket
may be freed even though "the polling mechanism" (in particular, epoll)
still has a pointer to the corresponding peer_wait queue. There's no way
to forcibly deregister a wait queue with epoll.

Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via wake function. The connection to the peer wait queue is again
dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.

Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
---
"Why do things always end up messy and complicated"? Patch is against
4.3.0.

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index b36d837..2a91a05 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -62,6 +62,7 @@ struct unix_sock {
 #define UNIX_GC_CANDIDATE	0
 #define UNIX_GC_MAYBE_CYCLE	1
 	struct socket_wq	peer_wq;
+	wait_queue_t		peer_wake;
 };

 static inline struct unix_sock *unix_sk(const struct sock *sk)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 94f6582..4f263e3 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -326,6 +326,122 @@ found:
 	return s;
 }

+/* Support code for asymmetrically connected dgram sockets
+ *
+ * If a datagram socket is connected to a socket not itself connected
+ * to the first socket (eg, /dev/log), clients may only enqueue more
+ * messages if the present receive queue of the server socket is not
+ * "too large". This means there's a second writeability condition
+ * poll and sendmsg need to test. The dgram recv code will do a wake
+ * up on the peer_wait wait queue of a socket upon reception of a
+ * datagram which needs to be propagated to sleeping would-be writers
+ * since these might not have sent anything so far. This can't be
+ * accomplished via poll_wait because the lifetime of the server
+ * socket might be less than that of its clients if these break their
+ * association with it or if the server socket is closed while clients
+ * are still connected to it and there's no way to inform "a polling
+ * implementation" that it should let go of a certain wait queue
+ *
+ * In order to propagate a wake up, a wait_queue_t of the client
+ * socket is enqueued on the peer_wait queue of the server socket
+ * whose wake function does a wake_up on the ordinary client socket
+ * wait queue. This connection is established whenever a write (or
+ * poll for write) hit the flow control condition and broken when the
+ * association to the server socket is dissolved or after a wake up
+ * was relayed.
+ */
+
+static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags,
+				      void *key)
+{
+	struct unix_sock *u;
+	wait_queue_head_t *u_sleep;
+
+	u = container_of(q, struct unix_sock, peer_wake);
+
+	__remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait,
+			    &u->peer_wake);
+	u->peer_wake.private = NULL;
+
+	/* relaying can only happen while the wq still exists */
+	u_sleep = sk_sleep(&u->sk);
+	if (u_sleep)
+		wake_up_interruptible_poll(u_sleep, key);
+
+	return 0;
+}
+
+static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other)
+{
+	struct unix_sock *u, *u_other;
+	int rc;
+
+	u = unix_sk(sk);
+	u_other = unix_sk(other);
+	rc = 0;
+
+	spin_lock(&u_other->peer_wait.lock);
+
+	if (!u->peer_wake.private) {
+		u->peer_wake.private = other;
+		__add_wait_queue(&u_other->peer_wait, &u->peer_wake);
+
+		rc = 1;
+	}
+
+	spin_unlock(&u_other->peer_wait.lock);
+	return rc;
+}
+
+static int unix_dgram_peer_wake_disconnect(struct sock *sk, struct sock *other)
+{
+	struct unix_sock *u, *u_other;
+	int rc;
+
+	u = unix_sk(sk);
+	u_other = unix_sk(other);
+	rc = 0;
+
+	spin_lock(&u_other->peer_wait.lock);
+
+	if (u->peer_wake.private == other) {
+		__remove_wait_queue(&u_other->peer_wait, &u->peer_wake);
+		u->peer_wake.private = NULL;
+
+		rc = 1;
+	}
+
+	spin_unlock(&u_other->peer_wait.lock);
+	return rc;
+}
+
+/* Needs sk unix state lock. Otherwise lockless check if data can (likely)
+ * be sent.
+ */
+static inline int unix_dgram_peer_recv_ready(struct sock *sk,
+					     struct sock *other)
+{
+	return unix_peer(other) == sk || !unix_recvq_full(other);
+}
+
+/* Needs sk unix state lock. After recv_ready indicated not ready,
+ * establish peer_wait connection if still needed.
+ */
+static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
+{
+	int connected;
+
+	connected = unix_dgram_peer_wake_connect(sk, other);
+
+	if (unix_recvq_full(other))
+		return 1;
+
+	if (connected)
+		unix_dgram_peer_wake_disconnect(sk, other);
+
+	return 0;
+}
+
 static inline int unix_writable(struct sock *sk)
 {
 	return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
@@ -430,6 +546,8 @@ static void unix_release_sock(struct sock *sk, int embrion)
 			skpair->sk_state_change(skpair);
 			sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP);
 		}
+
+		unix_dgram_peer_wake_disconnect(sk, skpair);
 		sock_put(skpair); /* It may now die */
 		unix_peer(sk) = NULL;
 	}
@@ -664,6 +782,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
 	INIT_LIST_HEAD(&u->link);
 	mutex_init(&u->readlock); /* single task reading lock */
 	init_waitqueue_head(&u->peer_wait);
+	init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay);
 	unix_insert_socket(unix_sockets_unbound(sk), sk);
 out:
 	if (sk == NULL)
@@ -1031,6 +1150,13 @@ restart:
 	if (unix_peer(sk)) {
 		struct sock *old_peer = unix_peer(sk);
 		unix_peer(sk) = other;
+
+		if (unix_dgram_peer_wake_disconnect(sk, other))
+			wake_up_interruptible_poll(sk_sleep(sk),
+						   POLLOUT |
+						   POLLWRNORM |
+						   POLLWRBAND);
+
 		unix_state_double_unlock(sk, other);

 		if (other != old_peer)
@@ -1565,6 +1691,13 @@ restart:
 	unix_state_lock(sk);
 	if (unix_peer(sk) == other) {
 		unix_peer(sk) = NULL;
+
+		if (unix_dgram_peer_wake_disconnect(sk, other))
+			wake_up_interruptible_poll(sk_sleep(sk),
+						   POLLOUT |
+						   POLLWRNORM |
+						   POLLWRBAND);
+
 		unix_state_unlock(sk);

 		unix_dgram_disconnected(sk, other);
@@ -1590,10 +1723,14 @@ restart:
 		goto out_unlock;
 	}

-	if (unix_peer(other) != sk && unix_recvq_full(other)) {
+	if (!unix_dgram_peer_recv_ready(sk, other)) {
 		if (!timeo) {
-			err = -EAGAIN;
-			goto out_unlock;
+			if (unix_dgram_peer_wake_me(sk, other)) {
+				err = -EAGAIN;
+				goto out_unlock;
+			}
+
+			goto restart;
 		}

 		timeo = unix_wait_for_peer(other, timeo);
@@ -2453,14 +2590,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
 		return mask;

 	writable = unix_writable(sk);
-	other = unix_peer_get(sk);
-	if (other) {
-		if (unix_peer(other) != sk) {
-			sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
-			if (unix_recvq_full(other))
-				writable = 0;
-		}
-		sock_put(other);
+	if (writable) {
+		unix_state_lock(sk);
+
+		other = unix_peer(sk);
+		if (other &&
+		    !unix_dgram_peer_recv_ready(sk, other) &&
+		    unix_dgram_peer_wake_me(sk, other))
+			writable = 0;
+
+		unix_state_unlock(sk);
 	}

 	if (writable)
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue
2015-11-09 14:40 ` [PATCH] unix: avoid use-after-free " Rainer Weikusat
@ 2015-11-09 18:25 ` David Miller
2015-11-10 17:16 ` Rainer Weikusat
0 siblings, 1 reply; 43+ messages in thread
From: David Miller @ 2015-11-09 18:25 UTC (permalink / raw)
To: rweikusat
Cc: jbaron, dvyukov, syzkaller, mkubecek, viro, linux-fsdevel,
linux-kernel, hannes, dhowells, paul, salyzyn, sds, ying.xue, netdev,
kcc, glider, andreyknvl, sasha.levin, jln, keescook, minipli

From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Date: Mon, 09 Nov 2015 14:40:48 +0000

> +	__remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait,
> +			    &u->peer_wake);

This is more simply:

	__remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait, q);

> +static inline int unix_dgram_peer_recv_ready(struct sock *sk,
> +					     struct sock *other)

Please do not use the inline keyword in foo.c files, let the compiler
decide.

Thanks.
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue
2015-11-09 18:25 ` David Miller
@ 2015-11-10 17:16 ` Rainer Weikusat
0 siblings, 0 replies; 43+ messages in thread
From: Rainer Weikusat @ 2015-11-10 17:16 UTC (permalink / raw)
To: David Miller
Cc: rweikusat, jbaron, dvyukov, syzkaller, mkubecek, viro,
linux-fsdevel, linux-kernel, hannes, dhowells, paul, salyzyn, sds,
ying.xue, netdev, kcc, glider, andreyknvl, sasha.levin, jln,
keescook, minipli

David Miller <davem@davemloft.net> writes:
> From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
> Date: Mon, 09 Nov 2015 14:40:48 +0000
>
>> +	__remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait,
>> +			    &u->peer_wake);
>
> This is more simply:
>
> 	__remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait, q);

Thank you for pointing this out.
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue
2015-11-09 14:40 ` [PATCH] unix: avoid use-after-free " Rainer Weikusat
2015-11-09 18:25 ` David Miller
@ 2015-11-09 22:44 ` Jason Baron
2015-11-10 17:38 ` Rainer Weikusat
2015-11-10 21:55 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue Rainer Weikusat
2 siblings, 1 reply; 43+ messages in thread
From: Jason Baron @ 2015-11-09 22:44 UTC (permalink / raw)
To: Rainer Weikusat
Cc: Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro, linux-fsdevel,
LKML, David Miller, Hannes Frederic Sowa, David Howells, Paul Moore,
salyzyn, sds, ying.xue, netdev, Kostya Serebryany,
Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes,
Kees Cook, Mathias Krause

On 11/09/2015 09:40 AM, Rainer Weikusat wrote:
> An AF_UNIX datagram socket being the client in an n:1 association with
> some server socket is only allowed to send messages to the server if the
> receive queue of this socket contains at most sk_max_ack_backlog
> datagrams.
[...]

> @@ -1590,10 +1723,14 @@ restart:
>  		goto out_unlock;
>  	}
>
> -	if (unix_peer(other) != sk && unix_recvq_full(other)) {
> +	if (!unix_dgram_peer_recv_ready(sk, other)) {
>  		if (!timeo) {
> -			err = -EAGAIN;
> -			goto out_unlock;
> +			if (unix_dgram_peer_wake_me(sk, other)) {
> +				err = -EAGAIN;
> +				goto out_unlock;
> +			}
> +
> +			goto restart;
>  		}

So this will cause 'unix_state_lock(other)' to be called twice in a
row if we 'goto restart' (and hence will softlock the box). It just
needs a 'unix_state_unlock(other);' before the 'goto restart'.

I also tested this patch with a single unix server and 200 client
threads doing roughly epoll() followed by write() until -EAGAIN in a
loop. The throughput for the test was roughly the same as current
upstream, but the cpu usage was a lot higher. I think it's b/c this
patch takes the server wait queue lock in the _poll() routine. This
causes a lot of contention. The previous patch you posted for this
where you did not clear the wait queue on every wakeup and thus didn't
need the queue lock in poll() (unless we were adding to it), performed
much better.

However, the previous patch which tested better didn't add to the
remote queue when it was full on sendmsg() - so it wouldn't be correct
for epoll ET. Adding to the remote queue for every sendmsg() that
fails does seem undesirable, if we aren't even doing poll(). So I'm
not sure if just going back to the previous patch is a great option
either....I'm also not sure how realistic the test case I have is. It
would be great if we had some other workloads to test against.

Thanks,

-Jason
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue
2015-11-09 22:44 ` Jason Baron
@ 2015-11-10 17:38   ` Rainer Weikusat
2015-11-22 21:43     ` alternate queueing mechanism (was: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue) Rainer Weikusat
0 siblings, 1 reply; 43+ messages in thread
From: Rainer Weikusat @ 2015-11-10 17:38 UTC (permalink / raw)
To: Jason Baron
Cc: Rainer Weikusat, Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro,
    linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa,
    David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev,
    Kostya Serebryany, Alexander Potapenko, Andrey Konovalov,
    Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause

Jason Baron <jbaron@akamai.com> writes:
> On 11/09/2015 09:40 AM, Rainer Weikusat wrote:

[...]

>> -	if (unix_peer(other) != sk && unix_recvq_full(other)) {
>> +	if (!unix_dgram_peer_recv_ready(sk, other)) {
>>  		if (!timeo) {
>> -			err = -EAGAIN;
>> -			goto out_unlock;
>> +			if (unix_dgram_peer_wake_me(sk, other)) {
>> +				err = -EAGAIN;
>> +				goto out_unlock;
>> +			}
>> +
>> +			goto restart;
>>  		}
>
> So this will cause 'unix_state_lock(other)' to be called twice in a
> row if we 'goto restart' (and hence will softlock the box). It just
> needs an 'unix_state_unlock(other);' before the 'goto restart'.

The goto restart was nonsense to begin with in this code path:
restarting something is necessary after sleeping for some time, but for
the case above, execution just continues. I've changed that (an updated
patch should follow 'soon') to

	if (!unix_dgram_peer_recv_ready(sk, other)) {
		if (timeo) {
			timeo = unix_wait_for_peer(other, timeo);

			err = sock_intr_errno(timeo);
			if (signal_pending(current))
				goto out_free;

			goto restart;
		}

		if (unix_dgram_peer_wake_me(sk, other)) {
			err = -EAGAIN;
			goto out_unlock;
		}
	}

> I also tested this patch with a single unix server and 200 client
> threads doing roughly epoll() followed by write() until -EAGAIN in a
> loop. The throughput for the test was roughly the same as current
> upstream, but the cpu usage was a lot higher. I think it's because
> this patch takes the server wait queue lock in the _poll() routine.
> This causes a lot of contention. The previous patch you posted for
> this, where you did not clear the wait queue on every wakeup and thus
> didn't need the queue lock in poll() (unless we were adding to it),
> performed much better.

I'm somewhat unsure what to make of that: The previous patch would also
take the wait queue lock whenever poll was about to return 'not
writable' because of the length of the server receive queue, unless
another thread using the same client socket had already noticed this
and enqueued this same socket. And "hundreds of clients using a single
client socket in order to send data to a single server socket" doesn't
seem very realistic to me.

Also, this code shouldn't usually be executed, as the server should
usually be capable of keeping up with the data sent by clients. If it's
permanently incapable of that, you're effectively performing a
(successful) DDoS against it. Which should result in "high CPU
utilization" in either case.

It may be possible to improve this by tuning/changing the flow control
mechanism. Off the top of my head, I'd suggest making the queue longer
(the default value is 10) and delaying wake ups until the server
actually did catch up, IOW, until the receive queue is empty or almost
empty. But this ought to be done with a different patch.

^ permalink raw reply	[flat|nested] 43+ messages in thread
* alternate queueing mechanism (was: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue)
2015-11-10 17:38 ` Rainer Weikusat
@ 2015-11-22 21:43   ` Rainer Weikusat
0 siblings, 0 replies; 43+ messages in thread
From: Rainer Weikusat @ 2015-11-22 21:43 UTC (permalink / raw)
To: Rainer Weikusat
Cc: Jason Baron, Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro,
    linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa,
    David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev,
    Kostya Serebryany, Alexander Potapenko, Andrey Konovalov,
    Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause

Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:

[AF_UNIX SOCK_DGRAM throughput]

> It may be possible to improve this by tuning/changing the flow control
> mechanism. Off the top of my head, I'd suggest making the queue longer
> (the default value is 10) and delaying wake ups until the server
> actually did catch up, IOW, until the receive queue is empty or almost
> empty. But this ought to be done with a different patch.

Because I was curious about the effects, I implemented this using a
slightly modified design compared to the one I originally suggested, to
account for the different uses of the 'is the receive queue full'
check. The code uses a datagram-specific checking function,

static int unix_dgram_recvq_full(struct sock const *sk)
{
	struct unix_sock *u;

	u = unix_sk(sk);
	if (test_bit(UNIX_DG_FULL, &u->flags))
		return 1;

	if (!unix_recvq_full(sk))
		return 0;

	__set_bit(UNIX_DG_FULL, &u->flags);
	return 1;
}

which gets called instead of the other for the n:1 datagram checks, and
a

	if (test_bit(UNIX_DG_FULL, &u->flags) &&
	    !skb_queue_len(&sk->sk_receive_queue)) {
		__clear_bit(UNIX_DG_FULL, &u->flags);
		wake_up_interruptible_sync_poll(&u->peer_wait,
						POLLOUT | POLLWRNORM |
						POLLWRBAND);
	}

in unix_dgram_recvmsg to delay wakeups until the queued datagrams have
been consumed if the queue overflowed before.

This has the additional, nice side effect that wakeups won't ever be
done for 1:1 connected datagram sockets (both SOCK_DGRAM and
SOCK_SEQPACKET) where they're of no use, anyway.

Compared to a 'stock' 4.3 running the test program I posted (supposed
to make the overhead noticeable by sending lots of small messages), the
average number of bytes sent per second increased by about 782,961.79
(ca 764.61K), about 5.32% of the 4.3 number (14,714,579.91), with a
fairly simple code change.

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue
2015-11-09 14:40 ` [PATCH] unix: avoid use-after-free " Rainer Weikusat
2015-11-09 18:25   ` David Miller
2015-11-09 22:44   ` Jason Baron
@ 2015-11-10 21:55   ` Rainer Weikusat
2015-11-11 12:28     ` Hannes Frederic Sowa
` (2 more replies)
2 siblings, 3 replies; 43+ messages in thread
From: Rainer Weikusat @ 2015-11-10 21:55 UTC (permalink / raw)
To: Rainer Weikusat
Cc: Jason Baron, Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro,
    linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa,
    David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev,
    Kostya Serebryany, Alexander Potapenko, Andrey Konovalov,
    Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause

An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep even though none of the messages presently enqueued on the
server receive queue were sent by them. In order to ensure that these
will be woken up once space becomes available again, the present
unix_dgram_poll routine does a second sock_poll_wait call with the
peer_wait wait queue of the server socket as queue argument
(unix_dgram_recvmsg does a wake up on this queue after a datagram was
received). This is inherently problematic because the server socket is
only guaranteed to remain alive for as long as the client still holds a
reference to it. In case the connection is dissolved via connect or by
the dead peer detection logic in unix_dgram_sendmsg, the server socket
may be freed even though "the polling mechanism" (in particular, epoll)
still has a pointer to the corresponding peer_wait queue. There's no way
to forcibly deregister a wait queue with epoll.
Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via the wake function. The connection to the peer wait queue is
again dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.

Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
---
- use wait_queue_t passed as argument to _relay
- fix possible deadlock and logic error in _dgram_sendmsg by
  straightening the control flow ("spaghetti code considered confusing")

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index b36d837..2a91a05 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -62,6 +62,7 @@ struct unix_sock {
 #define UNIX_GC_CANDIDATE	0
 #define UNIX_GC_MAYBE_CYCLE	1
 	struct socket_wq	peer_wq;
+	wait_queue_t		peer_wake;
 };

 static inline struct unix_sock *unix_sk(const struct sock *sk)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 94f6582..4297d8e 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -326,6 +326,122 @@ found:
 	return s;
 }

+/* Support code for asymmetrically connected dgram sockets
+ *
+ * If a datagram socket is connected to a socket not itself connected
+ * to the first socket (eg, /dev/log), clients may only enqueue more
+ * messages if the present receive queue of the server socket is not
+ * "too large". This means there's a second writeability condition
+ * poll and sendmsg need to test. The dgram recv code will do a wake
+ * up on the peer_wait wait queue of a socket upon reception of a
+ * datagram which needs to be propagated to sleeping would-be writers
+ * since these might not have sent anything so far. This can't be
+ * accomplished via poll_wait because the lifetime of the server
+ * socket might be less than that of its clients if these break their
+ * association with it or if the server socket is closed while clients
+ * are still connected to it and there's no way to inform "a polling
+ * implementation" that it should let go of a certain wait queue
+ *
+ * In order to propagate a wake up, a wait_queue_t of the client
+ * socket is enqueued on the peer_wait queue of the server socket
+ * whose wake function does a wake_up on the ordinary client socket
+ * wait queue. This connection is established whenever a write (or
+ * poll for write) hit the flow control condition and broken when the
+ * association to the server socket is dissolved or after a wake up
+ * was relayed.
+ */
+
+static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags,
+				      void *key)
+{
+	struct unix_sock *u;
+	wait_queue_head_t *u_sleep;
+
+	u = container_of(q, struct unix_sock, peer_wake);
+
+	__remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait,
+			    q);
+	u->peer_wake.private = NULL;
+
+	/* relaying can only happen while the wq still exists */
+	u_sleep = sk_sleep(&u->sk);
+	if (u_sleep)
+		wake_up_interruptible_poll(u_sleep, key);
+
+	return 0;
+}
+
+static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other)
+{
+	struct unix_sock *u, *u_other;
+	int rc;
+
+	u = unix_sk(sk);
+	u_other = unix_sk(other);
+	rc = 0;
+
+	spin_lock(&u_other->peer_wait.lock);
+
+	if (!u->peer_wake.private) {
+		u->peer_wake.private = other;
+		__add_wait_queue(&u_other->peer_wait, &u->peer_wake);
+
+		rc = 1;
+	}
+
+	spin_unlock(&u_other->peer_wait.lock);
+	return rc;
+}
+
+static int unix_dgram_peer_wake_disconnect(struct sock *sk, struct sock *other)
+{
+	struct unix_sock *u, *u_other;
+	int rc;
+
+	u = unix_sk(sk);
+	u_other = unix_sk(other);
+	rc = 0;
+
+	spin_lock(&u_other->peer_wait.lock);
+
+	if (u->peer_wake.private == other) {
+		__remove_wait_queue(&u_other->peer_wait, &u->peer_wake);
+		u->peer_wake.private = NULL;
+
+		rc = 1;
+	}
+
+	spin_unlock(&u_other->peer_wait.lock);
+	return rc;
+}
+
+/* Needs sk unix state lock. Otherwise lockless check if data can (likely)
+ * be sent.
+ */
+static int unix_dgram_peer_recv_ready(struct sock *sk,
+				      struct sock *other)
+{
+	return unix_peer(other) == sk || !unix_recvq_full(other);
+}
+
+/* Needs sk unix state lock. After recv_ready indicated not ready,
+ * establish peer_wait connection if still needed.
+ */
+static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
+{
+	int connected;
+
+	connected = unix_dgram_peer_wake_connect(sk, other);
+
+	if (unix_recvq_full(other))
+		return 1;
+
+	if (connected)
+		unix_dgram_peer_wake_disconnect(sk, other);
+
+	return 0;
+}
+
 static inline int unix_writable(struct sock *sk)
 {
 	return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
@@ -430,6 +546,8 @@ static void unix_release_sock(struct sock *sk, int embrion)
 			skpair->sk_state_change(skpair);
 			sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP);
 		}
+
+		unix_dgram_peer_wake_disconnect(sk, skpair);
 		sock_put(skpair); /* It may now die */
 		unix_peer(sk) = NULL;
 	}
@@ -664,6 +782,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
 	INIT_LIST_HEAD(&u->link);
 	mutex_init(&u->readlock); /* single task reading lock */
 	init_waitqueue_head(&u->peer_wait);
+	init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay);
 	unix_insert_socket(unix_sockets_unbound(sk), sk);
 out:
 	if (sk == NULL)
@@ -1031,6 +1150,13 @@ restart:
 	if (unix_peer(sk)) {
 		struct sock *old_peer = unix_peer(sk);
 		unix_peer(sk) = other;
+
+		if (unix_dgram_peer_wake_disconnect(sk, other))
+			wake_up_interruptible_poll(sk_sleep(sk),
+						   POLLOUT |
+						   POLLWRNORM |
+						   POLLWRBAND);
+
 		unix_state_double_unlock(sk, other);

 		if (other != old_peer)
@@ -1565,6 +1691,13 @@ restart:
 		unix_state_lock(sk);
 		if (unix_peer(sk) == other) {
 			unix_peer(sk) = NULL;
+
+			if (unix_dgram_peer_wake_disconnect(sk, other))
+				wake_up_interruptible_poll(sk_sleep(sk),
+							   POLLOUT |
+							   POLLWRNORM |
+							   POLLWRBAND);
+
 			unix_state_unlock(sk);

 			unix_dgram_disconnected(sk, other);
@@ -1590,19 +1723,21 @@ restart:
 		goto out_unlock;
 	}

-	if (unix_peer(other) != sk && unix_recvq_full(other)) {
-		if (!timeo) {
-			err = -EAGAIN;
-			goto out_unlock;
-		}
+	if (!unix_dgram_peer_recv_ready(sk, other)) {
+		if (timeo) {
+			timeo = unix_wait_for_peer(other, timeo);

-		timeo = unix_wait_for_peer(other, timeo);
+			err = sock_intr_errno(timeo);
+			if (signal_pending(current))
+				goto out_free;

-		err = sock_intr_errno(timeo);
-		if (signal_pending(current))
-			goto out_free;
+			goto restart;
+		}

-		goto restart;
+		if (unix_dgram_peer_wake_me(sk, other)) {
+			err = -EAGAIN;
+			goto out_unlock;
+		}
 	}

 	if (sock_flag(other, SOCK_RCVTSTAMP))
@@ -2453,14 +2588,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
 		return mask;

 	writable = unix_writable(sk);
-	other = unix_peer_get(sk);
-	if (other) {
-		if (unix_peer(other) != sk) {
-			sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
-			if (unix_recvq_full(other))
-				writable = 0;
-		}
-		sock_put(other);
+	if (writable) {
+		unix_state_lock(sk);
+
+		other = unix_peer(sk);
+		if (other &&
+		    !unix_dgram_peer_recv_ready(sk, other) &&
+		    unix_dgram_peer_wake_me(sk, other))
+			writable = 0;
+
+		unix_state_unlock(sk);
 	}

 	if (writable)

^ permalink raw reply related	[flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue
2015-11-10 21:55 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue Rainer Weikusat
@ 2015-11-11 12:28   ` Hannes Frederic Sowa
2015-11-11 16:12     ` Rainer Weikusat
2015-11-11 17:35   ` Jason Baron
2015-11-13 18:51   ` Rainer Weikusat
2 siblings, 1 reply; 43+ messages in thread
From: Hannes Frederic Sowa @ 2015-11-11 12:28 UTC (permalink / raw)
To: Rainer Weikusat
Cc: Jason Baron, Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro,
    linux-fsdevel, LKML, David Miller, David Howells, Paul Moore,
    salyzyn, sds, ying.xue, netdev, Kostya Serebryany,
    Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes,
    Kees Cook, Mathias Krause

Hello,

On Tue, Nov 10, 2015, at 22:55, Rainer Weikusat wrote:
> An AF_UNIX datagram socket being the client in an n:1 association with
> some server socket is only allowed to send messages to the server if the
> receive queue of this socket contains at most sk_max_ack_backlog
> datagrams. This implies that prospective writers might be forced to go
> to sleep even though none of the messages presently enqueued on the
> server receive queue were sent by them. In order to ensure that these
> will be woken up once space becomes available again, the present
> unix_dgram_poll routine does a second sock_poll_wait call with the
> peer_wait wait queue of the server socket as queue argument
> (unix_dgram_recvmsg does a wake up on this queue after a datagram was
> received). This is inherently problematic because the server socket is
> only guaranteed to remain alive for as long as the client still holds a
> reference to it. In case the connection is dissolved via connect or by
> the dead peer detection logic in unix_dgram_sendmsg, the server socket
> may be freed even though "the polling mechanism" (in particular, epoll)
> still has a pointer to the corresponding peer_wait queue. There's no
> way to forcibly deregister a wait queue with epoll.
>
> Based on an idea by Jason Baron, the patch below changes the code such
> that a wait_queue_t belonging to the client socket is enqueued on the
> peer_wait queue of the server whenever the peer receive queue full
> condition is detected by either a sendmsg or a poll. A wake up on the
> peer queue is then relayed to the ordinary wait queue of the client
> socket via the wake function. The connection to the peer wait queue is
> again dissolved if either a wake up is about to be relayed or the client
> socket reconnects or a dead peer is detected or the client socket is
> itself closed. This enables removing the second sock_poll_wait from
> unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
> that no blocked writer sleeps forever.

This whole patch seems pretty complicated to me.

Can't we just remove the unix_recvq_full checks altogether and unify
unix_dgram_poll with unix_poll?

If we want to be cautious, we could simply make unix_max_dgram_qlen
limit the number of skbs which are in flight from a sending socket. The
skb destructor can then decrement this. This seems much simpler.

Would this work?

Thanks,

Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue
2015-11-11 12:28 ` Hannes Frederic Sowa
@ 2015-11-11 16:12   ` Rainer Weikusat
2015-11-11 18:52     ` Hannes Frederic Sowa
0 siblings, 1 reply; 43+ messages in thread
From: Rainer Weikusat @ 2015-11-11 16:12 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Rainer Weikusat, Jason Baron, Dmitry Vyukov, syzkaller,
    Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller,
    David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev,
    Kostya Serebryany, Alexander Potapenko, Andrey Konovalov,
    Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause

Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
> On Tue, Nov 10, 2015, at 22:55, Rainer Weikusat wrote:
>> An AF_UNIX datagram socket being the client in an n:1 association with
>> some server socket is only allowed to send messages to the server if
>> the receive queue of this socket contains at most sk_max_ack_backlog
>> datagrams.

[...]

> This whole patch seems pretty complicated to me.
>
> Can't we just remove the unix_recvq_full checks altogether and unify
> unix_dgram_poll with unix_poll?
>
> If we want to be cautious, we could simply make unix_max_dgram_qlen
> limit the number of skbs which are in flight from a sending socket.
> The skb destructor can then decrement this. This seems much simpler.
>
> Would this work?

In the way this is intended to work, cf

http://marc.info/?t=115627606000002&r=1&w=2

only if the limit would also apply to sockets which didn't send
anything so far. Which means it'll end up in the exact same situation
as before: Sending something using a certain socket may not be possible
because of data sent by other sockets. So either code trying to send
via this socket ends up busy-waiting for "space again available" even
though it's trying to use select/poll/epoll/$whatnot to get notified of
this condition and sleep until then, or this notification needs to be
propagated to sleeping threads which didn't get to send anything yet.

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue
2015-11-11 16:12 ` Rainer Weikusat
@ 2015-11-11 18:52   ` Hannes Frederic Sowa
2015-11-13 19:06     ` Rainer Weikusat
0 siblings, 1 reply; 43+ messages in thread
From: Hannes Frederic Sowa @ 2015-11-11 18:52 UTC (permalink / raw)
To: Rainer Weikusat
Cc: Jason Baron, Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro,
    linux-fsdevel, LKML, David Miller, David Howells, Paul Moore,
    salyzyn, sds, ying.xue, netdev, Kostya Serebryany,
    Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes,
    Kees Cook, Mathias Krause

Hi,

On Wed, Nov 11, 2015, at 17:12, Rainer Weikusat wrote:
> Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
> > On Tue, Nov 10, 2015, at 22:55, Rainer Weikusat wrote:
> >> An AF_UNIX datagram socket being the client in an n:1 association with
> >> some server socket is only allowed to send messages to the server if
> >> the receive queue of this socket contains at most sk_max_ack_backlog
> >> datagrams.
>
> [...]
>
> > This whole patch seems pretty complicated to me.
> >
> > Can't we just remove the unix_recvq_full checks altogether and unify
> > unix_dgram_poll with unix_poll?
> >
> > If we want to be cautious, we could simply make unix_max_dgram_qlen
> > limit the number of skbs which are in flight from a sending socket.
> > The skb destructor can then decrement this. This seems much simpler.
> >
> > Would this work?
>
> In the way this is intended to work, cf
>
> http://marc.info/?t=115627606000002&r=1&w=2

Oh, I see, we don't limit closed but still referenced sockets. This
actually makes sense given how fd handling is implemented, just as a
range check.

Have you checked whether we can somehow deregister the socket in the
poll event framework? You wrote that it does not provide such a
function, but maybe it would be easy to add?

Thanks,

Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue
2015-11-11 18:52 ` Hannes Frederic Sowa
@ 2015-11-13 19:06   ` Rainer Weikusat
0 siblings, 0 replies; 43+ messages in thread
From: Rainer Weikusat @ 2015-11-13 19:06 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Rainer Weikusat, Jason Baron, Dmitry Vyukov, syzkaller,
    Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller,
    David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev,
    Kostya Serebryany, Alexander Potapenko, Andrey Konovalov,
    Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause

Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
> On Wed, Nov 11, 2015, at 17:12, Rainer Weikusat wrote:
>> Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
>> > On Tue, Nov 10, 2015, at 22:55, Rainer Weikusat wrote:
>> >> An AF_UNIX datagram socket being the client in an n:1 association with
>> >> some server socket is only allowed to send messages to the server if
>> >> the receive queue of this socket contains at most sk_max_ack_backlog
>> >> datagrams.
>>
>> [...]
>>
>> > This whole patch seems pretty complicated to me.
>> >
>> > Can't we just remove the unix_recvq_full checks altogether and unify
>> > unix_dgram_poll with unix_poll?
>> >
>> > If we want to be cautious, we could simply make unix_max_dgram_qlen
>> > limit the number of skbs which are in flight from a sending socket.
>> > The skb destructor can then decrement this. This seems much simpler.
>> >
>> > Would this work?
>>
>> In the way this is intended to work, cf
>>
>> http://marc.info/?t=115627606000002&r=1&w=2
>
> Oh, I see, we don't limit closed but still referenced sockets. This
> actually makes sense given how fd handling is implemented, just as a
> range check.
>
> Have you checked whether we can somehow deregister the socket in the
> poll event framework? You wrote that it does not provide such a
> function, but maybe it would be easy to add?

I thought about this, but it would amount to adding a general interface
for the sole purpose of enabling the af_unix code to talk to the
eventpoll code, and I don't really like this idea: IMHO, there should
be at least two users (preferably three) before creating any kind of
'abstract interface'.

An even more ideal "castle in the air" (hypothetical) solution would be
"change the eventpoll.c code such that it won't be affected if a wait
queue just goes away". That's at least theoretically possible (although
it might not be in practice). I wouldn't mind doing that (assuming it
was possible) if it was just for the kernels my employer uses, because
I'm aware of the uses these will be put to and in control of the
corresponding userland code. But for "general Linux code", changing
epoll in order to help the af_unix code is more potential trouble than
it's worth: Exchanging a relatively unimportant bug in some module for
a much more visibly damaging bug in a central facility would be a bad
tradeoff.

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue
2015-11-10 21:55 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue Rainer Weikusat
2015-11-11 12:28   ` Hannes Frederic Sowa
@ 2015-11-11 17:35   ` Jason Baron
2015-11-12 19:11     ` Rainer Weikusat
2015-11-13 18:51   ` Rainer Weikusat
2 siblings, 1 reply; 43+ messages in thread
From: Jason Baron @ 2015-11-11 17:35 UTC (permalink / raw)
To: Rainer Weikusat
Cc: Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro, linux-fsdevel,
    LKML, David Miller, Hannes Frederic Sowa, David Howells, Paul Moore,
    salyzyn, sds, ying.xue, netdev, Kostya Serebryany,
    Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes,
    Kees Cook, Mathias Krause

Hi Rainer,

> +
> +/* Needs sk unix state lock. After recv_ready indicated not ready,
> + * establish peer_wait connection if still needed.
> + */
> +static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
> +{
> +	int connected;
> +
> +	connected = unix_dgram_peer_wake_connect(sk, other);
> +
> +	if (unix_recvq_full(other))
> +		return 1;
> +
> +	if (connected)
> +		unix_dgram_peer_wake_disconnect(sk, other);
> +
> +	return 0;
> +}
> +

So the comment above this function says 'needs unix state lock', however
the usage in unix_dgram_sendmsg() has the 'other' lock, while the usage
in unix_dgram_poll() has the 'sk' lock. So this looks racy.

Also, another tweak on this scheme: Instead of calling
'__remove_wait_queue()' in unix_dgram_peer_wake_relay(), we could simply
mark each item in the queue as 'WQ_FLAG_EXCLUSIVE'. Then, since
'unix_dgram_recvmsg()' does an exclusive wakeup, the queue has
effectively been disabled (minus the first exclusive item in the list,
which can just return if it's marked exclusive).

This means that in dgram_poll(), we add to the list if we have not yet
been added, and if we are on the list, we do a remove and then an add,
removing the exclusive flag. Thus, all the waiters that need a wakeup
are at the beginning of the queue, and the disabled ones are at the end
with the 'WQ_FLAG_EXCLUSIVE' flag set.

This does make the list potentially long, but if we only walk it to the
point we are doing the wakeup, it has no impact. I like the fact that in
this scheme the wakeup doesn't have to call remove against a long list
of waiters; it's just setting the exclusive flag.

Thanks,

-Jason

>  static inline int unix_writable(struct sock *sk)
>  {
>  	return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
> @@ -430,6 +546,8 @@ static void unix_release_sock(struct sock *sk, int embrion)
>  			skpair->sk_state_change(skpair);
>  			sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP);
>  		}
> +
> +		unix_dgram_peer_wake_disconnect(sk, skpair);
>  		sock_put(skpair); /* It may now die */
>  		unix_peer(sk) = NULL;
>  	}
> @@ -664,6 +782,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
>  	INIT_LIST_HEAD(&u->link);
>  	mutex_init(&u->readlock); /* single task reading lock */
>  	init_waitqueue_head(&u->peer_wait);
> +	init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay);
>  	unix_insert_socket(unix_sockets_unbound(sk), sk);
> out:
>  	if (sk == NULL)
> @@ -1031,6 +1150,13 @@ restart:
>  	if (unix_peer(sk)) {
>  		struct sock *old_peer = unix_peer(sk);
>  		unix_peer(sk) = other;
> +
> +		if (unix_dgram_peer_wake_disconnect(sk, other))
> +			wake_up_interruptible_poll(sk_sleep(sk),
> +						   POLLOUT |
> +						   POLLWRNORM |
> +						   POLLWRBAND);
> +
>  		unix_state_double_unlock(sk, other);
>
>  		if (other != old_peer)
> @@ -1565,6 +1691,13 @@ restart:
>  		unix_state_lock(sk);
>  		if (unix_peer(sk) == other) {
>  			unix_peer(sk) = NULL;
> +
> +			if (unix_dgram_peer_wake_disconnect(sk, other))
> +				wake_up_interruptible_poll(sk_sleep(sk),
> +							   POLLOUT |
> +							   POLLWRNORM |
> +							   POLLWRBAND);
> +
>  			unix_state_unlock(sk);
>
>  			unix_dgram_disconnected(sk, other);
> @@ -1590,19 +1723,21 @@ restart:
>  		goto out_unlock;
>  	}
>
> -	if (unix_peer(other) != sk && unix_recvq_full(other)) {
> -		if (!timeo) {
> -			err = -EAGAIN;
> -			goto out_unlock;
> -		}
> +	if (!unix_dgram_peer_recv_ready(sk, other)) {
> +		if (timeo) {
> +			timeo = unix_wait_for_peer(other, timeo);
>
> -		timeo = unix_wait_for_peer(other, timeo);
> +			err = sock_intr_errno(timeo);
> +			if (signal_pending(current))
> +				goto out_free;
>
> -		err = sock_intr_errno(timeo);
> -		if (signal_pending(current))
> -			goto out_free;
> +			goto restart;
> +		}
>
> -		goto restart;
> +		if (unix_dgram_peer_wake_me(sk, other)) {
> +			err = -EAGAIN;
> +			goto out_unlock;
> +		}
>  	}
>
>  	if (sock_flag(other, SOCK_RCVTSTAMP))
> @@ -2453,14 +2588,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
>  		return mask;
>
>  	writable = unix_writable(sk);
> -	other = unix_peer_get(sk);
> -	if (other) {
> -		if (unix_peer(other) != sk) {
> -			sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
> -			if (unix_recvq_full(other))
> -				writable = 0;
> -		}
> -		sock_put(other);
> +	if (writable) {
> +		unix_state_lock(sk);
> +
> +		other = unix_peer(sk);
> +		if (other &&
> +		    !unix_dgram_peer_recv_ready(sk, other) &&
> +		    unix_dgram_peer_wake_me(sk, other))
> +			writable = 0;
> +
> +		unix_state_unlock(sk);
>  	}
>
>  	if (writable)

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue 2015-11-11 17:35 ` Jason Baron @ 2015-11-12 19:11 ` Rainer Weikusat 0 siblings, 0 replies; 43+ messages in thread From: Rainer Weikusat @ 2015-11-12 19:11 UTC (permalink / raw) To: Jason Baron Cc: Rainer Weikusat, Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa, David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause Jason Baron <jbaron@akamai.com> writes: >> + >> +/* Needs sk unix state lock. After recv_ready indicated not ready, >> + * establish peer_wait connection if still needed. >> + */ >> +static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other) >> +{ >> + int connected; >> + >> + connected = unix_dgram_peer_wake_connect(sk, other); >> + >> + if (unix_recvq_full(other)) >> + return 1; >> + >> + if (connected) >> + unix_dgram_peer_wake_disconnect(sk, other); >> + >> + return 0; >> +} >> + > > So the comment above this function says 'needs unix state lock', however > the usage in unix_dgram_sendmsg() has the 'other' lock, while the usage > in unix_dgram_poll() has the 'sk' lock. So this looks racy. That's one thing which is broken with this patch. Judging from a 'quick' look at the _dgram_sendmsg code, the unix_state_lock(other) will need to be turned into a unix_state_double_lock(sk, other) and the remaining code changed accordingly (since all of the checks must be done without unlocking other). There's also something else seriously wrong with the present patch: Some code in unix_dgram_connect presently (with this change) looks like this: /* * If it was connected, reconnect. 
*/ if (unix_peer(sk)) { struct sock *old_peer = unix_peer(sk); unix_peer(sk) = other; if (unix_dgram_peer_wake_disconnect(sk, other)) wake_up_interruptible_poll(sk_sleep(sk), POLLOUT | POLLWRNORM | POLLWRBAND); unix_state_double_unlock(sk, other); if (other != old_peer) unix_dgram_disconnected(sk, old_peer); sock_put(old_peer); and trying to disconnect from a peer the socket is just being connected to is - of course - "flowering tomfoolery" (literal translation of the German "bluehender Bloedsinn") --- it needs to disconnect from old_peer instead. I'll address the suggestion and send an updated patch "later today" (may become "early tomorrow"). I have some code addressing both issues, but that's part of a release of 'our' kernel fork, ie, 3.2.54-based, which I'll need to do 'soon'. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue 2015-11-10 21:55 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue Rainer Weikusat 2015-11-11 12:28 ` Hannes Frederic Sowa 2015-11-11 17:35 ` Jason Baron @ 2015-11-13 18:51 ` Rainer Weikusat 2015-11-13 22:17 ` Jason Baron 2015-11-16 22:15 ` Rainer Weikusat 2 siblings, 2 replies; 43+ messages in thread From: Rainer Weikusat @ 2015-11-13 18:51 UTC (permalink / raw) To: Rainer Weikusat Cc: Jason Baron, Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa, David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause An AF_UNIX datagram socket being the client in an n:1 association with some server socket is only allowed to send messages to the server if the receive queue of this socket contains at most sk_max_ack_backlog datagrams. This implies that prospective writers might be forced to go to sleep despite none of the message presently enqueued on the server receive queue were sent by them. In order to ensure that these will be woken up once space becomes again available, the present unix_dgram_poll routine does a second sock_poll_wait call with the peer_wait wait queue of the server socket as queue argument (unix_dgram_recvmsg does a wake up on this queue after a datagram was received). This is inherently problematic because the server socket is only guaranteed to remain alive for as long as the client still holds a reference to it. In case the connection is dissolved via connect or by the dead peer detection logic in unix_dgram_sendmsg, the server socket may be freed despite "the polling mechanism" (in particular, epoll) still has a pointer to the corresponding peer_wait queue. There's no way to forcibly deregister a wait queue with epoll. 
Based on an idea by Jason Baron, the patch below changes the code such that a wait_queue_t belonging to the client socket is enqueued on the peer_wait queue of the server whenever the peer receive queue full condition is detected by either a sendmsg or a poll. A wake up on the peer queue is then relayed to the ordinary wait queue of the client socket via wake function. The connection to the peer wait queue is again dissolved if either a wake up is about to be relayed or the client socket reconnects or a dead peer is detected or the client socket is itself closed. This enables removing the second sock_poll_wait from unix_dgram_poll, thus avoiding the use-after-free, while still ensuring that no blocked writer sleeps forever. Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> --- "Believed to be least buggy version" - disconnect from former peer in _dgram_connect - use unix_state_double_lock in _dgram_sendmsg to ensure recv_ready/ wake_me preconditions are met (noted by Jason Baron) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index b36d837..2a91a05 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -62,6 +62,7 @@ struct unix_sock { #define UNIX_GC_CANDIDATE 0 #define UNIX_GC_MAYBE_CYCLE 1 struct socket_wq peer_wq; + wait_queue_t peer_wake; }; static inline struct unix_sock *unix_sk(const struct sock *sk) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 94f6582..30e7c56 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -326,6 +326,122 @@ found: return s; } +/* Support code for asymmetrically connected dgram sockets + * + * If a datagram socket is connected to a socket not itself connected + * to the first socket (eg, /dev/log), clients may only enqueue more + * messages if the present receive queue of the server socket is not + * "too large". This means there's a second writeability condition + * poll and sendmsg need to test. 
The dgram recv code will do a wake + * up on the peer_wait wait queue of a socket upon reception of a + * datagram which needs to be propagated to sleeping would-be writers + * since these might not have sent anything so far. This can't be + * accomplished via poll_wait because the lifetime of the server + * socket might be less than that of its clients if these break their + * association with it or if the server socket is closed while clients + * are still connected to it and there's no way to inform "a polling + * implementation" that it should let go of a certain wait queue + * + * In order to propagate a wake up, a wait_queue_t of the client + * socket is enqueued on the peer_wait queue of the server socket + * whose wake function does a wake_up on the ordinary client socket + * wait queue. This connection is established whenever a write (or + * poll for write) hit the flow control condition and broken when the + * association to the server socket is dissolved or after a wake up + * was relayed. 
+ */ + +static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags, + void *key) +{ + struct unix_sock *u; + wait_queue_head_t *u_sleep; + + u = container_of(q, struct unix_sock, peer_wake); + + __remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait, + q); + u->peer_wake.private = NULL; + + /* relaying can only happen while the wq still exists */ + u_sleep = sk_sleep(&u->sk); + if (u_sleep) + wake_up_interruptible_poll(u_sleep, key); + + return 0; +} + +static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other) +{ + struct unix_sock *u, *u_other; + int rc; + + u = unix_sk(sk); + u_other = unix_sk(other); + rc = 0; + + spin_lock(&u_other->peer_wait.lock); + + if (!u->peer_wake.private) { + u->peer_wake.private = other; + __add_wait_queue(&u_other->peer_wait, &u->peer_wake); + + rc = 1; + } + + spin_unlock(&u_other->peer_wait.lock); + return rc; +} + +static int unix_dgram_peer_wake_disconnect(struct sock *sk, struct sock *other) +{ + struct unix_sock *u, *u_other; + int rc; + + u = unix_sk(sk); + u_other = unix_sk(other); + rc = 0; + + spin_lock(&u_other->peer_wait.lock); + + if (u->peer_wake.private == other) { + __remove_wait_queue(&u_other->peer_wait, &u->peer_wake); + u->peer_wake.private = NULL; + + rc = 1; + } + + spin_unlock(&u_other->peer_wait.lock); + return rc; +} + +/* Preconditions for the following two functions: + * + * - unix_peer(sk) == other + * - association is stable + */ + +static int unix_dgram_peer_recv_ready(struct sock *sk, + struct sock *other) +{ + return unix_peer(other) == sk || !unix_recvq_full(other); +} + +static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other) +{ + int connected; + + connected = unix_dgram_peer_wake_connect(sk, other); + + if (unix_recvq_full(other)) + return 1; + + if (connected) + unix_dgram_peer_wake_disconnect(sk, other); + + return 0; +} + static inline int unix_writable(struct sock *sk) { return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf; 
@@ -430,6 +546,8 @@ static void unix_release_sock(struct sock *sk, int embrion) skpair->sk_state_change(skpair); sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP); } + + unix_dgram_peer_wake_disconnect(sk, skpair); sock_put(skpair); /* It may now die */ unix_peer(sk) = NULL; } @@ -664,6 +782,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern) INIT_LIST_HEAD(&u->link); mutex_init(&u->readlock); /* single task reading lock */ init_waitqueue_head(&u->peer_wait); + init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay); unix_insert_socket(unix_sockets_unbound(sk), sk); out: if (sk == NULL) @@ -1031,6 +1150,13 @@ restart: if (unix_peer(sk)) { struct sock *old_peer = unix_peer(sk); unix_peer(sk) = other; + + if (unix_dgram_peer_wake_disconnect(sk, old_peer)) + wake_up_interruptible_poll(sk_sleep(sk), + POLLOUT | + POLLWRNORM | + POLLWRBAND); + unix_state_double_unlock(sk, other); if (other != old_peer) @@ -1548,7 +1674,7 @@ restart: goto out_free; } - unix_state_lock(other); + unix_state_double_lock(sk, other); err = -EPERM; if (!unix_may_send(sk, other)) goto out_unlock; @@ -1562,9 +1688,15 @@ restart: sock_put(other); err = 0; - unix_state_lock(sk); if (unix_peer(sk) == other) { unix_peer(sk) = NULL; + + if (unix_dgram_peer_wake_disconnect(sk, other)) + wake_up_interruptible_poll(sk_sleep(sk), + POLLOUT | + POLLWRNORM | + POLLWRBAND); + unix_state_unlock(sk); unix_dgram_disconnected(sk, other); @@ -1590,21 +1722,27 @@ restart: goto out_unlock; } - if (unix_peer(other) != sk && unix_recvq_full(other)) { - if (!timeo) { - err = -EAGAIN; - goto out_unlock; - } + if (unix_peer(sk) == other && !unix_dgram_peer_recv_ready(sk, other)) { + if (timeo) { + unix_state_unlock(sk); - timeo = unix_wait_for_peer(other, timeo); + timeo = unix_wait_for_peer(other, timeo); - err = sock_intr_errno(timeo); - if (signal_pending(current)) - goto out_free; + err = sock_intr_errno(timeo); + if (signal_pending(current)) + goto out_free; - goto 
restart; + goto restart; + } + + if (unix_dgram_peer_wake_me(sk, other)) { + err = -EAGAIN; + goto out_unlock; + } } + unix_state_unlock(sk); + if (sock_flag(other, SOCK_RCVTSTAMP)) __net_timestamp(skb); maybe_add_creds(skb, sock, other); @@ -1618,7 +1756,7 @@ restart: return len; out_unlock: - unix_state_unlock(other); + unix_state_double_unlock(sk, other); out_free: kfree_skb(skb); out: @@ -2453,14 +2591,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock, return mask; writable = unix_writable(sk); - other = unix_peer_get(sk); - if (other) { - if (unix_peer(other) != sk) { - sock_poll_wait(file, &unix_sk(other)->peer_wait, wait); - if (unix_recvq_full(other)) - writable = 0; - } - sock_put(other); + if (writable) { + unix_state_lock(sk); + + other = unix_peer(sk); + if (other && + !unix_dgram_peer_recv_ready(sk, other) && + unix_dgram_peer_wake_me(sk, other)) + writable = 0; + + unix_state_unlock(sk); } if (writable) ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue 2015-11-13 18:51 ` Rainer Weikusat @ 2015-11-13 22:17 ` Jason Baron 2015-11-15 18:32 ` Rainer Weikusat 2015-11-16 22:15 ` Rainer Weikusat 1 sibling, 1 reply; 43+ messages in thread From: Jason Baron @ 2015-11-13 22:17 UTC (permalink / raw) To: Rainer Weikusat Cc: Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa, David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause On 11/13/2015 01:51 PM, Rainer Weikusat wrote: [...] > > - if (unix_peer(other) != sk && unix_recvq_full(other)) { > - if (!timeo) { > - err = -EAGAIN; > - goto out_unlock; > - } > + if (unix_peer(sk) == other && !unix_dgram_peer_recv_ready(sk, other)) { Remind me why the 'unix_peer(sk) == other' is added here? If the remote is not connected we still want to make sure that we don't overflow the the remote rcv queue, right? In terms of this added 'double' lock for both sk and other, where previously we just held the 'other' lock. I think we could continue to just hold the 'other' lock unless the remote queue is full, so something like: if (unix_peer(other) != sk && unix_recvq_full(other)) { bool need_wakeup = false; ....skipping the blocking case... err = -EAGAIN; if (!other_connected) goto out_unlock; unix_state_unlock(other); unix_state_lock(sk); /* if remote peer has changed under us, the connect() will wake up any pending waiter, just return -EAGAIN if (unix_peer(sk) == other) { /* In case we see there is space available queue the wakeup and we will try again. 
This should be an unlikely condition */ if (!unix_dgram_peer_wake_me(sk, other)) need_wakeup = true; } unix_state_unlock(sk); if (need_wakeup) wake_up_interruptible_poll(sk_sleep(sk), POLLOUT | POLLWRNORM | POLLWRBAND); goto out_free; } So I'm not sure if the 'double' lock really affects any workload, but the above might be a way to avoid it. Also - it might be helpful to add a 'Fixes:' tag in the changelog referencing where this issue started. Worth mentioning too is that this patch should improve the polling case here dramatically, as we currently wake the entire queue on every remote read even when there is room in the rcv buffer. So this patch will cut the context-switching rate significantly compared to what we currently have. Thanks, -Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue 2015-11-13 22:17 ` Jason Baron @ 2015-11-15 18:32 ` Rainer Weikusat 2015-11-17 16:08 ` Jason Baron 0 siblings, 1 reply; 43+ messages in thread From: Rainer Weikusat @ 2015-11-15 18:32 UTC (permalink / raw) To: Jason Baron Cc: Rainer Weikusat, Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa, David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause Jason Baron <jbaron@akamai.com> writes: > On 11/13/2015 01:51 PM, Rainer Weikusat wrote: > > [...] > >> >> - if (unix_peer(other) != sk && unix_recvq_full(other)) { >> - if (!timeo) { >> - err = -EAGAIN; >> - goto out_unlock; >> - } >> + if (unix_peer(sk) == other && !unix_dgram_peer_recv_ready(sk, other)) { > > Remind me why the 'unix_peer(sk) == other' is added here? If the remote > is not connected we still want to make sure that we don't overflow the > the remote rcv queue, right? Good point. The check is actually wrong there as the original code would also check the limit in case of an unconnected send to a socket found via address lookup. It belongs into the 2nd if (were I originally put it). > > In terms of this added 'double' lock for both sk and other, where > previously we just held the 'other' lock. I think we could continue to > just hold the 'other' lock unless the remote queue is full, so something > like: > > if (unix_peer(other) != sk && unix_recvq_full(other)) { > bool need_wakeup = false; > > ....skipping the blocking case... > > err = -EAGAIN; > if (!other_connected) > goto out_unlock; > unix_state_unlock(other); > unix_state_lock(sk); That was my original idea. 
The problem with this is that the code starting after the _lock and running until the unlock on the main code path has to be executed in one go with the other lock held, as the results of the tests above this one may become invalid as soon as the other lock is released. This means that instead of continuing execution with the send code proper after the block in case other became receive-ready between the first and the second test (possible because _dgram_recvmsg does not take the unix state lock), the whole procedure starting with acquiring the other lock would need to be restarted. Given sufficiently unfavorable circumstances, this could even turn into an endless loop which couldn't be interrupted (unless code for this was added). [...] > we currently wake the entire queue on every remote read even when we > have room in the rcv buffer. So this patch will cut down on ctxt > switching rate dramatically from what we currently have. In my opinion, this could be improved by making the throttling mechanism work like a flip flop: If the queue length hits the limit after a _sendmsg, set a "no more applicants" flag blocking future sends until cleared (checking the flag would replace the present check). After the receive queue has run empty (or almost empty), _dgram_recvmsg would clear the flag and do a wakeup. But this should be an independent patch (if implemented). ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue 2015-11-15 18:32 ` Rainer Weikusat @ 2015-11-17 16:08 ` Jason Baron 2015-11-17 18:38 ` Rainer Weikusat 0 siblings, 1 reply; 43+ messages in thread From: Jason Baron @ 2015-11-17 16:08 UTC (permalink / raw) To: Rainer Weikusat Cc: Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa, David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause On 11/15/2015 01:32 PM, Rainer Weikusat wrote: > > That was my original idea. The problem with this is that the code > starting after the _lock and running until the main code path unlock has > to be executed in one go with the other lock held as the results of the > tests above this one may become invalid as soon as the other lock is > released. This means instead of continuing execution with the send code > proper after the block in case other became receive-ready between the > first and the second test (possible because _dgram_recvmsg does not > take the unix state lock), the whole procedure starting with acquiring > the other lock would need to be restarted. Given sufficiently unfavorable > circumstances, this could even turn into an endless loop which couldn't > be interrupted. (unless code for this was added). > hmmm - I think we can avoid it by doing the wakeup from the write path in the rare case that the queue has emptied - and avoid the double lock. IE: unix_state_unlock(other); unix_state_lock(sk); err = -EAGAIN; if (unix_peer(sk) == other) { unix_dgram_peer_wake_connect(sk, other); if (skb_queue_len(&other->sk_receive_queue) == 0) need_wakeup = true; } unix_state_unlock(sk); if (need_wakeup) wake_up_interruptible_poll(sk_sleep(sk), POLLOUT | POLLWRNORM | POLLWRBAND); goto out_free; Thanks, -Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue 2015-11-17 16:08 ` Jason Baron @ 2015-11-17 18:38 ` Rainer Weikusat 0 siblings, 0 replies; 43+ messages in thread From: Rainer Weikusat @ 2015-11-17 18:38 UTC (permalink / raw) To: Jason Baron Cc: Rainer Weikusat, Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa, David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause Jason Baron <jbaron@akamai.com> writes: > On 11/15/2015 01:32 PM, Rainer Weikusat wrote: > >> >> That was my original idea. The problem with this is that the code >> starting after the _lock and running until the main code path unlock has >> to be executed in one go with the other lock held as the results of the >> tests above this one may become invalid as soon as the other lock is >> released. This means instead of continuing execution with the send code >> proper after the block in case other became receive-ready between the >> first and the second test (possible because _dgram_recvmsg does not >> take the unix state lock), the whole procedure starting with acquiring >> the other lock would need to be restarted. Given sufficiently unfavorable >> circumstances, this could even turn into an endless loop which couldn't >> be interrupted. (unless code for this was added). >> > > hmmm - I think we can avoid it by doing the wakeup from the write path > in the rare case that the queue has emptied - and avoid the double lock. 
IE: > > unix_state_unlock(other); > unix_state_lock(sk); > err = -EAGAIN; > if (unix_peer(sk) == other) { > unix_dgram_peer_wake_connect(sk, other); > if (skb_queue_len(&other->sk_receive_queue) == 0) > need_wakeup = true; > } > unix_state_unlock(sk); > if (need_wakeup) > wake_up_interruptible_poll(sk_sleep(sk), POLLOUT > | POLLWRNORM | POLLWRBAND); > goto out_free; This should probably rather be if (unix_dgram_peer_wake_connect(sk, other) && skb_queue_len(&other->sk_receive_queue) == 0) need_wakeup = 1; as there's no need to do the wake up if someone else already connected. The double lock could then be avoided at the expense of returning a gratuitous EAGAIN to the caller and throwing away all of the work _dgram_sendmsg did so far, eg, allocating an skb, copying the data into the kernel and doing all the other checks. This would enable another thread to do one of the following things in parallel with the 'locked' part of _dgram_sendmsg: 1) connect sk to a socket != other 2) use sk to send to a socket != other 3) do a shutdown on sk 4) determine write-readiness of sk via poll callback IMHO, the only thing which could possibly matter is 2), and my suggestion for this would rather be "use a send socket per sending thread if this matters to you" than "cause something to fail which could as well have succeeded". ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue 2015-11-13 18:51 ` Rainer Weikusat 2015-11-13 22:17 ` Jason Baron @ 2015-11-16 22:15 ` Rainer Weikusat 2015-11-16 22:28 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) Rainer Weikusat 1 sibling, 1 reply; 43+ messages in thread From: Rainer Weikusat @ 2015-11-16 22:15 UTC (permalink / raw) To: Rainer Weikusat Cc: Jason Baron, Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa, David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause An AF_UNIX datagram socket being the client in an n:1 association with some server socket is only allowed to send messages to the server if the receive queue of this socket contains at most sk_max_ack_backlog datagrams. This implies that prospective writers might be forced to go to sleep despite none of the message presently enqueued on the server receive queue were sent by them. In order to ensure that these will be woken up once space becomes again available, the present unix_dgram_poll routine does a second sock_poll_wait call with the peer_wait wait queue of the server socket as queue argument (unix_dgram_recvmsg does a wake up on this queue after a datagram was received). This is inherently problematic because the server socket is only guaranteed to remain alive for as long as the client still holds a reference to it. In case the connection is dissolved via connect or by the dead peer detection logic in unix_dgram_sendmsg, the server socket may be freed despite "the polling mechanism" (in particular, epoll) still has a pointer to the corresponding peer_wait queue. There's no way to forcibly deregister a wait queue with epoll. 
Based on an idea by Jason Baron, the patch below changes the code such that a wait_queue_t belonging to the client socket is enqueued on the peer_wait queue of the server whenever the peer receive queue full condition is detected by either a sendmsg or a poll. A wake up on the peer queue is then relayed to the ordinary wait queue of the client socket via wake function. The connection to the peer wait queue is again dissolved if either a wake up is about to be relayed or the client socket reconnects or a dead peer is detected or the client socket is itself closed. This enables removing the second sock_poll_wait from unix_dgram_poll, thus avoiding the use-after-free, while still ensuring that no blocked writer sleeps forever. Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> --- - fix logic in _dgram_sendmsg: queue limit also needs to be enforced for unconnected sends - drop _recv_ready helper function: I'm usually a big fan of functional decomposition but in this case, the abstraction seemed to obscure things rather than making them easier to understand diff --git a/include/net/af_unix.h b/include/net/af_unix.h index b36d837..2a91a05 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -62,6 +62,7 @@ struct unix_sock { #define UNIX_GC_CANDIDATE 0 #define UNIX_GC_MAYBE_CYCLE 1 struct socket_wq peer_wq; + wait_queue_t peer_wake; }; static inline struct unix_sock *unix_sk(const struct sock *sk) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 94f6582..3f4974d 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -326,6 +326,112 @@ found: return s; } +/* Support code for asymmetrically connected dgram sockets + * + * If a datagram socket is connected to a socket not itself connected + * to the first socket (eg, /dev/log), clients may only enqueue more + * messages if the present receive queue of the server socket is not + * "too large". This means there's a second writeability condition + * poll and sendmsg need to test. 
The dgram recv code will do a wake + * up on the peer_wait wait queue of a socket upon reception of a + * datagram which needs to be propagated to sleeping would-be writers + * since these might not have sent anything so far. This can't be + * accomplished via poll_wait because the lifetime of the server + * socket might be less than that of its clients if these break their + * association with it or if the server socket is closed while clients + * are still connected to it and there's no way to inform "a polling + * implementation" that it should let go of a certain wait queue + * + * In order to propagate a wake up, a wait_queue_t of the client + * socket is enqueued on the peer_wait queue of the server socket + * whose wake function does a wake_up on the ordinary client socket + * wait queue. This connection is established whenever a write (or + * poll for write) hit the flow control condition and broken when the + * association to the server socket is dissolved or after a wake up + * was relayed. 
+ */ + +static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags, + void *key) +{ + struct unix_sock *u; + wait_queue_head_t *u_sleep; + + u = container_of(q, struct unix_sock, peer_wake); + + __remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait, + q); + u->peer_wake.private = NULL; + + /* relaying can only happen while the wq still exists */ + u_sleep = sk_sleep(&u->sk); + if (u_sleep) + wake_up_interruptible_poll(u_sleep, key); + + return 0; +} + +static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other) +{ + struct unix_sock *u, *u_other; + int rc; + + u = unix_sk(sk); + u_other = unix_sk(other); + rc = 0; + spin_lock(&u_other->peer_wait.lock); + + if (!u->peer_wake.private) { + u->peer_wake.private = other; + __add_wait_queue(&u_other->peer_wait, &u->peer_wake); + + rc = 1; + } + + spin_unlock(&u_other->peer_wait.lock); + return rc; +} + +static int unix_dgram_peer_wake_disconnect(struct sock *sk, struct sock *other) +{ + struct unix_sock *u, *u_other; + int rc; + + u = unix_sk(sk); + u_other = unix_sk(other); + rc = 0; + spin_lock(&u_other->peer_wait.lock); + + if (u->peer_wake.private == other) { + __remove_wait_queue(&u_other->peer_wait, &u->peer_wake); + u->peer_wake.private = NULL; + + rc = 1; + } + + spin_unlock(&u_other->peer_wait.lock); + return rc; +} + +/* preconditions: + * - unix_peer(sk) == other + * - association is stable + */ +static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other) +{ + int connected; + + connected = unix_dgram_peer_wake_connect(sk, other); + + if (unix_recvq_full(other)) + return 1; + + if (connected) + unix_dgram_peer_wake_disconnect(sk, other); + + return 0; +} + static inline int unix_writable(struct sock *sk) { return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf; @@ -430,6 +536,8 @@ static void unix_release_sock(struct sock *sk, int embrion) skpair->sk_state_change(skpair); sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP); } + + 
unix_dgram_peer_wake_disconnect(sk, skpair); sock_put(skpair); /* It may now die */ unix_peer(sk) = NULL; } @@ -664,6 +772,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern) INIT_LIST_HEAD(&u->link); mutex_init(&u->readlock); /* single task reading lock */ init_waitqueue_head(&u->peer_wait); + init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay); unix_insert_socket(unix_sockets_unbound(sk), sk); out: if (sk == NULL) @@ -1031,6 +1140,13 @@ restart: if (unix_peer(sk)) { struct sock *old_peer = unix_peer(sk); unix_peer(sk) = other; + + if (unix_dgram_peer_wake_disconnect(sk, old_peer)) + wake_up_interruptible_poll(sk_sleep(sk), + POLLOUT | + POLLWRNORM | + POLLWRBAND); + unix_state_double_unlock(sk, other); if (other != old_peer) @@ -1548,7 +1664,7 @@ restart: goto out_free; } - unix_state_lock(other); + unix_state_double_lock(sk, other); err = -EPERM; if (!unix_may_send(sk, other)) goto out_unlock; @@ -1562,9 +1678,15 @@ restart: sock_put(other); err = 0; - unix_state_lock(sk); if (unix_peer(sk) == other) { unix_peer(sk) = NULL; + + if (unix_dgram_peer_wake_disconnect(sk, other)) + wake_up_interruptible_poll(sk_sleep(sk), + POLLOUT | + POLLWRNORM | + POLLWRBAND); + unix_state_unlock(sk); unix_dgram_disconnected(sk, other); @@ -1591,20 +1713,27 @@ restart: } if (unix_peer(other) != sk && unix_recvq_full(other)) { - if (!timeo) { - err = -EAGAIN; - goto out_unlock; - } + if (timeo) { + unix_state_unlock(sk); - timeo = unix_wait_for_peer(other, timeo); + timeo = unix_wait_for_peer(other, timeo); - err = sock_intr_errno(timeo); - if (signal_pending(current)) - goto out_free; + err = sock_intr_errno(timeo); + if (signal_pending(current)) + goto out_free; - goto restart; + goto restart; + } + + if (unix_peer(sk) != other || + unix_dgram_peer_wake_me(sk, other)) { + err = -EAGAIN; + goto out_unlock; + } } + unix_state_unlock(sk); + if (sock_flag(other, SOCK_RCVTSTAMP)) __net_timestamp(skb); maybe_add_creds(skb, sock, 
other); @@ -1618,7 +1747,7 @@ restart: return len; out_unlock: - unix_state_unlock(other); + unix_state_double_unlock(sk, other); out_free: kfree_skb(skb); out: @@ -2453,14 +2582,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock, return mask; writable = unix_writable(sk); - other = unix_peer_get(sk); - if (other) { - if (unix_peer(other) != sk) { - sock_poll_wait(file, &unix_sk(other)->peer_wait, wait); - if (unix_recvq_full(other)) - writable = 0; - } - sock_put(other); + if (writable) { + unix_state_lock(sk); + + other = unix_peer(sk); + if (other && unix_peer(other) != sk && + unix_recvq_full(other) && + unix_dgram_peer_wake_me(sk, other)) + writable = 0; + + unix_state_unlock(sk); } if (writable) ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) 2015-11-16 22:15 ` Rainer Weikusat @ 2015-11-16 22:28 ` Rainer Weikusat 2015-11-17 16:13 ` Jason Baron ` (2 more replies) 0 siblings, 3 replies; 43+ messages in thread From: Rainer Weikusat @ 2015-11-16 22:28 UTC (permalink / raw) To: Rainer Weikusat Cc: Jason Baron, Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa, David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause An AF_UNIX datagram socket being the client in an n:1 association with some server socket is only allowed to send messages to the server if the receive queue of this socket contains at most sk_max_ack_backlog datagrams. This implies that prospective writers might be forced to go to sleep despite none of the message presently enqueued on the server receive queue were sent by them. In order to ensure that these will be woken up once space becomes again available, the present unix_dgram_poll routine does a second sock_poll_wait call with the peer_wait wait queue of the server socket as queue argument (unix_dgram_recvmsg does a wake up on this queue after a datagram was received). This is inherently problematic because the server socket is only guaranteed to remain alive for as long as the client still holds a reference to it. In case the connection is dissolved via connect or by the dead peer detection logic in unix_dgram_sendmsg, the server socket may be freed despite "the polling mechanism" (in particular, epoll) still has a pointer to the corresponding peer_wait queue. There's no way to forcibly deregister a wait queue with epoll. 
Based on an idea by Jason Baron, the patch below changes the code such that a wait_queue_t belonging to the client socket is enqueued on the peer_wait queue of the server whenever the peer receive queue full condition is detected by either a sendmsg or a poll. A wake up on the peer queue is then relayed to the ordinary wait queue of the client socket via a wake function. The connection to the peer wait queue is again dissolved if either a wake up is about to be relayed or the client socket reconnects or a dead peer is detected or the client socket is itself closed. This enables removing the second sock_poll_wait from unix_dgram_poll, thus avoiding the use-after-free, while still ensuring that no blocked writer sleeps forever. Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets") --- Additional remark about "5456f09aaf88/ af_unix: fix unix_dgram_poll() behavior for EPOLLOUT event": This shouldn't be an issue anymore with this change even though it restores the "only when writable" behaviour, as the wake up relay will also be set up once _dgram_sendmsg returned EAGAIN for a send attempt on an n:1 connected socket. 
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index b36d837..2a91a05 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -62,6 +62,7 @@ struct unix_sock { #define UNIX_GC_CANDIDATE 0 #define UNIX_GC_MAYBE_CYCLE 1 struct socket_wq peer_wq; + wait_queue_t peer_wake; }; static inline struct unix_sock *unix_sk(const struct sock *sk) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 94f6582..3f4974d 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -326,6 +326,112 @@ found: return s; } +/* Support code for asymmetrically connected dgram sockets + * + * If a datagram socket is connected to a socket not itself connected + * to the first socket (eg, /dev/log), clients may only enqueue more + * messages if the present receive queue of the server socket is not + * "too large". This means there's a second writeability condition + * poll and sendmsg need to test. The dgram recv code will do a wake + * up on the peer_wait wait queue of a socket upon reception of a + * datagram which needs to be propagated to sleeping would-be writers + * since these might not have sent anything so far. This can't be + * accomplished via poll_wait because the lifetime of the server + * socket might be less than that of its clients if these break their + * association with it or if the server socket is closed while clients + * are still connected to it and there's no way to inform "a polling + * implementation" that it should let go of a certain wait queue + * + * In order to propagate a wake up, a wait_queue_t of the client + * socket is enqueued on the peer_wait queue of the server socket + * whose wake function does a wake_up on the ordinary client socket + * wait queue. This connection is established whenever a write (or + * poll for write) hit the flow control condition and broken when the + * association to the server socket is dissolved or after a wake up + * was relayed. 
+ */ + +static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags, + void *key) +{ + struct unix_sock *u; + wait_queue_head_t *u_sleep; + + u = container_of(q, struct unix_sock, peer_wake); + + __remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait, + q); + u->peer_wake.private = NULL; + + /* relaying can only happen while the wq still exists */ + u_sleep = sk_sleep(&u->sk); + if (u_sleep) + wake_up_interruptible_poll(u_sleep, key); + + return 0; +} + +static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other) +{ + struct unix_sock *u, *u_other; + int rc; + + u = unix_sk(sk); + u_other = unix_sk(other); + rc = 0; + spin_lock(&u_other->peer_wait.lock); + + if (!u->peer_wake.private) { + u->peer_wake.private = other; + __add_wait_queue(&u_other->peer_wait, &u->peer_wake); + + rc = 1; + } + + spin_unlock(&u_other->peer_wait.lock); + return rc; +} + +static int unix_dgram_peer_wake_disconnect(struct sock *sk, struct sock *other) +{ + struct unix_sock *u, *u_other; + int rc; + + u = unix_sk(sk); + u_other = unix_sk(other); + rc = 0; + spin_lock(&u_other->peer_wait.lock); + + if (u->peer_wake.private == other) { + __remove_wait_queue(&u_other->peer_wait, &u->peer_wake); + u->peer_wake.private = NULL; + + rc = 1; + } + + spin_unlock(&u_other->peer_wait.lock); + return rc; +} + +/* preconditions: + * - unix_peer(sk) == other + * - association is stable + */ +static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other) +{ + int connected; + + connected = unix_dgram_peer_wake_connect(sk, other); + + if (unix_recvq_full(other)) + return 1; + + if (connected) + unix_dgram_peer_wake_disconnect(sk, other); + + return 0; +} + static inline int unix_writable(struct sock *sk) { return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf; @@ -430,6 +536,8 @@ static void unix_release_sock(struct sock *sk, int embrion) skpair->sk_state_change(skpair); sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP); } + + 
unix_dgram_peer_wake_disconnect(sk, skpair); sock_put(skpair); /* It may now die */ unix_peer(sk) = NULL; } @@ -664,6 +772,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern) INIT_LIST_HEAD(&u->link); mutex_init(&u->readlock); /* single task reading lock */ init_waitqueue_head(&u->peer_wait); + init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay); unix_insert_socket(unix_sockets_unbound(sk), sk); out: if (sk == NULL) @@ -1031,6 +1140,13 @@ restart: if (unix_peer(sk)) { struct sock *old_peer = unix_peer(sk); unix_peer(sk) = other; + + if (unix_dgram_peer_wake_disconnect(sk, old_peer)) + wake_up_interruptible_poll(sk_sleep(sk), + POLLOUT | + POLLWRNORM | + POLLWRBAND); + unix_state_double_unlock(sk, other); if (other != old_peer) @@ -1548,7 +1664,7 @@ restart: goto out_free; } - unix_state_lock(other); + unix_state_double_lock(sk, other); err = -EPERM; if (!unix_may_send(sk, other)) goto out_unlock; @@ -1562,9 +1678,15 @@ restart: sock_put(other); err = 0; - unix_state_lock(sk); if (unix_peer(sk) == other) { unix_peer(sk) = NULL; + + if (unix_dgram_peer_wake_disconnect(sk, other)) + wake_up_interruptible_poll(sk_sleep(sk), + POLLOUT | + POLLWRNORM | + POLLWRBAND); + unix_state_unlock(sk); unix_dgram_disconnected(sk, other); @@ -1591,20 +1713,27 @@ restart: } if (unix_peer(other) != sk && unix_recvq_full(other)) { - if (!timeo) { - err = -EAGAIN; - goto out_unlock; - } + if (timeo) { + unix_state_unlock(sk); - timeo = unix_wait_for_peer(other, timeo); + timeo = unix_wait_for_peer(other, timeo); - err = sock_intr_errno(timeo); - if (signal_pending(current)) - goto out_free; + err = sock_intr_errno(timeo); + if (signal_pending(current)) + goto out_free; - goto restart; + goto restart; + } + + if (unix_peer(sk) != other || + unix_dgram_peer_wake_me(sk, other)) { + err = -EAGAIN; + goto out_unlock; + } } + unix_state_unlock(sk); + if (sock_flag(other, SOCK_RCVTSTAMP)) __net_timestamp(skb); maybe_add_creds(skb, sock, 
other); @@ -1618,7 +1747,7 @@ restart: return len; out_unlock: - unix_state_unlock(other); + unix_state_double_unlock(sk, other); out_free: kfree_skb(skb); out: @@ -2453,14 +2582,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock, return mask; writable = unix_writable(sk); - other = unix_peer_get(sk); - if (other) { - if (unix_peer(other) != sk) { - sock_poll_wait(file, &unix_sk(other)->peer_wait, wait); - if (unix_recvq_full(other)) - writable = 0; - } - sock_put(other); + if (writable) { + unix_state_lock(sk); + + other = unix_peer(sk); + if (other && unix_peer(other) != sk && + unix_recvq_full(other) && + unix_dgram_peer_wake_me(sk, other)) + writable = 0; + + unix_state_unlock(sk); } if (writable) ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) 2015-11-16 22:28 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) Rainer Weikusat @ 2015-11-17 16:13 ` Jason Baron 2015-11-17 20:14 ` David Miller 2015-11-19 23:52 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) Rainer Weikusat 2 siblings, 0 replies; 43+ messages in thread From: Jason Baron @ 2015-11-17 16:13 UTC (permalink / raw) To: Rainer Weikusat Cc: Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa, David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause On 11/16/2015 05:28 PM, Rainer Weikusat wrote: > An AF_UNIX datagram socket being the client in an n:1 association with > some server socket is only allowed to send messages to the server if the > receive queue of this socket contains at most sk_max_ack_backlog > datagrams. This implies that prospective writers might be forced to go > to sleep despite none of the message presently enqueued on the server > receive queue were sent by them. In order to ensure that these will be > woken up once space becomes again available, the present unix_dgram_poll > routine does a second sock_poll_wait call with the peer_wait wait queue > of the server socket as queue argument (unix_dgram_recvmsg does a wake > up on this queue after a datagram was received). This is inherently > problematic because the server socket is only guaranteed to remain alive > for as long as the client still holds a reference to it. In case the > connection is dissolved via connect or by the dead peer detection logic > in unix_dgram_sendmsg, the server socket may be freed despite "the > polling mechanism" (in particular, epoll) still has a pointer to the > corresponding peer_wait queue. There's no way to forcibly deregister a > wait queue with epoll. 
> > Based on an idea by Jason Baron, the patch below changes the code such > that a wait_queue_t belonging to the client socket is enqueued on the > peer_wait queue of the server whenever the peer receive queue full > condition is detected by either a sendmsg or a poll. A wake up on the > peer queue is then relayed to the ordinary wait queue of the client > socket via wake function. The connection to the peer wait queue is again > dissolved if either a wake up is about to be relayed or the client > socket reconnects or a dead peer is detected or the client socket is > itself closed. This enables removing the second sock_poll_wait from > unix_dgram_poll, thus avoiding the use-after-free, while still ensuring > that no blocked writer sleeps forever. > > Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> > Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets") > --- > > Additional remark about "5456f09aaf88/ af_unix: fix unix_dgram_poll() > behavior for EPOLLOUT event": This shouldn't be an issue anymore with > this change despite it restores the "only when writable" behaviour" as > the wake up relay will also be set up once _dgram_sendmsg returned > EAGAIN for a send attempt on a n:1 connected socket. > > Hi, My only comment was about potentially avoiding the double lock in the write path, otherwise this looks ok to me. Thanks, -Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) 2015-11-16 22:28 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) Rainer Weikusat 2015-11-17 16:13 ` Jason Baron @ 2015-11-17 20:14 ` David Miller 2015-11-17 21:37 ` Rainer Weikusat 2015-11-18 18:15 ` Rainer Weikusat 2015-11-19 23:52 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) Rainer Weikusat 2 siblings, 2 replies; 43+ messages in thread From: David Miller @ 2015-11-17 20:14 UTC (permalink / raw) To: rweikusat Cc: jbaron, dvyukov, syzkaller, mkubecek, viro, linux-fsdevel, linux-kernel, hannes, dhowells, paul, salyzyn, sds, ying.xue, netdev, kcc, glider, andreyknvl, sasha.levin, jln, keescook, minipli From: Rainer Weikusat <rweikusat@mobileactivedefense.com> Date: Mon, 16 Nov 2015 22:28:40 +0000 > An AF_UNIX datagram socket being the client in an n:1 association with > some server socket is only allowed to send messages to the server if the > receive queue of this socket contains at most sk_max_ack_backlog > datagrams. This implies that prospective writers might be forced to go > to sleep despite none of the message presently enqueued on the server > receive queue were sent by them. In order to ensure that these will be > woken up once space becomes again available, the present unix_dgram_poll > routine does a second sock_poll_wait call with the peer_wait wait queue > of the server socket as queue argument (unix_dgram_recvmsg does a wake > up on this queue after a datagram was received). This is inherently > problematic because the server socket is only guaranteed to remain alive > for as long as the client still holds a reference to it. In case the > connection is dissolved via connect or by the dead peer detection logic > in unix_dgram_sendmsg, the server socket may be freed despite "the > polling mechanism" (in particular, epoll) still has a pointer to the > corresponding peer_wait queue. 
There's no way to forcibly deregister a > wait queue with epoll. > > Based on an idea by Jason Baron, the patch below changes the code such > that a wait_queue_t belonging to the client socket is enqueued on the > peer_wait queue of the server whenever the peer receive queue full > condition is detected by either a sendmsg or a poll. A wake up on the > peer queue is then relayed to the ordinary wait queue of the client > socket via wake function. The connection to the peer wait queue is again > dissolved if either a wake up is about to be relayed or the client > socket reconnects or a dead peer is detected or the client socket is > itself closed. This enables removing the second sock_poll_wait from > unix_dgram_poll, thus avoiding the use-after-free, while still ensuring > that no blocked writer sleeps forever. > > Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> > Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets") So because of a corner case of epoll handling and sender socket release, every single datagram sendmsg has to do a double lock now? I do not dispute the correctness of your fix at this point, but that added cost in the fast path is really too high. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) 2015-11-17 20:14 ` David Miller @ 2015-11-17 21:37 ` Rainer Weikusat 2015-11-17 22:09 ` Rainer Weikusat 2015-11-17 22:48 ` Rainer Weikusat 2015-11-18 18:15 ` Rainer Weikusat 1 sibling, 2 replies; 43+ messages in thread From: Rainer Weikusat @ 2015-11-17 21:37 UTC (permalink / raw) To: David Miller Cc: rweikusat, jbaron, dvyukov, syzkaller, mkubecek, viro, linux-fsdevel, linux-kernel, hannes, dhowells, paul, salyzyn, sds, ying.xue, netdev, kcc, glider, andreyknvl, sasha.levin, jln, keescook, minipli David Miller <davem@davemloft.net> writes: > From: Rainer Weikusat <rweikusat@mobileactivedefense.com> > Date: Mon, 16 Nov 2015 22:28:40 +0000 > >> An AF_UNIX datagram socket being the client in an n:1 association with >> some server socket is only allowed to send messages to the server if the >> receive queue of this socket contains at most sk_max_ack_backlog >> datagrams. [...] >> Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> >> Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets") > > So because of a corner case of epoll handling and sender socket release, > every single datagram sendmsg has to do a double lock now? > > I do not dispute the correctness of your fix at this point, but that > added cost in the fast path is really too high. This leaves only the option of a somewhat incorrect solution and what is or isn't acceptable in this respect is somewhat difficult to decide. The basic options would be - return EAGAIN even if sending became possible (Jason's most recent suggestions) - retry sending a limited number of times, eg, once, before returning EAGAIN, on the grounds that this is nicer to the application and that redoing all the stuff up to the _lock in dgram_sendmsg can possibly/ likely be avoided Which one do you prefer? ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) 2015-11-17 21:37 ` Rainer Weikusat @ 2015-11-17 22:09 ` Rainer Weikusat 2015-11-19 23:48 ` Rainer Weikusat 2015-11-17 22:48 ` Rainer Weikusat 1 sibling, 1 reply; 43+ messages in thread From: Rainer Weikusat @ 2015-11-17 22:09 UTC (permalink / raw) To: David Miller Cc: rweikusat, jbaron, dvyukov, syzkaller, mkubecek, viro, linux-fsdevel, linux-kernel, hannes, dhowells, paul, salyzyn, sds, ying.xue, netdev, kcc, glider, andreyknvl, sasha.levin, jln, keescook, minipli Rainer Weikusat <rw@doppelsaurus.mobileactivedefense.com> writes: [...] > The basic options would be > > - return EAGAIN even if sending became possible (Jason's most > recent suggestions) > > - retry sending a limited number of times, eg, once, before > returning EAGAIN, on the grounds that this is nicer to the > application and that redoing all the stuff up to the _lock in > dgram_sendmsg can possibly/ likely be avoided A third option: Use trylock to acquire the sk lock. If this succeeds, there's no risk of deadlocking anyone even if acquiring the locks in the wrong order. This could look as follows (NB: I didn't even compile this, I just wrote the code to get an idea how complicated it would be): int need_wakeup; [...] need_wakeup = 0; err = 0; if (spin_trylock(&unix_sk(sk)->lock)) { if (unix_peer(sk) != other || unix_dgram_peer_wake_me(sk, other)) err = -EAGAIN; } else { err = -EAGAIN; unix_state_unlock(other); unix_state_lock(sk); need_wakeup = unix_peer(sk) != other && unix_dgram_peer_wake_connect(sk, other) && sk_receive_queue_len(other) == 0; } unix_state_unlock(sk); if (err) { if (need_wakeup) wake_up_interruptible_poll(sk_sleep(sk), POLLOUT | POLLWRNORM | POLLWRBAND); goto out_free; } ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) 2015-11-17 22:09 ` Rainer Weikusat @ 2015-11-19 23:48 ` Rainer Weikusat 0 siblings, 0 replies; 43+ messages in thread From: Rainer Weikusat @ 2015-11-19 23:48 UTC (permalink / raw) To: Rainer Weikusat Cc: David Miller, jbaron, dvyukov, syzkaller, mkubecek, viro, linux-fsdevel, linux-kernel, hannes, dhowells, paul, salyzyn, sds, ying.xue, netdev, kcc, glider, andreyknvl, sasha.levin, jln, keescook, minipli Rainer Weikusat <rweikusat@mobileactivedefense.com> writes: > Rainer Weikusat <rw@doppelsaurus.mobileactivedefense.com> writes: > > [...] > >> The basic options would be >> >> - return EAGAIN even if sending became possible (Jason's most >> recent suggestions) >> >> - retry sending a limited number of times, eg, once, before >> returning EAGAIN, on the grounds that this is nicer to the >> application and that redoing all the stuff up to the _lock in >> dgram_sendmsg can possibly/ likely be avoided > > A third option: A fourth and even one that's reasonably simple to implement: In case other became ready during the checks, drop other lock, do a double-lock sk, other, set a flag variable indicating this and restart the procedure after the unix_state_lock_other[*], using the value of the flag to lock/ unlock sk as needed. Should other still be ready to receive data, execution can then continue with the 'queue it' code as the other lock was held all the time this time. Combined with a few unlikely annotations in place where they're IMHO appropriate, this is speed-wise comparable to the stock kernel. ^ permalink raw reply [flat|nested] 43+ messages in thread
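In condensed form, the "fourth option" above amounts to the following control flow (pseudocode, not compilable; it is essentially the sk_locked/restart_locked scheme the revised patch later in the thread uses):

```
/* pseudocode sketch of the restart-with-flag idea; names follow the
 * patch earlier in the thread, error handling is elided */
sk_locked = 0;
unix_state_lock(other);
restart_locked:
        /* ... SOCK_DEAD and permission checks ... */

        if (unix_peer(other) != sk && unix_recvq_full(other)) {
                if (timeo) {
                        /* blocking send: sleep, then redo the lookup */
                        timeo = unix_wait_for_peer(other, timeo);
                        /* ... signal check ... */
                        goto restart;
                }

                if (!sk_locked) {
                        /* drop 'other', take both locks in a safe
                         * order, then recheck with both held */
                        unix_state_unlock(other);
                        unix_state_double_lock(sk, other);
                        sk_locked = 1;
                        goto restart_locked;
                }

                /* both locks held: safe to register on peer_wait */
                if (unix_peer(sk) != other ||
                    unix_dgram_peer_wake_me(sk, other)) {
                        err = -EAGAIN;
                        goto out_unlock;
                }
        }

        if (sk_locked)
                unix_state_unlock(sk);
        /* other became ready during the recheck and its lock was held
         * the whole time: fall through and queue the skb */
```

The point of the flag is that the double lock is only taken on the (unlikely) receive-queue-full path, which addresses the fast-path cost objection raised earlier in the thread.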
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) 2015-11-17 21:37 ` Rainer Weikusat 2015-11-17 22:09 ` Rainer Weikusat @ 2015-11-17 22:48 ` Rainer Weikusat 1 sibling, 0 replies; 43+ messages in thread From: Rainer Weikusat @ 2015-11-17 22:48 UTC (permalink / raw) To: Rainer Weikusat Cc: David Miller, jbaron, dvyukov, syzkaller, mkubecek, viro, linux-fsdevel, linux-kernel, hannes, dhowells, paul, salyzyn, sds, ying.xue, netdev, kcc, glider, andreyknvl, sasha.levin, jln, keescook, minipli Rainer Weikusat <rweikusat@mobileactivedefense.com> writes: [...] > This leaves only the option of a somewhat incorrect solution and what is > or isn't acceptable in this respect is somewhat difficult to decide. The > basic options would be [...] > - retry sending a limited number of times, eg, once, before > returning EAGAIN, on the grounds that this is nicer to the > application and that redoing all the stuff up to the _lock in > dgram_sendmsg can possibly/ likely be avoided Since it's better to have a specific example of something: Here's another 'code sketch' of this option (hopefully with less errors this time, there's an int restart = 0 above): if (unix_peer(other) != sk && unix_recvq_full(other)) { int need_wakeup; [...] need_wakeup = 0; err = 0; unix_state_unlock(other); unix_state_lock(sk); if (unix_peer(sk) == other) { if (++restart == 2) { need_wakeup = unix_dgram_peer_wake_connect(sk, other) && sk_receive_queue_len(other) == 0; err = -EAGAIN; } else if (unix_dgram_peer_wake_me(sk, other)) err = -EAGAIN; } else err = -EAGAIN; unix_state_unlock(sk); if (err || !restart) { if (need_wakeup) wake_up_interruptible_poll(sk_sleep(sk), POLLOUT | POLLWRNORM | POLLWRBAND); goto out_free; } goto restart; } I don't particularly like that, either, and to me, the best option seems to be to return the spurious EAGAIN if taking both locks unconditionally is not an option as that's the simplest choice. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) 2015-11-17 20:14 ` David Miller 2015-11-17 21:37 ` Rainer Weikusat @ 2015-11-18 18:15 ` Rainer Weikusat 2015-11-18 23:39 ` more statistics (was: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:)) Rainer Weikusat 1 sibling, 1 reply; 43+ messages in thread From: Rainer Weikusat @ 2015-11-18 18:15 UTC (permalink / raw) To: David Miller Cc: rweikusat, jbaron, dvyukov, syzkaller, mkubecek, viro, linux-fsdevel, linux-kernel, hannes, dhowells, paul, salyzyn, sds, ying.xue, netdev, kcc, glider, andreyknvl, sasha.levin, jln, keescook, minipli David Miller <davem@davemloft.net> writes: > From: Rainer Weikusat <rweikusat@mobileactivedefense.com> > Date: Mon, 16 Nov 2015 22:28:40 +0000 > >> An AF_UNIX datagram socket being the client in an n:1 [...] > So because of a corner case of epoll handling and sender socket release, > every single datagram sendmsg has to do a double lock now? > > I do not dispute the correctness of your fix at this point, but that > added cost in the fast path is really too high. Some more information on this: Running the test program included below 20 times on my 'work' system (otherwise idle, after logging in via VT with no GUI running), a quad-core AMD A10-5700 at 3393.984 MHz, with a patched 4.3 kernel, resulted in the following throughput statistics[*]:

avg       13.617 M/s
median    13.393 M/s
max       17.14 M/s
min       13.047 M/s
deviation 0.85

I'll try to post the results for 'unpatched' later as I'm also working on a couple of other things. [*] I do not use my fingers for counting, hence, these are binary and not decimal units. 
------------ #include <inttypes.h> #include <stdlib.h> #include <stdio.h> #include <sys/socket.h> #include <sys/time.h> #include <sys/wait.h> #include <unistd.h> enum { MSG_SZ = 16, MSGS = 1000000 }; static char msg[MSG_SZ]; static uint64_t tv2u(struct timeval *tv) { uint64_t u; u = tv->tv_sec; u *= 1000000; return u + tv->tv_usec; } int main(void) { struct timeval start, stop; uint64_t t_diff; double rate; int sks[2]; unsigned remain; char buf[MSG_SZ]; socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sks); if (fork() == 0) { close(*sks); gettimeofday(&start, 0); while (read(sks[1], buf, sizeof(buf)) > 0); gettimeofday(&stop, 0); t_diff = tv2u(&stop); t_diff -= tv2u(&start); rate = MSG_SZ * MSGS; rate /= t_diff; rate *= 1000000; printf("rate %fM/s\n", rate / (1 << 20)); fflush(stdout); _exit(0); } close(sks[1]); remain = MSGS; do write(*sks, msg, sizeof(msg)); while (--remain); close(*sks); wait(NULL); return 0; } ^ permalink raw reply [flat|nested] 43+ messages in thread
* more statistics (was: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:)) 2015-11-18 18:15 ` Rainer Weikusat @ 2015-11-18 23:39 ` Rainer Weikusat 0 siblings, 0 replies; 43+ messages in thread From: Rainer Weikusat @ 2015-11-18 23:39 UTC (permalink / raw) To: David Miller Cc: rweikusat, jbaron, dvyukov, syzkaller, mkubecek, viro, linux-fsdevel, linux-kernel, hannes, dhowells, paul, salyzyn, sds, ying.xue, netdev, kcc, glider, andreyknvl, sasha.levin, jln, keescook, minipli Rainer Weikusat <rw@doppelsaurus.mobileactivedefense.com> writes: [...] > Some more information on this: Running the test program included below > on my 'work' system (otherwise idle, after logging in via VT with no GUI > running)/ quadcore AMD A10-5700, 3393.984 for 20 times/ patched 4.3 resulted in the > following throughput statistics[*]: Since the results were too variable with only 20 runs, I've also tested this with 100 for three kernels: stock 4.3, 4.3 plus the published patch, and 4.3 plus the published patch plus the "just return EAGAIN" modification. The 1st and the 3rd perform about identically for the test program I used (slightly modified version included below), the 2nd is markedly slower. This is most easily visible when grouping the printed data rates (B/s) 'by millions':

stock 4.3
---------
13000000.000-13999999.000	3 (3%)
14000000.000-14999999.000	82 (82%)
15000000.000-15999999.000	15 (15%)

4.3 + patch
-----------
13000000.000-13999999.000	54 (54%)
14000000.000-14999999.000	35 (35%)
15000000.000-15999999.000	7 (7%)
16000000.000-16999999.000	1 (1%)
18000000.000-18999999.000	1 (1%)
22000000.000-22999999.000	2 (2%)

4.3 + modified patch
--------------------
13000000.000-13999999.000	3 (3%)
14000000.000-14999999.000	82 (82%)
15000000.000-15999999.000	14 (14%)
24000000.000-24999999.000	1 (1%)

IMHO, the 3rd option would be the way to go if this was considered an acceptable option (ie, despite it returns spurious errors in 'rare cases'). 
modified test program ===================== #include <inttypes.h> #include <stdlib.h> #include <stdio.h> #include <sys/socket.h> #include <sys/time.h> #include <sys/wait.h> #include <unistd.h> enum { MSG_SZ = 16, MSGS = 1000000 }; static char msg[MSG_SZ]; static uint64_t tv2u(struct timeval *tv) { uint64_t u; u = tv->tv_sec; u *= 1000000; return u + tv->tv_usec; } int main(void) { struct timeval start, stop; uint64_t t_diff; double rate; int sks[2]; unsigned remain; char buf[MSG_SZ]; socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sks); if (fork() == 0) { close(*sks); gettimeofday(&start, 0); while (read(sks[1], buf, sizeof(buf)) > 0); gettimeofday(&stop, 0); t_diff = tv2u(&stop); t_diff -= tv2u(&start); rate = MSG_SZ * MSGS; rate /= t_diff; rate *= 1000000; printf("%f\n", rate); fflush(stdout); _exit(0); } close(sks[1]); remain = MSGS; do write(*sks, msg, sizeof(msg)); while (--remain); close(*sks); wait(NULL); return 0; } ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) 2015-11-16 22:28 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) Rainer Weikusat 2015-11-17 16:13 ` Jason Baron 2015-11-17 20:14 ` David Miller @ 2015-11-19 23:52 ` Rainer Weikusat 2015-11-20 16:03 ` Jason Baron 2015-11-20 22:07 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue Rainer Weikusat 2 siblings, 2 replies; 43+ messages in thread From: Rainer Weikusat @ 2015-11-19 23:52 UTC (permalink / raw) To: Rainer Weikusat Cc: Jason Baron, Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa, David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause An AF_UNIX datagram socket being the client in an n:1 association with some server socket is only allowed to send messages to the server if the receive queue of this socket contains at most sk_max_ack_backlog datagrams. This implies that prospective writers might be forced to go to sleep even though none of the messages presently enqueued on the server receive queue were sent by them. In order to ensure that these will be woken up once space becomes again available, the present unix_dgram_poll routine does a second sock_poll_wait call with the peer_wait wait queue of the server socket as queue argument (unix_dgram_recvmsg does a wake up on this queue after a datagram was received). This is inherently problematic because the server socket is only guaranteed to remain alive for as long as the client still holds a reference to it. In case the connection is dissolved via connect or by the dead peer detection logic in unix_dgram_sendmsg, the server socket may be freed even though "the polling mechanism" (in particular, epoll) still has a pointer to the corresponding peer_wait queue. There's no way to forcibly deregister a wait queue with epoll. 
Based on an idea by Jason Baron, the patch below changes the code such that a wait_queue_t belonging to the client socket is enqueued on the peer_wait queue of the server whenever the peer receive queue full condition is detected by either a sendmsg or a poll. A wake up on the peer queue is then relayed to the ordinary wait queue of the client socket via wake function. The connection to the peer wait queue is again dissolved if either a wake up is about to be relayed or the client socket reconnects or a dead peer is detected or the client socket is itself closed. This enables removing the second sock_poll_wait from unix_dgram_poll, thus avoiding the use-after-free, while still ensuring that no blocked writer sleeps forever. Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets") --- This has been created around midnight at the end of a work day. I've read through it a couple of times and found no errors. But I'm not "ideally awake and attentive" right now. diff --git a/include/net/af_unix.h b/include/net/af_unix.h index b36d837..2a91a05 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -62,6 +62,7 @@ struct unix_sock { #define UNIX_GC_CANDIDATE 0 #define UNIX_GC_MAYBE_CYCLE 1 struct socket_wq peer_wq; + wait_queue_t peer_wake; }; static inline struct unix_sock *unix_sk(const struct sock *sk) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 94f6582..6962ff1 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -326,6 +326,112 @@ found: return s; } +/* Support code for asymmetrically connected dgram sockets + * + * If a datagram socket is connected to a socket not itself connected + * to the first socket (eg, /dev/log), clients may only enqueue more + * messages if the present receive queue of the server socket is not + * "too large". This means there's a second writeability condition + * poll and sendmsg need to test. 
The dgram recv code will do a wake + * up on the peer_wait wait queue of a socket upon reception of a + * datagram which needs to be propagated to sleeping would-be writers + * since these might not have sent anything so far. This can't be + * accomplished via poll_wait because the lifetime of the server + * socket might be less than that of its clients if these break their + * association with it or if the server socket is closed while clients + * are still connected to it and there's no way to inform "a polling + * implementation" that it should let go of a certain wait queue + * + * In order to propagate a wake up, a wait_queue_t of the client + * socket is enqueued on the peer_wait queue of the server socket + * whose wake function does a wake_up on the ordinary client socket + * wait queue. This connection is established whenever a write (or + * poll for write) hit the flow control condition and broken when the + * association to the server socket is dissolved or after a wake up + * was relayed. 
+ */ + +static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags, + void *key) +{ + struct unix_sock *u; + wait_queue_head_t *u_sleep; + + u = container_of(q, struct unix_sock, peer_wake); + + __remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait, + q); + u->peer_wake.private = NULL; + + /* relaying can only happen while the wq still exists */ + u_sleep = sk_sleep(&u->sk); + if (u_sleep) + wake_up_interruptible_poll(u_sleep, key); + + return 0; +} + +static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other) +{ + struct unix_sock *u, *u_other; + int rc; + + u = unix_sk(sk); + u_other = unix_sk(other); + rc = 0; + spin_lock(&u_other->peer_wait.lock); + + if (!u->peer_wake.private) { + u->peer_wake.private = other; + __add_wait_queue(&u_other->peer_wait, &u->peer_wake); + + rc = 1; + } + + spin_unlock(&u_other->peer_wait.lock); + return rc; +} + +static int unix_dgram_peer_wake_disconnect(struct sock *sk, struct sock *other) +{ + struct unix_sock *u, *u_other; + int rc; + + u = unix_sk(sk); + u_other = unix_sk(other); + rc = 0; + spin_lock(&u_other->peer_wait.lock); + + if (u->peer_wake.private == other) { + __remove_wait_queue(&u_other->peer_wait, &u->peer_wake); + u->peer_wake.private = NULL; + + rc = 1; + } + + spin_unlock(&u_other->peer_wait.lock); + return rc; +} + +/* preconditions: + * - unix_peer(sk) == other + * - association is stable + */ +static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other) +{ + int connected; + + connected = unix_dgram_peer_wake_connect(sk, other); + + if (unix_recvq_full(other)) + return 1; + + if (connected) + unix_dgram_peer_wake_disconnect(sk, other); + + return 0; +} + static inline int unix_writable(struct sock *sk) { return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf; @@ -430,6 +536,8 @@ static void unix_release_sock(struct sock *sk, int embrion) skpair->sk_state_change(skpair); sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP); } + + 
unix_dgram_peer_wake_disconnect(sk, skpair); sock_put(skpair); /* It may now die */ unix_peer(sk) = NULL; } @@ -664,6 +772,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern) INIT_LIST_HEAD(&u->link); mutex_init(&u->readlock); /* single task reading lock */ init_waitqueue_head(&u->peer_wait); + init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay); unix_insert_socket(unix_sockets_unbound(sk), sk); out: if (sk == NULL) @@ -1031,6 +1140,13 @@ restart: if (unix_peer(sk)) { struct sock *old_peer = unix_peer(sk); unix_peer(sk) = other; + + if (unix_dgram_peer_wake_disconnect(sk, old_peer)) + wake_up_interruptible_poll(sk_sleep(sk), + POLLOUT | + POLLWRNORM | + POLLWRBAND); + unix_state_double_unlock(sk, other); if (other != old_peer) @@ -1470,6 +1586,7 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg, struct scm_cookie scm; int max_level; int data_len = 0; + int sk_locked; wait_for_unix_gc(); err = scm_send(sock, msg, &scm, false); @@ -1548,12 +1665,14 @@ restart: goto out_free; } + sk_locked = 0; unix_state_lock(other); +restart_locked: err = -EPERM; if (!unix_may_send(sk, other)) goto out_unlock; - if (sock_flag(other, SOCK_DEAD)) { + if (unlikely(sock_flag(other, SOCK_DEAD))) { /* * Check with 1003.1g - what should * datagram error @@ -1561,10 +1680,19 @@ restart: unix_state_unlock(other); sock_put(other); + if (!sk_locked) + unix_state_lock(sk); + err = 0; - unix_state_lock(sk); if (unix_peer(sk) == other) { unix_peer(sk) = NULL; + + if (unix_dgram_peer_wake_disconnect(sk, other)) + wake_up_interruptible_poll(sk_sleep(sk), + POLLOUT | + POLLWRNORM | + POLLWRBAND); + unix_state_unlock(sk); unix_dgram_disconnected(sk, other); @@ -1590,21 +1718,35 @@ restart: goto out_unlock; } - if (unix_peer(other) != sk && unix_recvq_full(other)) { - if (!timeo) { + if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) { + if (timeo) { + timeo = unix_wait_for_peer(other, timeo); + + err = 
sock_intr_errno(timeo); + if (signal_pending(current)) + goto out_free; + + goto restart; + } + + if (unix_peer(sk) != other || + unix_dgram_peer_wake_me(sk, other)) { err = -EAGAIN; goto out_unlock; } - timeo = unix_wait_for_peer(other, timeo); + if (!sk_locked) { + unix_state_unlock(other); + unix_state_double_lock(sk, other); - err = sock_intr_errno(timeo); - if (signal_pending(current)) - goto out_free; - - goto restart; + sk_locked = 1; + goto restart_locked; + } } + if (unlikely(sk_locked)) + unix_state_unlock(sk); + if (sock_flag(other, SOCK_RCVTSTAMP)) __net_timestamp(skb); maybe_add_creds(skb, sock, other); @@ -1618,6 +1760,8 @@ restart: return len; out_unlock: + if (sk_locked) + unix_state_unlock(sk); unix_state_unlock(other); out_free: kfree_skb(skb); @@ -2453,14 +2597,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock, return mask; writable = unix_writable(sk); - other = unix_peer_get(sk); - if (other) { - if (unix_peer(other) != sk) { - sock_poll_wait(file, &unix_sk(other)->peer_wait, wait); - if (unix_recvq_full(other)) - writable = 0; - } - sock_put(other); + if (writable) { + unix_state_lock(sk); + + other = unix_peer(sk); + if (other && unix_peer(other) != sk && + unix_recvq_full(other) && + unix_dgram_peer_wake_me(sk, other)) + writable = 0; + + unix_state_unlock(sk); } if (writable) ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) 2015-11-19 23:52 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) Rainer Weikusat @ 2015-11-20 16:03 ` Jason Baron 2015-11-20 16:21 ` Rainer Weikusat 2015-11-20 22:07 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue Rainer Weikusat 1 sibling, 1 reply; 43+ messages in thread From: Jason Baron @ 2015-11-20 16:03 UTC (permalink / raw) To: Rainer Weikusat Cc: Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa, David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause On 11/19/2015 06:52 PM, Rainer Weikusat wrote: [...] > @@ -1590,21 +1718,35 @@ restart: > goto out_unlock; > } > > - if (unix_peer(other) != sk && unix_recvq_full(other)) { > - if (!timeo) { > + if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) { > + if (timeo) { > + timeo = unix_wait_for_peer(other, timeo); > + > + err = sock_intr_errno(timeo); > + if (signal_pending(current)) > + goto out_free; > + > + goto restart; > + } > + > + if (unix_peer(sk) != other || > + unix_dgram_peer_wake_me(sk, other)) { > err = -EAGAIN; > goto out_unlock; > } Hi, So here we are calling unix_dgram_peer_wake_me() without the sk lock the first time through - right? In that case, we can end up registering on the queue of other for the callback but we might have already connected to a different remote. In that case, the wakeup will crash if 'sk' has been freed in the meantime. Thanks, -Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) 2015-11-20 16:03 ` Jason Baron @ 2015-11-20 16:21 ` Rainer Weikusat 0 siblings, 0 replies; 43+ messages in thread From: Rainer Weikusat @ 2015-11-20 16:21 UTC (permalink / raw) To: Jason Baron Cc: Rainer Weikusat, Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa, David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause Jason Baron <jbaron@akamai.com> writes: > On 11/19/2015 06:52 PM, Rainer Weikusat wrote: > > [...] > >> @@ -1590,21 +1718,35 @@ restart: >> goto out_unlock; >> } >> >> - if (unix_peer(other) != sk && unix_recvq_full(other)) { >> - if (!timeo) { >> + if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) { >> + if (timeo) { >> + timeo = unix_wait_for_peer(other, timeo); >> + >> + err = sock_intr_errno(timeo); >> + if (signal_pending(current)) >> + goto out_free; >> + >> + goto restart; >> + } >> + >> + if (unix_peer(sk) != other || >> + unix_dgram_peer_wake_me(sk, other)) { >> err = -EAGAIN; >> goto out_unlock; >> } > > Hi, > > So here we are calling unix_dgram_peer_wake_me() without the sk lock the first time > through - right? Yes. And this is obviously wrong. I spent most of the 'evening time' (some people would call that 'night time') testing this and didn't get to read through it again yet. Thank you for pointing this out. I'll send an updated patch shortly. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue 2015-11-19 23:52 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) Rainer Weikusat 2015-11-20 16:03 ` Jason Baron @ 2015-11-20 22:07 ` Rainer Weikusat 2015-11-23 16:21 ` Jason Baron 2015-11-23 17:30 ` David Miller 1 sibling, 2 replies; 43+ messages in thread From: Rainer Weikusat @ 2015-11-20 22:07 UTC (permalink / raw) To: Rainer Weikusat Cc: Jason Baron, Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa, David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause Rainer Weikusat <rweikusat@mobileactivedefense.com> writes: An AF_UNIX datagram socket being the client in an n:1 association with some server socket is only allowed to send messages to the server if the receive queue of this socket contains at most sk_max_ack_backlog datagrams. This implies that prospective writers might be forced to go to sleep despite none of the messages presently enqueued on the server receive queue were sent by them. In order to ensure that these will be woken up once space becomes again available, the present unix_dgram_poll routine does a second sock_poll_wait call with the peer_wait wait queue of the server socket as queue argument (unix_dgram_recvmsg does a wake up on this queue after a datagram was received). This is inherently problematic because the server socket is only guaranteed to remain alive for as long as the client still holds a reference to it. In case the connection is dissolved via connect or by the dead peer detection logic in unix_dgram_sendmsg, the server socket may be freed despite "the polling mechanism" (in particular, epoll) still has a pointer to the corresponding peer_wait queue. There's no way to forcibly deregister a wait queue with epoll. 
Based on an idea by Jason Baron, the patch below changes the code such that a wait_queue_t belonging to the client socket is enqueued on the peer_wait queue of the server whenever the peer receive queue full condition is detected by either a sendmsg or a poll. A wake up on the peer queue is then relayed to the ordinary wait queue of the client socket via wake function. The connection to the peer wait queue is again dissolved if either a wake up is about to be relayed or the client socket reconnects or a dead peer is detected or the client socket is itself closed. This enables removing the second sock_poll_wait from unix_dgram_poll, thus avoiding the use-after-free, while still ensuring that no blocked writer sleeps forever. Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets") --- - uninvert the lock/ check code in _dgram_sendmsg - introduce a unix_dgram_peer_wake_disconnect_wakeup helper function as there were two calls with a wakeup immediately following and two without diff --git a/include/net/af_unix.h b/include/net/af_unix.h index b36d837..2a91a05 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -62,6 +62,7 @@ struct unix_sock { #define UNIX_GC_CANDIDATE 0 #define UNIX_GC_MAYBE_CYCLE 1 struct socket_wq peer_wq; + wait_queue_t peer_wake; }; static inline struct unix_sock *unix_sk(const struct sock *sk) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 94f6582..3d93b0d 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -326,6 +326,118 @@ found: return s; } +/* Support code for asymmetrically connected dgram sockets + * + * If a datagram socket is connected to a socket not itself connected + * to the first socket (eg, /dev/log), clients may only enqueue more + * messages if the present receive queue of the server socket is not + * "too large". This means there's a second writeability condition + * poll and sendmsg need to test. 
The dgram recv code will do a wake + * up on the peer_wait wait queue of a socket upon reception of a + * datagram which needs to be propagated to sleeping would-be writers + * since these might not have sent anything so far. This can't be + * accomplished via poll_wait because the lifetime of the server + * socket might be less than that of its clients if these break their + * association with it or if the server socket is closed while clients + * are still connected to it and there's no way to inform "a polling + * implementation" that it should let go of a certain wait queue + * + * In order to propagate a wake up, a wait_queue_t of the client + * socket is enqueued on the peer_wait queue of the server socket + * whose wake function does a wake_up on the ordinary client socket + * wait queue. This connection is established whenever a write (or + * poll for write) hit the flow control condition and broken when the + * association to the server socket is dissolved or after a wake up + * was relayed. 
+ */ + +static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags, + void *key) +{ + struct unix_sock *u; + wait_queue_head_t *u_sleep; + + u = container_of(q, struct unix_sock, peer_wake); + + __remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait, + q); + u->peer_wake.private = NULL; + + /* relaying can only happen while the wq still exists */ + u_sleep = sk_sleep(&u->sk); + if (u_sleep) + wake_up_interruptible_poll(u_sleep, key); + + return 0; +} + +static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other) +{ + struct unix_sock *u, *u_other; + int rc; + + u = unix_sk(sk); + u_other = unix_sk(other); + rc = 0; + spin_lock(&u_other->peer_wait.lock); + + if (!u->peer_wake.private) { + u->peer_wake.private = other; + __add_wait_queue(&u_other->peer_wait, &u->peer_wake); + + rc = 1; + } + + spin_unlock(&u_other->peer_wait.lock); + return rc; +} + +static void unix_dgram_peer_wake_disconnect(struct sock *sk, + struct sock *other) +{ + struct unix_sock *u, *u_other; + + u = unix_sk(sk); + u_other = unix_sk(other); + spin_lock(&u_other->peer_wait.lock); + + if (u->peer_wake.private == other) { + __remove_wait_queue(&u_other->peer_wait, &u->peer_wake); + u->peer_wake.private = NULL; + } + + spin_unlock(&u_other->peer_wait.lock); +} + +static void unix_dgram_peer_wake_disconnect_wakeup(struct sock *sk, + struct sock *other) +{ + unix_dgram_peer_wake_disconnect(sk, other); + wake_up_interruptible_poll(sk_sleep(sk), + POLLOUT | + POLLWRNORM | + POLLWRBAND); +} + +/* preconditions: + * - unix_peer(sk) == other + * - association is stable + */ +static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other) +{ + int connected; + + connected = unix_dgram_peer_wake_connect(sk, other); + + if (unix_recvq_full(other)) + return 1; + + if (connected) + unix_dgram_peer_wake_disconnect(sk, other); + + return 0; +} + static inline int unix_writable(struct sock *sk) { return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf; 
@@ -430,6 +542,8 @@ static void unix_release_sock(struct sock *sk, int embrion) skpair->sk_state_change(skpair); sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP); } + + unix_dgram_peer_wake_disconnect(sk, skpair); sock_put(skpair); /* It may now die */ unix_peer(sk) = NULL; } @@ -664,6 +778,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern) INIT_LIST_HEAD(&u->link); mutex_init(&u->readlock); /* single task reading lock */ init_waitqueue_head(&u->peer_wait); + init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay); unix_insert_socket(unix_sockets_unbound(sk), sk); out: if (sk == NULL) @@ -1031,6 +1146,8 @@ restart: if (unix_peer(sk)) { struct sock *old_peer = unix_peer(sk); unix_peer(sk) = other; + unix_dgram_peer_wake_disconnect_wakeup(sk, old_peer); + unix_state_double_unlock(sk, other); if (other != old_peer) @@ -1470,6 +1587,7 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg, struct scm_cookie scm; int max_level; int data_len = 0; + int sk_locked; wait_for_unix_gc(); err = scm_send(sock, msg, &scm, false); @@ -1548,12 +1666,14 @@ restart: goto out_free; } + sk_locked = 0; unix_state_lock(other); +restart_locked: err = -EPERM; if (!unix_may_send(sk, other)) goto out_unlock; - if (sock_flag(other, SOCK_DEAD)) { + if (unlikely(sock_flag(other, SOCK_DEAD))) { /* * Check with 1003.1g - what should * datagram error @@ -1561,10 +1681,14 @@ restart: unix_state_unlock(other); sock_put(other); + if (!sk_locked) + unix_state_lock(sk); + err = 0; - unix_state_lock(sk); if (unix_peer(sk) == other) { unix_peer(sk) = NULL; + unix_dgram_peer_wake_disconnect_wakeup(sk, other); + unix_state_unlock(sk); unix_dgram_disconnected(sk, other); @@ -1590,21 +1714,38 @@ restart: goto out_unlock; } - if (unix_peer(other) != sk && unix_recvq_full(other)) { - if (!timeo) { - err = -EAGAIN; - goto out_unlock; + if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) { + if (timeo) { + timeo = 
unix_wait_for_peer(other, timeo); + + err = sock_intr_errno(timeo); + if (signal_pending(current)) + goto out_free; + + goto restart; } - timeo = unix_wait_for_peer(other, timeo); + if (!sk_locked) { + unix_state_unlock(other); + unix_state_double_lock(sk, other); + } - err = sock_intr_errno(timeo); - if (signal_pending(current)) - goto out_free; + if (unix_peer(sk) != other || + unix_dgram_peer_wake_me(sk, other)) { + err = -EAGAIN; + sk_locked = 1; + goto out_unlock; + } - goto restart; + if (!sk_locked) { + sk_locked = 1; + goto restart_locked; + } } + if (unlikely(sk_locked)) + unix_state_unlock(sk); + if (sock_flag(other, SOCK_RCVTSTAMP)) __net_timestamp(skb); maybe_add_creds(skb, sock, other); @@ -1618,6 +1759,8 @@ restart: return len; out_unlock: + if (sk_locked) + unix_state_unlock(sk); unix_state_unlock(other); out_free: kfree_skb(skb); @@ -2453,14 +2596,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock, return mask; writable = unix_writable(sk); - other = unix_peer_get(sk); - if (other) { - if (unix_peer(other) != sk) { - sock_poll_wait(file, &unix_sk(other)->peer_wait, wait); - if (unix_recvq_full(other)) - writable = 0; - } - sock_put(other); + if (writable) { + unix_state_lock(sk); + + other = unix_peer(sk); + if (other && unix_peer(other) != sk && + unix_recvq_full(other) && + unix_dgram_peer_wake_me(sk, other)) + writable = 0; + + unix_state_unlock(sk); } if (writable) ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue 2015-11-20 22:07 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue Rainer Weikusat @ 2015-11-23 16:21 ` Jason Baron 2015-11-23 17:30 ` David Miller 1 sibling, 0 replies; 43+ messages in thread From: Jason Baron @ 2015-11-23 16:21 UTC (permalink / raw) To: Rainer Weikusat Cc: Dmitry Vyukov, syzkaller, Michal Kubecek, Al Viro, linux-fsdevel, LKML, David Miller, Hannes Frederic Sowa, David Howells, Paul Moore, salyzyn, sds, ying.xue, netdev, Kostya Serebryany, Alexander Potapenko, Andrey Konovalov, Sasha Levin, Julien Tinnes, Kees Cook, Mathias Krause On 11/20/2015 05:07 PM, Rainer Weikusat wrote: > Rainer Weikusat <rweikusat@mobileactivedefense.com> writes: > An AF_UNIX datagram socket being the client in an n:1 association with > some server socket is only allowed to send messages to the server if the > receive queue of this socket contains at most sk_max_ack_backlog > datagrams. This implies that prospective writers might be forced to go > to sleep despite none of the message presently enqueued on the server > receive queue were sent by them. In order to ensure that these will be > woken up once space becomes again available, the present unix_dgram_poll > routine does a second sock_poll_wait call with the peer_wait wait queue > of the server socket as queue argument (unix_dgram_recvmsg does a wake > up on this queue after a datagram was received). This is inherently > problematic because the server socket is only guaranteed to remain alive > for as long as the client still holds a reference to it. In case the > connection is dissolved via connect or by the dead peer detection logic > in unix_dgram_sendmsg, the server socket may be freed despite "the > polling mechanism" (in particular, epoll) still has a pointer to the > corresponding peer_wait queue. There's no way to forcibly deregister a > wait queue with epoll. 
> > Based on an idea by Jason Baron, the patch below changes the code such > that a wait_queue_t belonging to the client socket is enqueued on the > peer_wait queue of the server whenever the peer receive queue full > condition is detected by either a sendmsg or a poll. A wake up on the > peer queue is then relayed to the ordinary wait queue of the client > socket via wake function. The connection to the peer wait queue is again > dissolved if either a wake up is about to be relayed or the client > socket reconnects or a dead peer is detected or the client socket is > itself closed. This enables removing the second sock_poll_wait from > unix_dgram_poll, thus avoiding the use-after-free, while still ensuring > that no blocked writer sleeps forever. > > Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> > Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets") Looks good to me. Reviewed-by: Jason Baron <jbaron@akamai.com> Thanks, -Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue 2015-11-20 22:07 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue Rainer Weikusat 2015-11-23 16:21 ` Jason Baron @ 2015-11-23 17:30 ` David Miller 2015-11-23 21:37 ` Rainer Weikusat 1 sibling, 1 reply; 43+ messages in thread From: David Miller @ 2015-11-23 17:30 UTC (permalink / raw) To: rweikusat Cc: jbaron, dvyukov, syzkaller, mkubecek, viro, linux-fsdevel, linux-kernel, hannes, dhowells, paul, salyzyn, sds, ying.xue, netdev, kcc, glider, andreyknvl, sasha.levin, jln, keescook, minipli From: Rainer Weikusat <rweikusat@mobileactivedefense.com> Date: Fri, 20 Nov 2015 22:07:23 +0000 > Rainer Weikusat <rweikusat@mobileactivedefense.com> writes: > An AF_UNIX datagram socket being the client in an n:1 association with > some server socket is only allowed to send messages to the server if the > receive queue of this socket contains at most sk_max_ack_backlog > datagrams. This implies that prospective writers might be forced to go > to sleep despite none of the message presently enqueued on the server > receive queue were sent by them. In order to ensure that these will be > woken up once space becomes again available, the present unix_dgram_poll > routine does a second sock_poll_wait call with the peer_wait wait queue > of the server socket as queue argument (unix_dgram_recvmsg does a wake > up on this queue after a datagram was received). This is inherently > problematic because the server socket is only guaranteed to remain alive > for as long as the client still holds a reference to it. In case the > connection is dissolved via connect or by the dead peer detection logic > in unix_dgram_sendmsg, the server socket may be freed despite "the > polling mechanism" (in particular, epoll) still has a pointer to the > corresponding peer_wait queue. There's no way to forcibly deregister a > wait queue with epoll. 
> > Based on an idea by Jason Baron, the patch below changes the code such > that a wait_queue_t belonging to the client socket is enqueued on the > peer_wait queue of the server whenever the peer receive queue full > condition is detected by either a sendmsg or a poll. A wake up on the > peer queue is then relayed to the ordinary wait queue of the client > socket via wake function. The connection to the peer wait queue is again > dissolved if either a wake up is about to be relayed or the client > socket reconnects or a dead peer is detected or the client socket is > itself closed. This enables removing the second sock_poll_wait from > unix_dgram_poll, thus avoiding the use-after-free, while still ensuring > that no blocked writer sleeps forever. > > Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> > Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets") Applied and queued up for -stable, thanks to you and Jason for all of your hard work on this fix. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue 2015-11-23 17:30 ` David Miller @ 2015-11-23 21:37 ` Rainer Weikusat 2015-11-23 23:06 ` Rainer Weikusat 0 siblings, 1 reply; 43+ messages in thread From: Rainer Weikusat @ 2015-11-23 21:37 UTC (permalink / raw) To: David Miller Cc: rweikusat, jbaron, dvyukov, syzkaller, mkubecek, viro, linux-fsdevel, linux-kernel, hannes, dhowells, paul, salyzyn, sds, ying.xue, netdev, kcc, glider, andreyknvl, sasha.levin, jln, keescook, minipli David Miller <davem@davemloft.net> writes: > From: Rainer Weikusat <rweikusat@mobileactivedefense.com> >> Rainer Weikusat <rweikusat@mobileactivedefense.com> writes: >> An AF_UNIX datagram socket being the client in an n:1 association [...] > Applied and queued up for -stable, I'm sorry for this 13th hour request/ suggestion but while thinking about a reply to Dmitry, it occurred to me that the restart_locked/ sk_locked logic could be avoided by moving the test for this condition in front of all the others while leaving the 'act on it' code at its back, ie, reorganize unix_dgram_sendmsg such that it looks like this: unix_state_lock(other); if (unix_peer(other) != sk && unix_recvq_full(other)) { need_wait = 1; if (!timeo) { unix_state_unlock(other); unix_state_double_lock(sk, other); if (unix_peer(other) == sk || (unix_peer(sk) == other && !unix_dgram_peer_wake_me(sk, other))) need_wait = 0; unix_state_unlock(sk); } } /* original code here */ if (need_wait) { if (timeo) { timeo = unix_wait_for_peer(other, timeo); err = sock_intr_errno(timeo); if (signal_pending(current)) goto out_free; goto restart; } err = -EAGAIN; goto out_unlock; } /* original tail here */ This might cause a socket to be enqueued to the peer even though it's not allowed to send to it, but I don't think this matters much. This is a less conservative modification but one which results in simpler code overall. The kernel I'm currently running has been modified like this and 'survived' the usual tests. 
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue 2015-11-23 21:37 ` Rainer Weikusat @ 2015-11-23 23:06 ` Rainer Weikusat 0 siblings, 0 replies; 43+ messages in thread From: Rainer Weikusat @ 2015-11-23 23:06 UTC (permalink / raw) To: David Miller Cc: rweikusat, jbaron, dvyukov, syzkaller, mkubecek, viro, linux-fsdevel, linux-kernel, hannes, dhowells, paul, salyzyn, sds, ying.xue, netdev, kcc, glider, andreyknvl, sasha.levin, jln, keescook, minipli Rainer Weikusat <rw@doppelsaurus.mobileactivedefense.com> writes: > David Miller <davem@davemloft.net> writes: [...] > I'm sorry for this 13th hour request/ suggestion but while thinking > about a reply to Dmitry, it occurred to me that the restart_locked/ > sk_locked logic could be avoided by moving the test for this condition > in front of all the others while leaving the 'act on it' code at its > back, ie, reorganize unix_dgram_sendmsg such that it looks like this: [...] Just in case this is unclear on its own: If this was considered an improvement by someone other than me, I could supply either a "complete" patch with this re-arrangement or a cleanup delta patch changing the previous change. ^ permalink raw reply [flat|nested] 43+ messages in thread
end of thread, other threads:[~2015-11-23 23:06 UTC | newest] Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-10-12 11:07 Use-after-free in ep_remove_wait_queue Dmitry Vyukov 2015-10-12 12:02 ` Michal Kubecek 2015-10-12 12:14 ` Eric Dumazet 2015-10-12 12:17 ` Dmitry Vyukov 2015-11-06 13:06 ` Dmitry Vyukov 2015-11-06 14:58 ` Jason Baron 2015-11-06 15:15 ` Rainer Weikusat 2015-11-09 14:40 ` [PATCH] unix: avoid use-after-free " Rainer Weikusat 2015-11-09 18:25 ` David Miller 2015-11-10 17:16 ` Rainer Weikusat 2015-11-09 22:44 ` Jason Baron 2015-11-10 17:38 ` Rainer Weikusat 2015-11-22 21:43 ` alternate queueing mechanism (was: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue) Rainer Weikusat 2015-11-10 21:55 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue Rainer Weikusat 2015-11-11 12:28 ` Hannes Frederic Sowa 2015-11-11 16:12 ` Rainer Weikusat 2015-11-11 18:52 ` Hannes Frederic Sowa 2015-11-13 19:06 ` Rainer Weikusat 2015-11-11 17:35 ` Jason Baron 2015-11-12 19:11 ` Rainer Weikusat 2015-11-13 18:51 ` Rainer Weikusat 2015-11-13 22:17 ` Jason Baron 2015-11-15 18:32 ` Rainer Weikusat 2015-11-17 16:08 ` Jason Baron 2015-11-17 18:38 ` Rainer Weikusat 2015-11-16 22:15 ` Rainer Weikusat 2015-11-16 22:28 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) Rainer Weikusat 2015-11-17 16:13 ` Jason Baron 2015-11-17 20:14 ` David Miller 2015-11-17 21:37 ` Rainer Weikusat 2015-11-17 22:09 ` Rainer Weikusat 2015-11-19 23:48 ` Rainer Weikusat 2015-11-17 22:48 ` Rainer Weikusat 2015-11-18 18:15 ` Rainer Weikusat 2015-11-18 23:39 ` more statistics (was: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:)) Rainer Weikusat 2015-11-19 23:52 ` [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:) Rainer Weikusat 2015-11-20 16:03 ` Jason Baron 2015-11-20 16:21 ` Rainer Weikusat 2015-11-20 22:07 ` [PATCH] unix: avoid use-after-free in 
ep_remove_wait_queue Rainer Weikusat 2015-11-23 16:21 ` Jason Baron 2015-11-23 17:30 ` David Miller 2015-11-23 21:37 ` Rainer Weikusat 2015-11-23 23:06 ` Rainer Weikusat