linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Bug 4.1.16: self-detected stall in net/unix/?
@ 2016-02-02 16:25 Philipp Hahn
  2016-02-03  1:43 ` Hannes Frederic Sowa
  0 siblings, 1 reply; 17+ messages in thread
From: Philipp Hahn @ 2016-02-02 16:25 UTC (permalink / raw)
  To: Sasha Levin, Hannes Frederic Sowa, Rainer Weikusat, Andrey Vagin,
	Aaron Conole, David S. Miller, linux-kernel
  Cc: Greg Kroah-Hartman

Hi,

we recently updated our kernel to 4.1.16 + patch for "unix: properly
account for FDs passed over unix sockets" and have since then
self-detected stalls triggered by the Samba daemon:

> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007] INFO: rcu_sched self-detected stall on CPU { 3}  (t=162780 jiffies g=47565 c=47564 q=1055670)
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007] Task dump for CPU 3:
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007] smbd            R  running task        0  5938      1 0x0000000c
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  0000000000000004 ffffffff81851340 ffffffff810d3c84 000000000000b9cd
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  ffff8801bfd97100 ffffffff81851340 ffffffff81851340 ffffffff818f6c60
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  ffffffff810d7659 0000000000000000 0000000000000000 00001e847fc2f700
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007] Call Trace:
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  <IRQ>  [<ffffffff810d3c84>] ? rcu_dump_cpu_stacks+0x84/0xc0
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff810d7659>] ? rcu_check_callbacks+0x449/0x740
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff810ec7c0>] ? tick_sched_do_timer+0x40/0x40
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff810dcc54>] ? update_process_times+0x34/0x70
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff810ec45c>] ? tick_sched_handle.isra.12+0x2c/0x70
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff810ec809>] ? tick_sched_timer+0x49/0x80
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff810dd57d>] ? __run_hrtimer+0x6d/0x1b0
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff810ddd4d>] ? hrtimer_interrupt+0xed/0x210
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff815a0ed9>] ? smp_apic_timer_interrupt+0x39/0x50
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff8159ef7e>] ? apic_timer_interrupt+0x6e/0x80

> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  <EOI>  [<ffffffff8159de85>] ? _raw_spin_lock+0x35/0x50
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff8153b343>] ? unix_dgram_connect+0x93/0x200
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff8147f248>] ? SYSC_connect+0xe8/0x100
> Feb  1 09:03:14 dcs1 kernel: [ 1152.840007]  [<ffffffff8159e0f2>] ? system_call_fast_compare_end+0xc/0x6b


> Feb  1 11:48:13 ucs22f kernel: [307999.162254] INFO: rcu_sched self-detected stall on CPU { 0}  (t=5250 jiffies                                                                   g=6586733 c=6586732 q=6757)
> Feb  1 11:48:13 ucs22f kernel: [307999.162264] Task dump for CPU 0:
> Feb  1 11:48:13 ucs22f kernel: [307999.162267] smbd            R running      0  4615   4609 0x00000008
> Feb  1 11:48:13 ucs22f kernel: [307999.162272]  00200082 f5863b90 c10b3fe9 c1682cc0 c1682cc0 c1682cc0 f79d2b00 f                                                                  5863bdc
> Feb  1 11:48:13 ucs22f kernel: [307999.162276]  c10b722d c15bd400 00001482 0064816d 0064816c 00001a65 f79cd840 c                                                                  108b166
> Feb  1 11:48:13 ucs22f kernel: [307999.162280]  00000001 f5863bb8 f5863bb8 00000000 c1682cc0 f6808cf0 00001a65 f                                                                  6808cf0
> Feb  1 11:48:13 ucs22f kernel: [307999.162285] Call Trace:
> Feb  1 11:48:13 ucs22f kernel: [307999.162296]  [<c10b3fe9>] ? rcu_dump_cpu_stacks+0x79/0xc0
> Feb  1 11:48:13 ucs22f kernel: [307999.162300]  [<c10b722d>] ? rcu_check_callbacks+0x3cd/0x630
> Feb  1 11:48:13 ucs22f kernel: [307999.162304]  [<c108b166>] ? account_process_tick+0x66/0x160
> Feb  1 11:48:13 ucs22f kernel: [307999.162307]  [<c10bbe4f>] ? update_process_times+0x2f/0x60
> Feb  1 11:48:13 ucs22f kernel: [307999.162310]  [<c10cbf9d>] ? tick_sched_handle.isra.12+0x2d/0x60
> Feb  1 11:48:13 ucs22f kernel: [307999.162328]  [<c10cc210>] ? tick_sched_timer+0x40/0x80
> Feb  1 11:48:13 ucs22f kernel: [307999.162331]  [<c10bc6b0>] ? __remove_hrtimer+0x40/0xa0
> Feb  1 11:48:13 ucs22f kernel: [307999.162334]  [<c10bc97f>] ? __run_hrtimer+0x6f/0x190
> Feb  1 11:48:13 ucs22f kernel: [307999.162337]  [<c10cc1d0>] ? tick_sched_do_timer+0x30/0x30
> Feb  1 11:48:13 ucs22f kernel: [307999.162339]  [<c10bd15f>] ? hrtimer_interrupt+0xef/0x260
> Feb  1 11:48:13 ucs22f kernel: [307999.162343]  [<c119ae3d>] ? getname_kernel+0x2d/0x100
> Feb  1 11:48:13 ucs22f kernel: [307999.162348]  [<c1046f7f>] ? local_apic_timer_interrupt+0x2f/0x60
> Feb  1 11:48:13 ucs22f kernel: [307999.162353]  [<c14e4543>] ? smp_apic_timer_interrupt+0x33/0x50
> Feb  1 11:48:13 ucs22f kernel: [307999.162355]  [<c14e3c7c>] ? apic_timer_interrupt+0x34/0x3c

> Feb  1 11:48:13 ucs22f kernel: [307999.162358]  [<c14e2dc1>] ? _raw_spin_lock+0x51/0x70
> Feb  1 11:48:13 ucs22f kernel: [307999.162362]  [<c148c075>] ? unix_state_double_lock+0x25/0x60
> Feb  1 11:48:13 ucs22f kernel: [307999.162365]  [<c148de10>] ? unix_dgram_connect+0x90/0x1f0
> Feb  1 11:48:13 ucs22f kernel: [307999.162369]  [<c13e4267>] ? SYSC_connect+0xc7/0xe0
> Feb  1 11:48:13 ucs22f kernel: [307999.162371]  [<c13e2931>] ? sock_map_fd+0x41/0x60
> Feb  1 11:48:13 ucs22f kernel: [307999.162374]  [<c13e5014>] ? SYSC_socketcall+0x1b4/0xa20
> Feb  1 11:48:13 ucs22f kernel: [307999.162376]  [<c10c2940>] ? ktime_get+0x50/0x100
> Feb  1 11:48:13 ucs22f kernel: [307999.162379]  [<c10466db>] ? lapic_next_event+0x1b/0x20
> Feb  1 11:48:13 ucs22f kernel: [307999.162381]  [<c10ca0ed>] ? clockevents_program_event+0x9d/0x140
> Feb  1 11:48:13 ucs22f kernel: [307999.162385]  [<c129e068>] ? list_del+0x8/0x20
> Feb  1 11:48:13 ucs22f kernel: [307999.162388]  [<c1097ef7>] ? remove_wait_queue+0x27/0x40
> Feb  1 11:48:13 ucs22f kernel: [307999.162392]  [<c11c8795>] ? inotify_read+0x295/0x340
> Feb  1 11:48:13 ucs22f kernel: [307999.162396]  [<c10acc76>] ? handle_irq_event_percpu+0xa6/0x1a0
> Feb  1 11:48:13 ucs22f kernel: [307999.162399]  [<c11a786f>] ? set_close_on_exec+0x2f/0x60
> Feb  1 11:48:13 ucs22f kernel: [307999.162402]  [<c119d084>] ? do_fcntl+0x2f4/0x4e0
> Feb  1 11:48:13 ucs22f kernel: [307999.162405]  [<c107d6df>] ? commit_creds+0xff/0x1f0
> Feb  1 11:48:13 ucs22f kernel: [307999.162407]  [<c119d380>] ? SyS_fcntl64+0x60/0x100
> Feb  1 11:48:13 ucs22f kernel: [307999.162409]  [<c13e5953>] ? SyS_socketcall+0x13/0x20
> Feb  1 11:48:13 ucs22f kernel: [307999.162412]  [<c14e30db>] ? sysenter_do_call+0x12/0x12

We have not yet been able to reproduce the hang, but going back to our
previous kernel 4.1.12 makes the problem go away.

Is this a known issue or do you have an idea where to look?
What information should I collect next time it happens?

(Can unix_diag.ko with `ss` help?)
What other kernel configs should I enable do debug this dead-lock?

Thanks in advance
Philipp

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug 4.1.16: self-detected stall in net/unix/?
  2016-02-02 16:25 Bug 4.1.16: self-detected stall in net/unix/? Philipp Hahn
@ 2016-02-03  1:43 ` Hannes Frederic Sowa
  2016-02-05 15:28   ` Philipp Hahn
  0 siblings, 1 reply; 17+ messages in thread
From: Hannes Frederic Sowa @ 2016-02-03  1:43 UTC (permalink / raw)
  To: Philipp Hahn, Sasha Levin, Rainer Weikusat, Andrey Vagin,
	Aaron Conole, David S. Miller, linux-kernel
  Cc: Greg Kroah-Hartman

On 02.02.2016 17:25, Philipp Hahn wrote:
> Hi,
>
> we recently updated our kernel to 4.1.16 + patch for "unix: properly
> account for FDs passed over unix sockets" and have since then
> self-detected stalls triggered by the Samba daemon:
>
 > >
 > > [...]
 > >
>
> We have not yet been able to reproduce the hang, but going back to our
> previous kernel 4.1.12 makes the problem go away.

Can you remove the patch "unix: properly account for FDs passed over 
unix sockets" and see if the problem still happens?

I couldn't quickly see any problems with your added patch. I currently 
suspect a tight loop because of a SOCK_DEAD flag set but the socket not 
removed from unix_socket_table or the vfs. Hmmm...

The stack trace is rather unreliable, maybe something completely 
different happend. Do you happend to see better reports?

> Is this a known issue or do you have an idea where to look?
> What information should I collect next time it happens?
>
> (Can unix_diag.ko with `ss` help?)

Would be interesting if it is a conventional file based socket or 
autobounded one (sun_path[0] != '\0').

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug 4.1.16: self-detected stall in net/unix/?
  2016-02-03  1:43 ` Hannes Frederic Sowa
@ 2016-02-05 15:28   ` Philipp Hahn
  2016-02-11 13:47     ` Philipp Hahn
  0 siblings, 1 reply; 17+ messages in thread
From: Philipp Hahn @ 2016-02-05 15:28 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Sasha Levin, Rainer Weikusat, Andrey Vagin,
	Aaron Conole, David S. Miller, linux-kernel, Eric Dumazet
  Cc: Greg Kroah-Hartman

Hello Hannes,

thank you for your previous reply.

Am 03.02.2016 um 02:43 schrieb Hannes Frederic Sowa:
> On 02.02.2016 17:25, Philipp Hahn wrote:
>> we recently updated our kernel to 4.1.16 + patch for "unix: properly
>> account for FDs passed over unix sockets" and have since then
>> self-detected stalls triggered by the Samba daemon:
...
>> We have not yet been able to reproduce the hang, but going back to our
>> previous kernel 4.1.12 makes the problem go away.
> 
> Can you remove the patch "unix: properly account for FDs passed over
> unix sockets" and see if the problem still happens?

I will try.
The problem is that I can't trigger the bug reliably. It always happens
to "smbd", but I don't know the triggering condition.

> I couldn't quickly see any problems with your added patch. I currently
> suspect a tight loop because of a SOCK_DEAD flag set but the socket not
> removed from unix_socket_table or the vfs. Hmmm...

All occurrences of the bug show _raw_spin_lock() as the sleeping
function, never anything other thus far.
I have seen the bug both on amd64 and i386.

If it would spin in
> 1097 restart:
> 1098 »···»···other = unix_find_other(net, sunaddr, alen, sock->type, hash, &err);
> 1099 »···»···if (!other)
> 1100 »···»···»···goto out;
> 1101 
> 1102 »···»···unix_state_double_lock(sk, other);
> 1103 
> 1104 »···»···/* Apparently VFS overslept socket death. Retry. */
> 1105 »···»···if (sock_flag(other, SOCK_DEAD)) {
> 1106 »···»···»···unix_state_double_unlock(sk, other);
> 1107 »···»···»···sock_put(other);
> 1108 »···»···»···goto restart;
> 1109 »···»···}

I would expect some other function to show up in the trace, but it's
always the call chain "unix_dgram_connect -> unix_state_double_lock ->
_raw_spin_lock".

The only case I see is that unix_find_other() returns the same "other"
again and again.

Q: How will that dead "other" be garbage collected finally? The kernel is
> # CONFIG_PREEMPT_NONE is not set
> CONFIG_PREEMPT_VOLUNTARY=y
> # CONFIG_PREEMPT is not set


During my hunt I found the following difference between 4.4 and 4.1.16
in net/unix/af_unix.c:

> commit 1586a5877db9eee313379738d6581bc7c6ffb5e3
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Fri Oct 23 10:59:16 2015 -0700
> 
>     af_unix: do not report POLLOUT on listeners
>     
>     poll(POLLOUT) on a listener should not report fd is ready for
>     a write().
>     
>     This would break some applications using poll() and pfd.events = -1,
>     as they would not block in poll()
>     
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Reported-by: Alan Burlison <Alan.Burlison@oracle.com>
>     Tested-by: Eric Dumazet <edumazet@google.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
...
> -static inline int unix_writable(struct sock *sk)
> +static int unix_writable(const struct sock *sk)
>  {
> -       return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
> +       return sk->sk_state != TCP_LISTEN &&
> +              (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
>  }

Q: That *TCP*_LISTEN in net/*unix*/ looks strange to my eyes, but I'm
not a kernel guru (yet).

There is one hunk in 5c77e26862ce604edea05b3442ed765e9756fe0f, which
uses the result of that function, which might explain the difference,
why it shows with 4.1.16 but not with 4.4?

> @@ -2245,14 +2388,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
>         writable = unix_writable(sk);
> -       other = unix_peer_get(sk);
> -       if (other) {
> -               if (unix_peer(other) != sk) {
> -                       sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
> -                       if (unix_recvq_full(other))
> -                               writable = 0;
> -               }
> -               sock_put(other);
> +       if (writable) {
> +               unix_state_lock(sk);
> +
> +               other = unix_peer(sk);
> +               if (other && unix_peer(other) != sk &&
> +                   unix_recvq_full(other) &&
> +                   unix_dgram_peer_wake_me(sk, other))
> +                       writable = 0;
> +
> +               unix_state_unlock(sk);
>         }
>         if (writable)


I will for now build a new kernel with
> $ git log --oneline  v4.1.12..v4.1.17 -- net/unix
> dc6b0ec unix: properly account for FDs passed over unix sockets
> cc01a0a af_unix: Revert 'lock_interruptible' in stream receive code
> 5c77e26 unix: avoid use-after-free in ep_remove_wait_queue
reverted to see if it still happens. The "middle" patch seems harmless,
as it only changes a code path for STREAMS, while the bug triggers with
DGRAMS only.

> The stack trace is rather unreliable, maybe something completely
> different happend. Do you happend to see better reports?

So far they look all the same.
Anything more I can do to prepare for collection better information next
time I get that bug?

Thanks again.

Philipp

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug 4.1.16: self-detected stall in net/unix/?
  2016-02-05 15:28   ` Philipp Hahn
@ 2016-02-11 13:47     ` Philipp Hahn
  2016-02-11 15:55       ` Rainer Weikusat
  0 siblings, 1 reply; 17+ messages in thread
From: Philipp Hahn @ 2016-02-11 13:47 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Sasha Levin, Rainer Weikusat,
	David S. Miller, linux-kernel, Karolin Seeger, Jason Baron,
	Ben Hutchings
  Cc: Greg Kroah-Hartman, Arvid Requate, Stefan Gohmann

[-- Attachment #1: Type: text/plain, Size: 4587 bytes --]

Hi,

Am 05.02.2016 um 16:28 schrieb Philipp Hahn:
> Am 03.02.2016 um 02:43 schrieb Hannes Frederic Sowa:
>> On 02.02.2016 17:25, Philipp Hahn wrote:
>>> we recently updated our kernel to 4.1.16 + patch for "unix: properly
>>> account for FDs passed over unix sockets" and have since then
>>> self-detected stalls triggered by the Samba daemon:
> ...
>>> We have not yet been able to reproduce the hang, but going back to our
>>> previous kernel 4.1.12 makes the problem go away.
>>
>> Can you remove the patch "unix: properly account for FDs passed over
>> unix sockets" and see if the problem still happens?
> 
> I will try.
> The problem is that I can't trigger the bug reliably. It always happens
> to "smbd", but I don't know the triggering condition.

Probably the same bug was also reported to samba-technical by Karolin
Seeger; she filed the bug for 3.19-ckt with Ubuntu:

<https://bugs.launchpad.net/ubuntu/+source/linux-lts-trusty/+bug/1543980>

Running the Samba test suite reproduces the problem; see bug for details.


> I will for now build a new kernel with
>> $ git log --oneline  v4.1.12..v4.1.17 -- net/unix
>> dc6b0ec unix: properly account for FDs passed over unix sockets
>> cc01a0a af_unix: Revert 'lock_interruptible' in stream receive code
>> 5c77e26 unix: avoid use-after-free in ep_remove_wait_queue
> reverted to see if it still happens. The "middle" patch seems harmless,
> as it only changes a code path for STREAMS, while the bug triggers with
> DGRAMS only.
> 
>> The stack trace is rather unreliable, maybe something completely
>> different happend. Do you happend to see better reports?
> 
> So far they look all the same.
> Anything more I can do to prepare for collection better information next
> time I get that bug?

I've enabled more Kernel debug options and got the following:

> [  598.482787] 
> [  598.492559] =====================================
> [  598.502646] [ BUG: bad unlock balance detected! ]
> [  598.512874] 4.1.16+ #24 Not tainted
> [  598.523134] -------------------------------------
> [  598.533592] smbd/8659 is trying to release lock (&(&u->lock)->rlock) at:
> [  598.544429] [<ffffffff815d1319>] spin_unlock+0x9/0x10
> [  598.555148] but there are no more locks to release!
> [  598.565892] 
> [  598.565892] other info that might help us debug this:
> [  598.586936] no locks held by smbd/8659.
> [  598.597478] 
> [  598.597478] stack backtrace:
> [  598.618275] CPU: 3 PID: 8659 Comm: smbd Not tainted 4.1.16+ #24
> [  598.628820] Hardware name: System manufacturer System Product Name/P7F-X Series, BIOS 0703    09/24/2010
> [  598.650020]  ffffffff815d1319 ffff8800b8efbb88 ffffffff8163ee73 0000000000000000
> [  598.661051]  ffff880034fc4110 ffff8800b8efbbb8 ffffffff810db540 ffff880034fc4110
> [  598.671990]  ffff880034fc4110 ffff88023206bd40 ffffffff815d1319 ffff8800b8efbc08
> [  598.682736] Call Trace:
> [  598.693187]  [<ffffffff815d1319>] ? spin_unlock+0x9/0x10
> [  598.703798]  [<ffffffff8163ee73>] dump_stack+0x4c/0x65
> [  598.714223]  [<ffffffff810db540>] print_unlock_imbalance_bug+0x100/0x110
> [  598.724611]  [<ffffffff815d1319>] ? spin_unlock+0x9/0x10
> [  598.734763]  [<ffffffff810e0d8e>] lock_release+0x2be/0x430
> [  598.744636]  [<ffffffff81648303>] _raw_spin_unlock+0x23/0x40
> [  598.754230]  [<ffffffff815d41a8>] ? unix_dgram_sendmsg+0x288/0x6f0
> [  598.763840]  [<ffffffff815d1319>] spin_unlock+0x9/0x10
> [  598.773126]  [<ffffffff815d41e7>] unix_dgram_sendmsg+0x2c7/0x6f0
> [  598.782209]  [<ffffffff814f6c9d>] sock_sendmsg+0x4d/0x60
> [  598.791313]  [<ffffffff814f7c3b>] ___sys_sendmsg+0x2db/0x2f0
> [  598.800369]  [<ffffffff812083c8>] ? kmem_cache_free+0x328/0x360
> [  598.809383]  [<ffffffff8127c1c0>] ? locks_free_lock+0x50/0x60
> [  598.818157]  [<ffffffff814f8649>] __sys_sendmsg+0x49/0x90
> [  598.826742]  [<ffffffff814f86a2>] SyS_sendmsg+0x12/0x20
> [  598.835110]  [<ffffffff816486f2>] system_call_fastpath+0x16/0x7a
> [  598.843546] ------------[ cut here ]------------
> [  598.851999] WARNING: CPU: 3 PID: 8659 at net/core/skbuff.c:691 skb_release_head_state+0xaa/0xb0()

I continued bisecting "v4.1.12..v4.1.17 -- net/unix/" and found:

Reverting the attached patch in 4.1 fixes my problem (for now). The
original patch went into 4.4, but was back-ported to several stable trees:

v3.2: a3b0f6e8a21ef02f69a15abac440572d8cde8c2a
v3.18: 72032798034d921ed565e3bf8dfdc3098f6473e2
v4.1: 5c77e26862ce604edea05b3442ed765e9756fe0f
v4.2: bad967fdd8ecbdd171f5f243657be033d2d081a7
v4.3: 58a6a46a036ce81a2a8ecaa6fc1537c894349e3f
v4.4: 7d267278a9ece963d77eefec61630223fce08c6c

Philipp

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-unix-avoid-use-after-free-in-ep_remove_wait_queue.patch --]
[-- Type: text/x-diff; name="0001-unix-avoid-use-after-free-in-ep_remove_wait_queue.patch", Size: 10713 bytes --]

From 7d267278a9ece963d77eefec61630223fce08c6c Mon Sep 17 00:00:00 2001
Message-Id: <7d267278a9ece963d77eefec61630223fce08c6c.1455198101.git.hahn@univention.de>
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Date: Fri, 20 Nov 2015 22:07:23 +0000
Subject: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue
Organization: Univention GmbH, Bremen, Germany

Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep despite none of the message presently enqueued on the server
receive queue were sent by them. In order to ensure that these will be
woken up once space becomes again available, the present unix_dgram_poll
routine does a second sock_poll_wait call with the peer_wait wait queue
of the server socket as queue argument (unix_dgram_recvmsg does a wake
up on this queue after a datagram was received). This is inherently
problematic because the server socket is only guaranteed to remain alive
for as long as the client still holds a reference to it. In case the
connection is dissolved via connect or by the dead peer detection logic
in unix_dgram_sendmsg, the server socket may be freed despite "the
polling mechanism" (in particular, epoll) still has a pointer to the
corresponding peer_wait queue. There's no way to forcibly deregister a
wait queue with epoll.

Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via wake function. The connection to the peer wait queue is again
dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.

Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
Reviewed-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/af_unix.h |   1 +
 net/unix/af_unix.c    | 183 ++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 165 insertions(+), 19 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index b36d837..2a91a05 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -62,6 +62,7 @@ struct unix_sock {
 #define UNIX_GC_CANDIDATE	0
 #define UNIX_GC_MAYBE_CYCLE	1
 	struct socket_wq	peer_wq;
+	wait_queue_t		peer_wake;
 };
 
 static inline struct unix_sock *unix_sk(const struct sock *sk)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 955ec15..4e95bdf 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -326,6 +326,118 @@ found:
 	return s;
 }
 
+/* Support code for asymmetrically connected dgram sockets
+ *
+ * If a datagram socket is connected to a socket not itself connected
+ * to the first socket (eg, /dev/log), clients may only enqueue more
+ * messages if the present receive queue of the server socket is not
+ * "too large". This means there's a second writeability condition
+ * poll and sendmsg need to test. The dgram recv code will do a wake
+ * up on the peer_wait wait queue of a socket upon reception of a
+ * datagram which needs to be propagated to sleeping would-be writers
+ * since these might not have sent anything so far. This can't be
+ * accomplished via poll_wait because the lifetime of the server
+ * socket might be less than that of its clients if these break their
+ * association with it or if the server socket is closed while clients
+ * are still connected to it and there's no way to inform "a polling
+ * implementation" that it should let go of a certain wait queue
+ *
+ * In order to propagate a wake up, a wait_queue_t of the client
+ * socket is enqueued on the peer_wait queue of the server socket
+ * whose wake function does a wake_up on the ordinary client socket
+ * wait queue. This connection is established whenever a write (or
+ * poll for write) hit the flow control condition and broken when the
+ * association to the server socket is dissolved or after a wake up
+ * was relayed.
+ */
+
+static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags,
+				      void *key)
+{
+	struct unix_sock *u;
+	wait_queue_head_t *u_sleep;
+
+	u = container_of(q, struct unix_sock, peer_wake);
+
+	__remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait,
+			    q);
+	u->peer_wake.private = NULL;
+
+	/* relaying can only happen while the wq still exists */
+	u_sleep = sk_sleep(&u->sk);
+	if (u_sleep)
+		wake_up_interruptible_poll(u_sleep, key);
+
+	return 0;
+}
+
+static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other)
+{
+	struct unix_sock *u, *u_other;
+	int rc;
+
+	u = unix_sk(sk);
+	u_other = unix_sk(other);
+	rc = 0;
+	spin_lock(&u_other->peer_wait.lock);
+
+	if (!u->peer_wake.private) {
+		u->peer_wake.private = other;
+		__add_wait_queue(&u_other->peer_wait, &u->peer_wake);
+
+		rc = 1;
+	}
+
+	spin_unlock(&u_other->peer_wait.lock);
+	return rc;
+}
+
+static void unix_dgram_peer_wake_disconnect(struct sock *sk,
+					    struct sock *other)
+{
+	struct unix_sock *u, *u_other;
+
+	u = unix_sk(sk);
+	u_other = unix_sk(other);
+	spin_lock(&u_other->peer_wait.lock);
+
+	if (u->peer_wake.private == other) {
+		__remove_wait_queue(&u_other->peer_wait, &u->peer_wake);
+		u->peer_wake.private = NULL;
+	}
+
+	spin_unlock(&u_other->peer_wait.lock);
+}
+
+static void unix_dgram_peer_wake_disconnect_wakeup(struct sock *sk,
+						   struct sock *other)
+{
+	unix_dgram_peer_wake_disconnect(sk, other);
+	wake_up_interruptible_poll(sk_sleep(sk),
+				   POLLOUT |
+				   POLLWRNORM |
+				   POLLWRBAND);
+}
+
+/* preconditions:
+ *	- unix_peer(sk) == other
+ *	- association is stable
+ */
+static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
+{
+	int connected;
+
+	connected = unix_dgram_peer_wake_connect(sk, other);
+
+	if (unix_recvq_full(other))
+		return 1;
+
+	if (connected)
+		unix_dgram_peer_wake_disconnect(sk, other);
+
+	return 0;
+}
+
 static int unix_writable(const struct sock *sk)
 {
 	return sk->sk_state != TCP_LISTEN &&
@@ -431,6 +543,8 @@ static void unix_release_sock(struct sock *sk, int embrion)
 			skpair->sk_state_change(skpair);
 			sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP);
 		}
+
+		unix_dgram_peer_wake_disconnect(sk, skpair);
 		sock_put(skpair); /* It may now die */
 		unix_peer(sk) = NULL;
 	}
@@ -666,6 +780,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
 	INIT_LIST_HEAD(&u->link);
 	mutex_init(&u->readlock); /* single task reading lock */
 	init_waitqueue_head(&u->peer_wait);
+	init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay);
 	unix_insert_socket(unix_sockets_unbound(sk), sk);
 out:
 	if (sk == NULL)
@@ -1033,6 +1148,8 @@ restart:
 	if (unix_peer(sk)) {
 		struct sock *old_peer = unix_peer(sk);
 		unix_peer(sk) = other;
+		unix_dgram_peer_wake_disconnect_wakeup(sk, old_peer);
+
 		unix_state_double_unlock(sk, other);
 
 		if (other != old_peer)
@@ -1472,6 +1589,7 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	struct scm_cookie scm;
 	int max_level;
 	int data_len = 0;
+	int sk_locked;
 
 	wait_for_unix_gc();
 	err = scm_send(sock, msg, &scm, false);
@@ -1550,12 +1668,14 @@ restart:
 		goto out_free;
 	}
 
+	sk_locked = 0;
 	unix_state_lock(other);
+restart_locked:
 	err = -EPERM;
 	if (!unix_may_send(sk, other))
 		goto out_unlock;
 
-	if (sock_flag(other, SOCK_DEAD)) {
+	if (unlikely(sock_flag(other, SOCK_DEAD))) {
 		/*
 		 *	Check with 1003.1g - what should
 		 *	datagram error
@@ -1563,10 +1683,14 @@ restart:
 		unix_state_unlock(other);
 		sock_put(other);
 
+		if (!sk_locked)
+			unix_state_lock(sk);
+
 		err = 0;
-		unix_state_lock(sk);
 		if (unix_peer(sk) == other) {
 			unix_peer(sk) = NULL;
+			unix_dgram_peer_wake_disconnect_wakeup(sk, other);
+
 			unix_state_unlock(sk);
 
 			unix_dgram_disconnected(sk, other);
@@ -1592,21 +1716,38 @@ restart:
 			goto out_unlock;
 	}
 
-	if (unix_peer(other) != sk && unix_recvq_full(other)) {
-		if (!timeo) {
-			err = -EAGAIN;
-			goto out_unlock;
+	if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
+		if (timeo) {
+			timeo = unix_wait_for_peer(other, timeo);
+
+			err = sock_intr_errno(timeo);
+			if (signal_pending(current))
+				goto out_free;
+
+			goto restart;
 		}
 
-		timeo = unix_wait_for_peer(other, timeo);
+		if (!sk_locked) {
+			unix_state_unlock(other);
+			unix_state_double_lock(sk, other);
+		}
 
-		err = sock_intr_errno(timeo);
-		if (signal_pending(current))
-			goto out_free;
+		if (unix_peer(sk) != other ||
+		    unix_dgram_peer_wake_me(sk, other)) {
+			err = -EAGAIN;
+			sk_locked = 1;
+			goto out_unlock;
+		}
 
-		goto restart;
+		if (!sk_locked) {
+			sk_locked = 1;
+			goto restart_locked;
+		}
 	}
 
+	if (unlikely(sk_locked))
+		unix_state_unlock(sk);
+
 	if (sock_flag(other, SOCK_RCVTSTAMP))
 		__net_timestamp(skb);
 	maybe_add_creds(skb, sock, other);
@@ -1620,6 +1761,8 @@ restart:
 	return len;
 
 out_unlock:
+	if (sk_locked)
+		unix_state_unlock(sk);
 	unix_state_unlock(other);
 out_free:
 	kfree_skb(skb);
@@ -2476,14 +2619,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
 		return mask;
 
 	writable = unix_writable(sk);
-	other = unix_peer_get(sk);
-	if (other) {
-		if (unix_peer(other) != sk) {
-			sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
-			if (unix_recvq_full(other))
-				writable = 0;
-		}
-		sock_put(other);
+	if (writable) {
+		unix_state_lock(sk);
+
+		other = unix_peer(sk);
+		if (other && unix_peer(other) != sk &&
+		    unix_recvq_full(other) &&
+		    unix_dgram_peer_wake_me(sk, other))
+			writable = 0;
+
+		unix_state_unlock(sk);
 	}
 
 	if (writable)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Bug 4.1.16: self-detected stall in net/unix/?
  2016-02-11 13:47     ` Philipp Hahn
@ 2016-02-11 15:55       ` Rainer Weikusat
  2016-02-11 17:03         ` Ben Hutchings
  0 siblings, 1 reply; 17+ messages in thread
From: Rainer Weikusat @ 2016-02-11 15:55 UTC (permalink / raw)
  To: Philipp Hahn
  Cc: Hannes Frederic Sowa, Sasha Levin, Rainer Weikusat,
	David S. Miller, linux-kernel, Karolin Seeger, Jason Baron,
	Ben Hutchings, Greg Kroah-Hartman, Arvid Requate, Stefan Gohmann

Philipp Hahn <pmhahn@pmhahn.de> writes:

[...]

> Probably the same bug was also reported to samba-technical by Karolin
> Seeger; she filed the bug for 3.19-ckt with Ubuntu:
>
> <https://bugs.launchpad.net/ubuntu/+source/linux-lts-trusty/+bug/1543980>
>
> Running the Samba test suite reproduces the problem; see bug for
> details.


JFTR: The oops in this bug report is for 3.13.0-77 and the patch you
reverted for 4.1 is not part of that (at least not of the upstream 3.13).


[...]

>> [  598.492559] =====================================
>> [  598.502646] [ BUG: bad unlock balance detected! ]
>> [  598.512874] 4.1.16+ #24 Not tainted
>> [  598.523134] -------------------------------------
>> [  598.533592] smbd/8659 is trying to release lock (&(&u->lock)->rlock) at:
>> [  598.544429] [<ffffffff815d1319>] spin_unlock+0x9/0x10
>> [  598.555148] but there are no more locks to release!
>> [  598.565892] 
>> [  598.565892] other info that might help us debug this:
>> [  598.586936] no locks held by smbd/8659.
>> [  598.597478] 
>> [  598.597478] stack backtrace:
>> [  598.618275] CPU: 3 PID: 8659 Comm: smbd Not tainted 4.1.16+ #24
>> [  598.628820] Hardware name: System manufacturer System Product Name/P7F-X Series, BIOS 0703    09/24/2010
>> [  598.650020]  ffffffff815d1319 ffff8800b8efbb88 ffffffff8163ee73 0000000000000000
>> [  598.661051]  ffff880034fc4110 ffff8800b8efbbb8 ffffffff810db540 ffff880034fc4110
>> [  598.671990]  ffff880034fc4110 ffff88023206bd40 ffffffff815d1319 ffff8800b8efbc08
>> [  598.682736] Call Trace:
>> [  598.693187]  [<ffffffff815d1319>] ? spin_unlock+0x9/0x10
>> [  598.703798]  [<ffffffff8163ee73>] dump_stack+0x4c/0x65
>> [  598.714223]  [<ffffffff810db540>] print_unlock_imbalance_bug+0x100/0x110
>> [  598.724611]  [<ffffffff815d1319>] ? spin_unlock+0x9/0x10
>> [  598.734763]  [<ffffffff810e0d8e>] lock_release+0x2be/0x430
>> [  598.744636]  [<ffffffff81648303>] _raw_spin_unlock+0x23/0x40
>> [  598.754230]  [<ffffffff815d41a8>] ? unix_dgram_sendmsg+0x288/0x6f0
>> [  598.763840]  [<ffffffff815d1319>] spin_unlock+0x9/0x10
>> [  598.773126]  [<ffffffff815d41e7>] unix_dgram_sendmsg+0x2c7/0x6f0
>> [  598.782209]  [<ffffffff814f6c9d>] sock_sendmsg+0x4d/0x60
>> [  598.791313]  [<ffffffff814f7c3b>] ___sys_sendmsg+0x2db/0x2f0
>> [  598.800369]  [<ffffffff812083c8>] ? kmem_cache_free+0x328/0x360
>> [  598.809383]  [<ffffffff8127c1c0>] ? locks_free_lock+0x50/0x60
>> [  598.818157]  [<ffffffff814f8649>] __sys_sendmsg+0x49/0x90
>> [  598.826742]  [<ffffffff814f86a2>] SyS_sendmsg+0x12/0x20
>> [  598.835110]  [<ffffffff816486f2>] system_call_fastpath+0x16/0x7a
>> [  598.843546] ------------[ cut here ]------------
>> [  598.851999] WARNING: CPU: 3 PID: 8659 at net/core/skbuff.c:691 skb_release_head_state+0xaa/0xb0()

Could you send your disassembled unix_dgram_sendmsg (objdump -d)? This
would be enormously helpful wrt determining which unlock is involved
here.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug 4.1.16: self-detected stall in net/unix/?
  2016-02-11 15:55       ` Rainer Weikusat
@ 2016-02-11 17:03         ` Ben Hutchings
  2016-02-11 17:40           ` Rainer Weikusat
  0 siblings, 1 reply; 17+ messages in thread
From: Ben Hutchings @ 2016-02-11 17:03 UTC (permalink / raw)
  To: Rainer Weikusat, Philipp Hahn
  Cc: Hannes Frederic Sowa, Sasha Levin, David S. Miller, linux-kernel,
	Karolin Seeger, Jason Baron, Greg Kroah-Hartman, Arvid Requate,
	Stefan Gohmann

[-- Attachment #1: Type: text/plain, Size: 1920 bytes --]

On Thu, 2016-02-11 at 15:55 +0000, Rainer Weikusat wrote:
> Philipp Hahn <pmhahn@pmhahn.de> writes:
> 
> [...]
> 
> > Probably the same bug was also reported to samba-technical by Karolin
> > Seeger; she filed the bug for 3.19-ckt with Ubuntu:
> > 
> > 
> > 
> > Running the Samba test suite reproduces the problem; see bug for
> > details.
> 
> 
> JFTR: The oops in this bug report is for 3.13.0-77 and the patch you
> reverted for 4.1 is not part of that (at least not of the upstream 3.13).
[...]

It is in 3.13-ckt and basically all the stable branches.

Does the patch below fix this bug?

Ben.

---
unix: Fix potential double-unlock in unix_dgram_sendmsg()

A datagram socket may be peered with itself, so that sk == other.  We
use unix_state_double_lock() to lock sk and other in the right order,
which also guards against this and only locks the socket once, but we
then end up trying to unlock it twice.  Add the check for sk != other.

Reported-by: Philipp Hahn <pmhahn@pmhahn.de>
Fixes: 7d267278a9ec ("unix: avoid use-after-free in ep_remove_wait_queue")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 net/unix/af_unix.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index c5bf5ef2bf89..b4320d3e3a25 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1810,7 +1810,7 @@ restart_locked:
 		}
 	}
 
-	if (unlikely(sk_locked))
+	if (unlikely(sk_locked) && sk != other)
 		unix_state_unlock(sk);
 
 	if (sock_flag(other, SOCK_RCVTSTAMP))
@@ -1826,7 +1826,7 @@ restart_locked:
 	return len;
 
 out_unlock:
-	if (sk_locked)
+	if (sk_locked && sk != other)
 		unix_state_unlock(sk);
 	unix_state_unlock(other);
 out_free:

-- 
Ben Hutchings
Who are all these weirdos? - David Bowie, reading IRC for the first time

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Bug 4.1.16: self-detected stall in net/unix/?
  2016-02-11 17:03         ` Ben Hutchings
@ 2016-02-11 17:40           ` Rainer Weikusat
  2016-02-11 17:54             ` Rainer Weikusat
  2016-02-11 18:31             ` Rainer Weikusat
  0 siblings, 2 replies; 17+ messages in thread
From: Rainer Weikusat @ 2016-02-11 17:40 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Rainer Weikusat, Philipp Hahn, Hannes Frederic Sowa, Sasha Levin,
	David S. Miller, linux-kernel, Karolin Seeger, Jason Baron,
	Greg Kroah-Hartman, Arvid Requate, Stefan Gohmann

Ben Hutchings <ben@decadent.org.uk> writes:
> On Thu, 2016-02-11 at 15:55 +0000, Rainer Weikusat wrote:
>> Philipp Hahn <pmhahn@pmhahn.de> writes:
>> 
>> [...]
>> 
>> > Probably the same bug was also reported to samba-technical by Karolin
>> > Seeger; she filed the bug for 3.19-ckt with Ubuntu:
>> > 
>> > 
>> > 
>> > Running the Samba test suite reproduces the problem; see bug for
>> > details.
>> 
>> 
>> JFTR: The oops in this bug report is for 3.13.0-77 and the patch you
>> reverted for 4.1 is not part of that (at least not of the upstream 3.13).
> [...]
>
> It is in 3.13-ckt and basically all the stable branches.
>
> Does the patch below fix this bug?
>
> Ben.
>
> ---
> unix: Fix potential double-unlock in unix_dgram_sendmsg()
>
> A datagram socket may be peered with itself, so that sk == other.  We
> use unix_state_double_lock() to lock sk and other in the right order,
> which also guards against this and only locks the socket once, but we
> then end up trying to unlock it twice.  Add the check for sk != other.

That's a good observation but I think this happens in another way. The
code setting sk_logged is

	if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
		if (timeo) {
			timeo = unix_wait_for_peer(other, timeo);

			err = sock_intr_errno(timeo);
			if (signal_pending(current))
				goto out_free;

			goto restart;
		}

		if (!sk_locked) {
			unix_state_unlock(other);
			unix_state_double_lock(sk, other);
		}

		if (unix_peer(sk) != other ||
		    unix_dgram_peer_wake_me(sk, other)) {
			err = -EAGAIN;
			sk_locked = 1;
			goto out_unlock;
		}

		if (!sk_locked) {
			sk_locked = 1;
			goto restart_locked;
		}
	}

This means it only gets locked if unix_peer(other) != sk and this cannot
happen if other == sk and unix_peer(sk) == other, however, the 2nd
condition isn't guaranteed: other might indeed be == sk and not the peer
of it because someone could be using _sendmsg to send a message via a
socket to an address bound to the same socket. In this case, other was
found via

	if (!other) {
		err = -ECONNRESET;
		if (sunaddr == NULL)
			goto out_free;

		other = unix_find_other(net, sunaddr, namelen, sk->sk_type,
					hash, &err);
		if (other == NULL)
			goto out_free;
	}

and the if-block leading to the double lock should never have been
executed as it's supposed to deal with the case where sk is connect to
other but other not to sk (eg, /dev/log).

If the description correct, the patch below should fix it (as sunaddr
gets cleared if and only if unix_peer(sk) == other.

This is a 'preview', ie, not even compiled. It's provided for
discussion/ testing. I'll try to create a test program for this and a
more formal patch.

---

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 1975fd8..06259ac 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1698,7 +1698,7 @@ restart_locked:
                        goto out_unlock;
        }
 
-       if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
+       if (!sunaddr && unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
                if (timeo) {
                        timeo = unix_wait_for_peer(other, timeo);
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Bug 4.1.16: self-detected stall in net/unix/?
  2016-02-11 17:40           ` Rainer Weikusat
@ 2016-02-11 17:54             ` Rainer Weikusat
  2016-02-11 18:31             ` Rainer Weikusat
  1 sibling, 0 replies; 17+ messages in thread
From: Rainer Weikusat @ 2016-02-11 17:54 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Rainer Weikusat, Philipp Hahn, Hannes Frederic Sowa, Sasha Levin,
	David S. Miller, linux-kernel, Karolin Seeger, Jason Baron,
	Greg Kroah-Hartman, Arvid Requate, Stefan Gohmann

Rainer Weikusat <rw@doppelsaurus.mobileactivedefense.com> writes:

[...]

> This means it only gets locked if unix_peer(other) != sk and this cannot
> happen if other == sk and unix_peer(sk) == other, however, the 2nd
> condition isn't guaranteed: other might indeed be == sk and not the peer
> of it because someone could be using _sendmsg to send a message via a
> socket to an address bound to the same socket. In this case, other was
> found via

A second way to hits this (probably somewhat difficult to trigger in
practice): sk happened to be connected to itself by the time the
unix_peer_get(sk) was executed but was disconnected before the
unix_state_lock(other) below.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug 4.1.16: self-detected stall in net/unix/?
  2016-02-11 17:40           ` Rainer Weikusat
  2016-02-11 17:54             ` Rainer Weikusat
@ 2016-02-11 18:31             ` Rainer Weikusat
  2016-02-11 19:37               ` [PATCH net] af_unix: Guard against other == sk in unix_dgram_sendmsg Rainer Weikusat
  1 sibling, 1 reply; 17+ messages in thread
From: Rainer Weikusat @ 2016-02-11 18:31 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Rainer Weikusat, Philipp Hahn, Hannes Frederic Sowa, Sasha Levin,
	David S. Miller, linux-kernel, Karolin Seeger, Jason Baron,
	Greg Kroah-Hartman, Arvid Requate, Stefan Gohmann

Rainer Weikusat <rw@doppelsaurus.mobileactivedefense.com> writes:
> Ben Hutchings <ben@decadent.org.uk> writes:

[...]

>> unix: Fix potential double-unlock in unix_dgram_sendmsg()
>>
>> A datagram socket may be peered with itself, so that sk == other.  We
>> use unix_state_double_lock() to lock sk and other in the right order,
>> which also guards against this and only locks the socket once, but we
>> then end up trying to unlock it twice.  Add the check for sk != other.

[...]

> other was found via
>
> 	if (!other) {
> 		err = -ECONNRESET;
> 		if (sunaddr == NULL)
> 			goto out_free;
>
> 		other = unix_find_other(net, sunaddr, namelen, sk->sk_type,
> 					hash, &err);
> 		if (other == NULL)
> 			goto out_free;
> 	}
>
> and the if-block leading to the double lock should never have been
> executed as it's supposed to deal with the case where sk is connect to
> other but other not to sk (eg, /dev/log).

Test program triggering a double unlock (w/o the MSG_DONTWAIT, it blocks
which it isn't supposed to do):

--------
#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define SUN_NAME	"\0other-is-me"

int main(int argc, char **argv)
{
    struct sockaddr_un sun;
    unsigned max;
    int sk, rc;

    sk = socket(AF_UNIX, SOCK_DGRAM, 0);

    sun.sun_family = AF_UNIX;
    strncpy(sun.sun_path, SUN_NAME, sizeof(sun.sun_path));
    bind(sk, (struct sockaddr *)&sun, sizeof(sun));

    max = 12;
    do 
	sendto(sk, &sun, sizeof(sun), MSG_DONTWAIT,
	       (struct sockaddr *)&sun, sizeof(sun));
    while (--max);

    return 0;
}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH net] af_unix: Guard against other == sk in unix_dgram_sendmsg
  2016-02-11 18:31             ` Rainer Weikusat
@ 2016-02-11 19:37               ` Rainer Weikusat
  2016-02-12  9:19                 ` Philipp Hahn
  2016-02-16 17:54                 ` David Miller
  0 siblings, 2 replies; 17+ messages in thread
From: Rainer Weikusat @ 2016-02-11 19:37 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Rainer Weikusat, Philipp Hahn, Hannes Frederic Sowa, Sasha Levin,
	David S. Miller, linux-kernel, Karolin Seeger, Jason Baron,
	Greg Kroah-Hartman, Arvid Requate, Stefan Gohmann, netdev

The unix_dgram_sendmsg routine use the following test

if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {

to determine if sk and other are in an n:1 association (either
established via connect or by using sendto to send messages to an
unrelated socket identified by address). This isn't correct as the
specified address could have been bound to the sending socket itself or
because this socket could have been connected to itself by the time of
the unix_peer_get but disconnected before the unix_state_lock(other). In
both cases, the if-block would be entered despite other == sk which
might either block the sender unintentionally or lead to trying to unlock
the same spin lock twice for a non-blocking send. Add a other != sk
check to guard against this.

Fixes: 7d267278a9ec ("unix: avoid use-after-free in ep_remove_wait_queue")
Reported-By: Philipp Hahn <pmhahn@pmhahn.de>
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
---
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 29be035..f1ca279 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1781,7 +1781,12 @@ restart_locked:
 			goto out_unlock;
 	}
 
-	if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
+	/* other == sk && unix_peer(other) != sk if
+	 * - unix_peer(sk) == NULL, destination address bound to sk
+	 * - unix_peer(sk) == sk by time of get but disconnected before lock
+	 */
+	if (other != sk &&
+	    unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
 		if (timeo) {
 			timeo = unix_wait_for_peer(other, timeo);
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net] af_unix: Guard against other == sk in unix_dgram_sendmsg
  2016-02-11 19:37               ` [PATCH net] af_unix: Guard against other == sk in unix_dgram_sendmsg Rainer Weikusat
@ 2016-02-12  9:19                 ` Philipp Hahn
  2016-02-12 13:25                   ` Rainer Weikusat
  2016-02-16 17:54                 ` David Miller
  1 sibling, 1 reply; 17+ messages in thread
From: Philipp Hahn @ 2016-02-12  9:19 UTC (permalink / raw)
  To: Rainer Weikusat, Ben Hutchings
  Cc: Hannes Frederic Sowa, Sasha Levin, David S. Miller, linux-kernel,
	Karolin Seeger, Jason Baron, Greg Kroah-Hartman, Arvid Requate,
	Stefan Gohmann, netdev

Hello Rainer,

Am 11.02.2016 um 20:37 schrieb Rainer Weikusat:
> The unix_dgram_sendmsg routine use the following test
> 
> if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
> 
> to determine if sk and other are in an n:1 association (either
> established via connect or by using sendto to send messages to an
> unrelated socket identified by address). This isn't correct as the
> specified address could have been bound to the sending socket itself or
> because this socket could have been connected to itself by the time of
> the unix_peer_get but disconnected before the unix_state_lock(other). In
> both cases, the if-block would be entered despite other == sk which
> might either block the sender unintentionally or lead to trying to unlock
> the same spin lock twice for a non-blocking send. Add a other != sk
> check to guard against this.
> 
> Fixes: 7d267278a9ec ("unix: avoid use-after-free in ep_remove_wait_queue")
> Reported-By: Philipp Hahn <pmhahn@pmhahn.de>
> Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
> ---
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 29be035..f1ca279 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1781,7 +1781,12 @@ restart_locked:
>  			goto out_unlock;
>  	}
>  
> -	if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
> +	/* other == sk && unix_peer(other) != sk if
> +	 * - unix_peer(sk) == NULL, destination address bound to sk
> +	 * - unix_peer(sk) == sk by time of get but disconnected before lock
> +	 */
> +	if (other != sk &&
> +	    unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
>  		if (timeo) {
>  			timeo = unix_wait_for_peer(other, timeo);
>  
> 

After applying that patch at least my machine running the samba test no
longer crashes. So you might add
Tested-by: Philipp Hahn <pmhahn@pmhahn.de>

Thanks for looking it that issues.

Philipp

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] af_unix: Guard against other == sk in unix_dgram_sendmsg
  2016-02-12  9:19                 ` Philipp Hahn
@ 2016-02-12 13:25                   ` Rainer Weikusat
  2016-02-12 19:54                     ` Ben Hutchings
  0 siblings, 1 reply; 17+ messages in thread
From: Rainer Weikusat @ 2016-02-12 13:25 UTC (permalink / raw)
  To: Philipp Hahn
  Cc: Ben Hutchings, Hannes Frederic Sowa, Sasha Levin,
	David S. Miller, linux-kernel, Karolin Seeger, Jason Baron,
	Greg Kroah-Hartman, Arvid Requate, Stefan Gohmann, netdev

Philipp Hahn <pmhahn@pmhahn.de> writes:

> Hello Rainer,
>
> Am 11.02.2016 um 20:37 schrieb Rainer Weikusat:
>> The unix_dgram_sendmsg routine use the following test
>> 
>> if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {

[...]

>> This isn't correct as the> specified address could have been bound to
>> the sending socket itself

[...]

> After applying that patch at least my machine running the samba test no
> longer crashes.

There's a possible gotcha in there: Send-to-self used to be limited by
the queue limit. But the rationale for that (IIRC) was that someone
could keep using newly created sockets to queue ever more data to a
single, unrelated receiver. I don't think this should apply when
receiving and sending sockets are identical. But that's just my
opinion. The other option would be to avoid the unix_state_double_lock
for sk == other. I'd be willing to change this accordingly if someone
thinks the queue limit should apply to send-to-self.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] af_unix: Guard against other == sk in unix_dgram_sendmsg
  2016-02-12 13:25                   ` Rainer Weikusat
@ 2016-02-12 19:54                     ` Ben Hutchings
  2016-02-12 20:17                       ` Rainer Weikusat
  0 siblings, 1 reply; 17+ messages in thread
From: Ben Hutchings @ 2016-02-12 19:54 UTC (permalink / raw)
  To: Rainer Weikusat, Philipp Hahn
  Cc: Hannes Frederic Sowa, Sasha Levin, David S. Miller, linux-kernel,
	Karolin Seeger, Jason Baron, Greg Kroah-Hartman, Arvid Requate,
	Stefan Gohmann, netdev

[-- Attachment #1: Type: text/plain, Size: 1470 bytes --]

On Fri, 2016-02-12 at 13:25 +0000, Rainer Weikusat wrote:
> Philipp Hahn <pmhahn@pmhahn.de> writes:
> 
> > Hello Rainer,
> > 
> > Am 11.02.2016 um 20:37 schrieb Rainer Weikusat:
> > > The unix_dgram_sendmsg routine use the following test
> > > 
> > > if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
> 
> [...]
> 
> > > This isn't correct as the> specified address could have been bound to
> > > the sending socket itself
> 
> [...]
> 
> > After applying that patch at least my machine running the samba test no
> > longer crashes.
> 
> There's a possible gotcha in there: Send-to-self used to be limited by
> the queue limit. But the rationale for that (IIRC) was that someone
> could keep using newly created sockets to queue ever more data to a
> single, unrelated receiver. I don't think this should apply when
> receiving and sending sockets are identical. But that's just my
> opinion. The other option would be to avoid the unix_state_double_lock
> for sk == other.

Given that unix_state_double_lock() already handles sk == other, I'm
not sure why you think it needs to be avoided.

> I'd be willing to change this accordingly if someone
> thinks the queue limit should apply to send-to-self.

If we don't check the queue limit here, does anything else prevent the
queue growing to the point it's a DoS?

Ben.

-- 
Ben Hutchings
I say we take off; nuke the site from orbit.  It's the only way to be sure.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] af_unix: Guard against other == sk in unix_dgram_sendmsg
  2016-02-12 19:54                     ` Ben Hutchings
@ 2016-02-12 20:17                       ` Rainer Weikusat
  2016-02-12 20:47                         ` Ben Hutchings
  0 siblings, 1 reply; 17+ messages in thread
From: Rainer Weikusat @ 2016-02-12 20:17 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Rainer Weikusat, Philipp Hahn, Hannes Frederic Sowa, Sasha Levin,
	David S. Miller, linux-kernel, Karolin Seeger, Jason Baron,
	Greg Kroah-Hartman, Arvid Requate, Stefan Gohmann, netdev

Ben Hutchings <ben@decadent.org.uk> writes:
> On Fri, 2016-02-12 at 13:25 +0000, Rainer Weikusat wrote:
>> Philipp Hahn <pmhahn@pmhahn.de> writes:
>> > Hello Rainer,
>> > 
>> > Am 11.02.2016 um 20:37 schrieb Rainer Weikusat:
>> > > The unix_dgram_sendmsg routine use the following test
>> > > 
>> > > if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
>> 
>> [...]
>> 
>> > > This isn't correct as the> specified address could have been bound to
>> > > the sending socket itself
>> 
>> [...]
>> 
>> > After applying that patch at least my machine running the samba test no
>> > longer crashes.
>> 
>> There's a possible gotcha in there: Send-to-self used to be limited by
>> the queue limit. But the rationale for that (IIRC) was that someone
>> could keep using newly created sockets to queue ever more data to a
>> single, unrelated receiver. I don't think this should apply when
>> receiving and sending sockets are identical. But that's just my
>> opinion. The other option would be to avoid the unix_state_double_lock
>> for sk == other.
>
> Given that unix_state_double_lock() already handles sk == other, I'm
> not sure why you think it needs to be avoided.

Because the whole complication of restarting the operation after locking
both sk and other because other had to be unlocked before calling
unix_state_double_lock is useless for this case: As other == sk, there's
no reason to drop the lock on it which guarantees that the result of all
the earlier checks is still valid: If the -EAGAIN condition is not true,
execution can just continue.

>> I'd be willing to change this accordingly if someone
>> thinks the queue limit should apply to send-to-self.
>
> If we don't check the queue limit here, does anything else prevent the
> queue growing to the point it's a DoS?

The max_dgram_qlen limit exists specifically to prevent someone sending
'a lot' of messages to a socket unrelated to it by repeatedly creating a
socket, sending as many messages as the send buffer size will allow,
closing the socket, creating a new socket, ..., cf

http://netdev.vger.kernel.narkive.com/tcZIFJeC/get-rid-of-proc-sys-net-unix-max-dgram-qlen#post4
(first copy I found)

This 'attack' will obviously not work very well when sending and
receiving socket are identical.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] af_unix: Guard against other == sk in unix_dgram_sendmsg
  2016-02-12 20:17                       ` Rainer Weikusat
@ 2016-02-12 20:47                         ` Ben Hutchings
  2016-02-12 20:59                           ` Rainer Weikusat
  0 siblings, 1 reply; 17+ messages in thread
From: Ben Hutchings @ 2016-02-12 20:47 UTC (permalink / raw)
  To: Rainer Weikusat
  Cc: Philipp Hahn, Hannes Frederic Sowa, Sasha Levin, David S. Miller,
	linux-kernel, Karolin Seeger, Jason Baron, Greg Kroah-Hartman,
	Arvid Requate, Stefan Gohmann, netdev

[-- Attachment #1: Type: text/plain, Size: 3098 bytes --]

On Fri, 2016-02-12 at 20:17 +0000, Rainer Weikusat wrote:
> Ben Hutchings <ben@decadent.org.uk> writes:
> > On Fri, 2016-02-12 at 13:25 +0000, Rainer Weikusat wrote:
> > > Philipp Hahn <pmhahn@pmhahn.de> writes:
> > > > Hello Rainer,
> > > > 
> > > > Am 11.02.2016 um 20:37 schrieb Rainer Weikusat:
> > > > > The unix_dgram_sendmsg routine use the following test
> > > > > 
> > > > > if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
> > > 
> > > [...]
> > > 
> > > > > This isn't correct as the> specified address could have been bound to
> > > > > the sending socket itself
> > > 
> > > [...]
> > > 
> > > > After applying that patch at least my machine running the samba test no
> > > > longer crashes.
> > > 
> > > There's a possible gotcha in there: Send-to-self used to be limited by
> > > the queue limit. But the rationale for that (IIRC) was that someone
> > > could keep using newly created sockets to queue ever more data to a
> > > single, unrelated receiver. I don't think this should apply when
> > > receiving and sending sockets are identical. But that's just my
> > > opinion. The other option would be to avoid the unix_state_double_lock
> > > for sk == other.
> > 
> > Given that unix_state_double_lock() already handles sk == other, I'm
> > not sure why you think it needs to be avoided.
> 
> Because the whole complication of restarting the operation after locking
> both sk and other because other had to be unlocked before calling
> unix_state_double_lock is useless for this case: As other == sk, there's
> no reason to drop the lock on it which guarantees that the result of all
> the earlier checks is still valid: If the -EAGAIN condition is not true,
> execution can just continue.

Well of course it's useless, but it's also harmless.  If we really
wanted to optimise this we could also skip unlocking if other < sk.

> > > I'd be willing to change this accordingly if someone
> > > thinks the queue limit should apply to send-to-self.
> > 
> > If we don't check the queue limit here, does anything else prevent the
> > queue growing to the point it's a DoS?
> 
> The max_dgram_qlen limit exists specifically to prevent someone sending
> 'a lot' of messages to a socket unrelated to it by repeatedly creating a
> socket, sending as many messages as the send buffer size will allow,
> closing the socket, creating a new socket, ..., cf
> 
> http://netdev.vger.kernel.narkive.com/tcZIFJeC/get-rid-of-proc-sys-net-unix-max-dgram-qlen#post4
> (first copy I found)
> 
> This 'attack' will obviously not work very well when sending and
> receiving socket are identical.

It looked to me like the queue length was the only limit here, as I was
looking in vain for a charge to the receiving socket's memory.
However, to answer my own question, AF_UNIX skbs are always charged to
the sending socket (which is the same thing in this case, but still
affects where the buffer limit is applied).

Ben.

-- 
Ben Hutchings
I say we take off; nuke the site from orbit.  It's the only way to be sure.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] af_unix: Guard against other == sk in unix_dgram_sendmsg
  2016-02-12 20:47                         ` Ben Hutchings
@ 2016-02-12 20:59                           ` Rainer Weikusat
  0 siblings, 0 replies; 17+ messages in thread
From: Rainer Weikusat @ 2016-02-12 20:59 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Rainer Weikusat, Philipp Hahn, Hannes Frederic Sowa, Sasha Levin,
	David S. Miller, linux-kernel, Karolin Seeger, Jason Baron,
	Greg Kroah-Hartman, Arvid Requate, Stefan Gohmann, netdev

Ben Hutchings <ben@decadent.org.uk> writes:
> On Fri, 2016-02-12 at 20:17 +0000, Rainer Weikusat wrote:

[...]

>>>> I don't think this should apply when
>>>> receiving and sending sockets are identical. But that's just my
>>>> opinion. The other option would be to avoid the unix_state_double_lock
>>>> for sk == other.
>>> 
>>> Given that unix_state_double_lock() already handles sk == other, I'm
>>> not sure why you think it needs to be avoided.
>> 
>> Because the whole complication of restarting the operation after locking
>> both sk and other because other had to be unlocked before calling
>> unix_state_double_lock is useless for this case:

[...]

> Well of course it's useless, but it's also harmless.  

As is adding a

for (i = 0; i < 1000000; ++i);

between any two statements. And this isn't even entirely true as the
pointless double-lock will then require "did we pointlessly
doube-lock" checks elsewhere. I think it should be possible to do this
in a simpler way by not pointlessly double-locking (this may be
wrong but it's worth a try).

> If we really wanted to optimise this we could also skip unlocking if
> other < sk.

I wouldn't want to hardcode assumptions about the unix_state_double_lock
algorithm in functions using it. 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] af_unix: Guard against other == sk in unix_dgram_sendmsg
  2016-02-11 19:37               ` [PATCH net] af_unix: Guard against other == sk in unix_dgram_sendmsg Rainer Weikusat
  2016-02-12  9:19                 ` Philipp Hahn
@ 2016-02-16 17:54                 ` David Miller
  1 sibling, 0 replies; 17+ messages in thread
From: David Miller @ 2016-02-16 17:54 UTC (permalink / raw)
  To: rweikusat
  Cc: ben, pmhahn, hannes, sasha.levin, linux-kernel, kseeger, jbaron,
	gregkh, requate, gohmann, netdev

From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Date: Thu, 11 Feb 2016 19:37:27 +0000

> The unix_dgram_sendmsg routine use the following test
> 
> if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
> 
> to determine if sk and other are in an n:1 association (either
> established via connect or by using sendto to send messages to an
> unrelated socket identified by address). This isn't correct as the
> specified address could have been bound to the sending socket itself or
> because this socket could have been connected to itself by the time of
> the unix_peer_get but disconnected before the unix_state_lock(other). In
> both cases, the if-block would be entered despite other == sk which
> might either block the sender unintentionally or lead to trying to unlock
> the same spin lock twice for a non-blocking send. Add a other != sk
> check to guard against this.
> 
> Fixes: 7d267278a9ec ("unix: avoid use-after-free in ep_remove_wait_queue")
> Reported-By: Philipp Hahn <pmhahn@pmhahn.de>
> Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>

Also applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-02-16 17:55 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-02 16:25 Bug 4.1.16: self-detected stall in net/unix/? Philipp Hahn
2016-02-03  1:43 ` Hannes Frederic Sowa
2016-02-05 15:28   ` Philipp Hahn
2016-02-11 13:47     ` Philipp Hahn
2016-02-11 15:55       ` Rainer Weikusat
2016-02-11 17:03         ` Ben Hutchings
2016-02-11 17:40           ` Rainer Weikusat
2016-02-11 17:54             ` Rainer Weikusat
2016-02-11 18:31             ` Rainer Weikusat
2016-02-11 19:37               ` [PATCH net] af_unix: Guard against other == sk in unix_dgram_sendmsg Rainer Weikusat
2016-02-12  9:19                 ` Philipp Hahn
2016-02-12 13:25                   ` Rainer Weikusat
2016-02-12 19:54                     ` Ben Hutchings
2016-02-12 20:17                       ` Rainer Weikusat
2016-02-12 20:47                         ` Ben Hutchings
2016-02-12 20:59                           ` Rainer Weikusat
2016-02-16 17:54                 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).