linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* kernel BUG at net/unix/garbage.c:149!"
@ 2016-08-24 14:24 Nikolay Borisov
  2016-08-24 21:40 ` Hannes Frederic Sowa
  0 siblings, 1 reply; 13+ messages in thread
From: Nikolay Borisov @ 2016-08-24 14:24 UTC (permalink / raw)
  To: mszeredi; +Cc: Linux-Kernel@Vger. Kernel. Org, netdev

Hello, 

I hit the following BUG: 

[1851513.239831] ------------[ cut here ]------------
[1851513.240079] kernel BUG at net/unix/garbage.c:149!
[1851513.240313] invalid opcode: 0000 [#1] SMP 
[1851513.248320] CPU: 37 PID: 11683 Comm: nginx Tainted: G           O    4.4.14-clouder3 #26
[1851513.248719] Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
[1851513.248966] task: ffff883b0f6f0000 ti: ffff880189cf0000 task.ti: ffff880189cf0000
[1851513.249361] RIP: 0010:[<ffffffff815f895d>]  [<ffffffff815f895d>] unix_notinflight+0x8d/0x90
[1851513.249846] RSP: 0018:ffff880189cf3cf8  EFLAGS: 00010246
[1851513.250082] RAX: ffff883b05491968 RBX: ffff883b05491680 RCX: ffff8807f9967330
[1851513.250476] RDX: 0000000000000001 RSI: ffff882e6d8bae00 RDI: ffffffff82073f10
[1851513.250886] RBP: ffff880189cf3d08 R08: ffff880cbc70e200 R09: 0000000180200001
[1851513.251280] R10: ffff883fff3b9dc0 R11: ffffea0032f1c380 R12: ffff883fbaf50000
[1851513.251674] R13: ffffffff815f6354 R14: ffff881a7c77b140 R15: ffff881a7c7792c0
[1851513.252083] FS:  00007f4f19573720(0000) GS:ffff883fff3a0000(0000) knlGS:0000000000000000
[1851513.252481] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1851513.252717] CR2: 00000000013062d8 CR3: 0000001712f32000 CR4: 00000000001406e0
[1851513.253116] Stack:
[1851513.253345]  00000000ffffffff ffff880189cf3d40 ffff880189cf3d28 ffffffff815f4383
[1851513.254022]  ffff8839ee11a800 ffff8839ee11a800 ffff880189cf3d60 ffffffff815f53b8
[1851513.254685]  0000000000000000 ffff883406788de0 0000000000000000 0000000000000000
[1851513.255360] Call Trace:
[1851513.255594]  [<ffffffff815f4383>] unix_detach_fds.isra.19+0x43/0x50
[1851513.255851]  [<ffffffff815f53b8>] unix_destruct_scm+0x48/0x80
[1851513.256090]  [<ffffffff815384af>] skb_release_head_state+0x4f/0xb0
[1851513.256328]  [<ffffffff81538522>] skb_release_all+0x12/0x30
[1851513.256564]  [<ffffffff81538592>] kfree_skb+0x32/0xa0
[1851513.256810]  [<ffffffff815f6354>] unix_release_sock+0x1e4/0x2c0
[1851513.257046]  [<ffffffff815f6450>] unix_release+0x20/0x30
[1851513.257284]  [<ffffffff8152fbcf>] sock_release+0x1f/0x80
[1851513.257521]  [<ffffffff8152fc42>] sock_close+0x12/0x20
[1851513.257769]  [<ffffffff8119a8aa>] __fput+0xea/0x1f0
[1851513.258005]  [<ffffffff8119a9ee>] ____fput+0xe/0x10
[1851513.258244]  [<ffffffff8106fccf>] task_work_run+0x7f/0xb0
[1851513.258488]  [<ffffffff81002210>] exit_to_usermode_loop+0xc0/0xd0
[1851513.258728]  [<ffffffff81002a90>] syscall_return_slowpath+0x80/0xf0
[1851513.258983]  [<ffffffff816147b4>] int_ret_from_sys_call+0x25/0x9f
[1851513.259222] Code: 7e 5b 41 5c 5d c3 48 8b 8b e8 02 00 00 48 8b 93 f0 02 00 00 48 89 51 08 48 89 0a 48 89 83 e8 02 00 00 48 89 83 f0 02 00 00 eb b8 <0f> 0b 90 0f 1f 44 00 00 55 48 c7 c7 10 3f 07 82 48 89 e5 41 54 
[1851513.268473] RIP  [<ffffffff815f895d>] unix_notinflight+0x8d/0x90
[1851513.268793]  RSP <ffff880189cf3cf8>

That's essentially BUG_ON(list_empty(&u->link));

I see that all the code involving the ->link member hasn't really been 
touched since it was introduced in 2007. So this must be a latent bug. 
This is the first time I've observed it. The state 
of the struct unix_sock can be found here http://sprunge.us/WCMW . Evidently, 
there are no inflight sockets. 

Regards, 
Nikolay 

 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kernel BUG at net/unix/garbage.c:149!"
  2016-08-24 14:24 kernel BUG at net/unix/garbage.c:149!" Nikolay Borisov
@ 2016-08-24 21:40 ` Hannes Frederic Sowa
  2016-08-24 23:30   ` Nikolay Borisov
  2016-08-27  9:55   ` Miklos Szeredi
  0 siblings, 2 replies; 13+ messages in thread
From: Hannes Frederic Sowa @ 2016-08-24 21:40 UTC (permalink / raw)
  To: Nikolay Borisov, mszeredi; +Cc: Linux-Kernel@Vger. Kernel. Org, netdev

On 24.08.2016 16:24, Nikolay Borisov wrote:
> Hello, 
> 
> I hit the following BUG: 
> 
> [1851513.239831] ------------[ cut here ]------------
> [1851513.240079] kernel BUG at net/unix/garbage.c:149!
> [1851513.240313] invalid opcode: 0000 [#1] SMP 
> [1851513.248320] CPU: 37 PID: 11683 Comm: nginx Tainted: G           O    4.4.14-clouder3 #26
> [1851513.248719] Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
> [1851513.248966] task: ffff883b0f6f0000 ti: ffff880189cf0000 task.ti: ffff880189cf0000
> [1851513.249361] RIP: 0010:[<ffffffff815f895d>]  [<ffffffff815f895d>] unix_notinflight+0x8d/0x90
> [1851513.249846] RSP: 0018:ffff880189cf3cf8  EFLAGS: 00010246
> [1851513.250082] RAX: ffff883b05491968 RBX: ffff883b05491680 RCX: ffff8807f9967330
> [1851513.250476] RDX: 0000000000000001 RSI: ffff882e6d8bae00 RDI: ffffffff82073f10
> [1851513.250886] RBP: ffff880189cf3d08 R08: ffff880cbc70e200 R09: 0000000180200001
> [1851513.251280] R10: ffff883fff3b9dc0 R11: ffffea0032f1c380 R12: ffff883fbaf50000
> [1851513.251674] R13: ffffffff815f6354 R14: ffff881a7c77b140 R15: ffff881a7c7792c0
> [1851513.252083] FS:  00007f4f19573720(0000) GS:ffff883fff3a0000(0000) knlGS:0000000000000000
> [1851513.252481] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [1851513.252717] CR2: 00000000013062d8 CR3: 0000001712f32000 CR4: 00000000001406e0
> [1851513.253116] Stack:
> [1851513.253345]  00000000ffffffff ffff880189cf3d40 ffff880189cf3d28 ffffffff815f4383
> [1851513.254022]  ffff8839ee11a800 ffff8839ee11a800 ffff880189cf3d60 ffffffff815f53b8
> [1851513.254685]  0000000000000000 ffff883406788de0 0000000000000000 0000000000000000
> [1851513.255360] Call Trace:
> [1851513.255594]  [<ffffffff815f4383>] unix_detach_fds.isra.19+0x43/0x50
> [1851513.255851]  [<ffffffff815f53b8>] unix_destruct_scm+0x48/0x80
> [1851513.256090]  [<ffffffff815384af>] skb_release_head_state+0x4f/0xb0
> [1851513.256328]  [<ffffffff81538522>] skb_release_all+0x12/0x30
> [1851513.256564]  [<ffffffff81538592>] kfree_skb+0x32/0xa0
> [1851513.256810]  [<ffffffff815f6354>] unix_release_sock+0x1e4/0x2c0
> [1851513.257046]  [<ffffffff815f6450>] unix_release+0x20/0x30
> [1851513.257284]  [<ffffffff8152fbcf>] sock_release+0x1f/0x80
> [1851513.257521]  [<ffffffff8152fc42>] sock_close+0x12/0x20
> [1851513.257769]  [<ffffffff8119a8aa>] __fput+0xea/0x1f0
> [1851513.258005]  [<ffffffff8119a9ee>] ____fput+0xe/0x10
> [1851513.258244]  [<ffffffff8106fccf>] task_work_run+0x7f/0xb0
> [1851513.258488]  [<ffffffff81002210>] exit_to_usermode_loop+0xc0/0xd0
> [1851513.258728]  [<ffffffff81002a90>] syscall_return_slowpath+0x80/0xf0
> [1851513.258983]  [<ffffffff816147b4>] int_ret_from_sys_call+0x25/0x9f
> [1851513.259222] Code: 7e 5b 41 5c 5d c3 48 8b 8b e8 02 00 00 48 8b 93 f0 02 00 00 48 89 51 08 48 89 0a 48 89 83 e8 02 00 00 48 89 83 f0 02 00 00 eb b8 <0f> 0b 90 0f 1f 44 00 00 55 48 c7 c7 10 3f 07 82 48 89 e5 41 54 
> [1851513.268473] RIP  [<ffffffff815f895d>] unix_notinflight+0x8d/0x90
> [1851513.268793]  RSP <ffff880189cf3cf8>
> 
> That's essentially BUG_ON(list_empty(&u->link));
> 
> I see that all the code involving the ->link member hasn't really been 
> touched since it was introduced in 2007. So this must be a latent bug. 
> This is the first time I've observed it. The state 
> of the struct unix_sock can be found here http://sprunge.us/WCMW . Evidently, 
> there are no inflight sockets. 

One commit which could have to do with that is

commit fc64869c48494a401b1fb627c9ecc4e6c1d74b0d
Author: Andrey Ryabinin <aryabinin@virtuozzo.com>
Date:   Wed May 18 19:19:27 2016 +0300

    net: sock: move ->sk_shutdown out of bitfields.

but that is only a wild guess.

Which unix_sock did you extract specifically in the url you provided? In
unix_notinflight we are specifically checking an unix domain socket that
is itself being transferred over another af_unix domain socket and not
the unix domain socket being released at this point.

Can you reproduce this and maybe also with a newer kernel?

Thanks for the report,
Hannes

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kernel BUG at net/unix/garbage.c:149!"
  2016-08-24 21:40 ` Hannes Frederic Sowa
@ 2016-08-24 23:30   ` Nikolay Borisov
  2016-08-26 20:24     ` Hannes Frederic Sowa
  2016-08-27  9:55   ` Miklos Szeredi
  1 sibling, 1 reply; 13+ messages in thread
From: Nikolay Borisov @ 2016-08-24 23:30 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Nikolay Borisov, mszeredi, Linux-Kernel@Vger. Kernel. Org, netdev

On Thu, Aug 25, 2016 at 12:40 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On 24.08.2016 16:24, Nikolay Borisov wrote:
[SNIP]
>
> One commit which could have to do with that is
>
> commit fc64869c48494a401b1fb627c9ecc4e6c1d74b0d
> Author: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Date:   Wed May 18 19:19:27 2016 +0300
>
>     net: sock: move ->sk_shutdown out of bitfields.
>
> but that is only a wild guess.
>
> Which unix_sock did you extract specifically in the url you provided? In
> unix_notinflight we are specifically checking an unix domain socket that
> is itself being transferred over another af_unix domain socket and not
> the unix domain socket being released at this point.

So this is the state of the socket that is being passed to
unix_notinflight. I have a complete crashdump so if you need more info
to diagnose it I'm happy to provide it. I'm not too familiar with the
code in question so I will need a bit of time to grasp what actually
is happening.

>
> Can you reproduce this and maybe also with a newer kernel?

Unfortunately I cannot reproduce this since it happened on a
production server nor can I change the kernel. But clearly there is
something wrong, and given that this is a stable kernel and no
relevant changes have gone in latest stable I believe the problem
(albeit hardly reproducible) would still persist.

>
> Thanks for the report,
> Hannes
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kernel BUG at net/unix/garbage.c:149!"
  2016-08-24 23:30   ` Nikolay Borisov
@ 2016-08-26 20:24     ` Hannes Frederic Sowa
  0 siblings, 0 replies; 13+ messages in thread
From: Hannes Frederic Sowa @ 2016-08-26 20:24 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: mszeredi, Linux-Kernel@Vger. Kernel. Org, netdev

On 25.08.2016 01:30, Nikolay Borisov wrote:
> On Thu, Aug 25, 2016 at 12:40 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> On 24.08.2016 16:24, Nikolay Borisov wrote:
> [SNIP]
>>
>> One commit which could have to do with that is
>>
>> commit fc64869c48494a401b1fb627c9ecc4e6c1d74b0d
>> Author: Andrey Ryabinin <aryabinin@virtuozzo.com>
>> Date:   Wed May 18 19:19:27 2016 +0300
>>
>>     net: sock: move ->sk_shutdown out of bitfields.
>>
>> but that is only a wild guess.
>>
>> Which unix_sock did you extract specifically in the url you provided? In
>> unix_notinflight we are specifically checking an unix domain socket that
>> is itself being transferred over another af_unix domain socket and not
>> the unix domain socket being released at this point.
> 
> So this is the state of the socket that is being passed to
> unix_notinflight. I have a complete crashdump so if you need more info
> to diagnose it I'm happy to provide it. I'm not too familiar with the
> code in question so I will need a bit of time to grasp what actually
> is happening.

Would probably be helpful. I couldn't spot anything so far
unfortunately. Also moving sockets over sockets which get closed
inflight seemed to work fine.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kernel BUG at net/unix/garbage.c:149!"
  2016-08-24 21:40 ` Hannes Frederic Sowa
  2016-08-24 23:30   ` Nikolay Borisov
@ 2016-08-27  9:55   ` Miklos Szeredi
  2016-08-29 22:37     ` Miklos Szeredi
  1 sibling, 1 reply; 13+ messages in thread
From: Miklos Szeredi @ 2016-08-27  9:55 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Nikolay Borisov, Linux-Kernel@Vger. Kernel. Org, netdev

On Wed, Aug 24, 2016 at 11:40 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On 24.08.2016 16:24, Nikolay Borisov wrote:
>> Hello,
>>
>> I hit the following BUG:
>>
>> [1851513.239831] ------------[ cut here ]------------
>> [1851513.240079] kernel BUG at net/unix/garbage.c:149!
>> [1851513.240313] invalid opcode: 0000 [#1] SMP
>> [1851513.248320] CPU: 37 PID: 11683 Comm: nginx Tainted: G           O    4.4.14-clouder3 #26
>> [1851513.248719] Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
>> [1851513.248966] task: ffff883b0f6f0000 ti: ffff880189cf0000 task.ti: ffff880189cf0000
>> [1851513.249361] RIP: 0010:[<ffffffff815f895d>]  [<ffffffff815f895d>] unix_notinflight+0x8d/0x90
>> [1851513.249846] RSP: 0018:ffff880189cf3cf8  EFLAGS: 00010246
>> [1851513.250082] RAX: ffff883b05491968 RBX: ffff883b05491680 RCX: ffff8807f9967330
>> [1851513.250476] RDX: 0000000000000001 RSI: ffff882e6d8bae00 RDI: ffffffff82073f10
>> [1851513.250886] RBP: ffff880189cf3d08 R08: ffff880cbc70e200 R09: 0000000180200001
>> [1851513.251280] R10: ffff883fff3b9dc0 R11: ffffea0032f1c380 R12: ffff883fbaf50000
>> [1851513.251674] R13: ffffffff815f6354 R14: ffff881a7c77b140 R15: ffff881a7c7792c0
>> [1851513.252083] FS:  00007f4f19573720(0000) GS:ffff883fff3a0000(0000) knlGS:0000000000000000
>> [1851513.252481] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [1851513.252717] CR2: 00000000013062d8 CR3: 0000001712f32000 CR4: 00000000001406e0
>> [1851513.253116] Stack:
>> [1851513.253345]  00000000ffffffff ffff880189cf3d40 ffff880189cf3d28 ffffffff815f4383
>> [1851513.254022]  ffff8839ee11a800 ffff8839ee11a800 ffff880189cf3d60 ffffffff815f53b8
>> [1851513.254685]  0000000000000000 ffff883406788de0 0000000000000000 0000000000000000
>> [1851513.255360] Call Trace:
>> [1851513.255594]  [<ffffffff815f4383>] unix_detach_fds.isra.19+0x43/0x50
>> [1851513.255851]  [<ffffffff815f53b8>] unix_destruct_scm+0x48/0x80
>> [1851513.256090]  [<ffffffff815384af>] skb_release_head_state+0x4f/0xb0
>> [1851513.256328]  [<ffffffff81538522>] skb_release_all+0x12/0x30
>> [1851513.256564]  [<ffffffff81538592>] kfree_skb+0x32/0xa0
>> [1851513.256810]  [<ffffffff815f6354>] unix_release_sock+0x1e4/0x2c0
>> [1851513.257046]  [<ffffffff815f6450>] unix_release+0x20/0x30
>> [1851513.257284]  [<ffffffff8152fbcf>] sock_release+0x1f/0x80
>> [1851513.257521]  [<ffffffff8152fc42>] sock_close+0x12/0x20
>> [1851513.257769]  [<ffffffff8119a8aa>] __fput+0xea/0x1f0
>> [1851513.258005]  [<ffffffff8119a9ee>] ____fput+0xe/0x10
>> [1851513.258244]  [<ffffffff8106fccf>] task_work_run+0x7f/0xb0
>> [1851513.258488]  [<ffffffff81002210>] exit_to_usermode_loop+0xc0/0xd0
>> [1851513.258728]  [<ffffffff81002a90>] syscall_return_slowpath+0x80/0xf0
>> [1851513.258983]  [<ffffffff816147b4>] int_ret_from_sys_call+0x25/0x9f
>> [1851513.259222] Code: 7e 5b 41 5c 5d c3 48 8b 8b e8 02 00 00 48 8b 93 f0 02 00 00 48 89 51 08 48 89 0a 48 89 83 e8 02 00 00 48 89 83 f0 02 00 00 eb b8 <0f> 0b 90 0f 1f 44 00 00 55 48 c7 c7 10 3f 07 82 48 89 e5 41 54
>> [1851513.268473] RIP  [<ffffffff815f895d>] unix_notinflight+0x8d/0x90
>> [1851513.268793]  RSP <ffff880189cf3cf8>
>>
>> That's essentially BUG_ON(list_empty(&u->link));
>>
>> I see that all the code involving the ->link member hasn't really been
>> touched since it was introduced in 2007. So this must be a latent bug.
>> This is the first time I've observed it. The state
>> of the struct unix_sock can be found here http://sprunge.us/WCMW . Evidently,
>> there are no inflight sockets.

Weird.  If it was the BUG_ON(!list_empty(&u->link)) I'd understand,
because the code in scan children looks fishy: what prevents "embryos"
from fledging to full socket status and going in-flight?

But this one, I cannot imagine any scenario.

Can we have access to the crashdump?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kernel BUG at net/unix/garbage.c:149!"
  2016-08-27  9:55   ` Miklos Szeredi
@ 2016-08-29 22:37     ` Miklos Szeredi
  2016-08-30  9:18       ` Miklos Szeredi
  0 siblings, 1 reply; 13+ messages in thread
From: Miklos Szeredi @ 2016-08-29 22:37 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Nikolay Borisov, Linux-Kernel@Vger. Kernel. Org, netdev

On Sat, Aug 27, 2016 at 11:55 AM, Miklos Szeredi <mszeredi@redhat.com> wrote:
>
> On Wed, Aug 24, 2016 at 11:40 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > On 24.08.2016 16:24, Nikolay Borisov wrote:
> >> Hello,
> >>
> >> I hit the following BUG:
> >>
> >> [1851513.239831] ------------[ cut here ]------------
> >> [1851513.240079] kernel BUG at net/unix/garbage.c:149!
> >> [1851513.240313] invalid opcode: 0000 [#1] SMP
> >> [1851513.248320] CPU: 37 PID: 11683 Comm: nginx Tainted: G           O    4.4.14-clouder3 #26
> >> [1851513.248719] Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
> >> [1851513.248966] task: ffff883b0f6f0000 ti: ffff880189cf0000 task.ti: ffff880189cf0000
> >> [1851513.249361] RIP: 0010:[<ffffffff815f895d>]  [<ffffffff815f895d>] unix_notinflight+0x8d/0x90
> >> [1851513.249846] RSP: 0018:ffff880189cf3cf8  EFLAGS: 00010246
> >> [1851513.250082] RAX: ffff883b05491968 RBX: ffff883b05491680 RCX: ffff8807f9967330
> >> [1851513.250476] RDX: 0000000000000001 RSI: ffff882e6d8bae00 RDI: ffffffff82073f10
> >> [1851513.250886] RBP: ffff880189cf3d08 R08: ffff880cbc70e200 R09: 0000000180200001
> >> [1851513.251280] R10: ffff883fff3b9dc0 R11: ffffea0032f1c380 R12: ffff883fbaf50000
> >> [1851513.251674] R13: ffffffff815f6354 R14: ffff881a7c77b140 R15: ffff881a7c7792c0
> >> [1851513.252083] FS:  00007f4f19573720(0000) GS:ffff883fff3a0000(0000) knlGS:0000000000000000
> >> [1851513.252481] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [1851513.252717] CR2: 00000000013062d8 CR3: 0000001712f32000 CR4: 00000000001406e0
> >> [1851513.253116] Stack:
> >> [1851513.253345]  00000000ffffffff ffff880189cf3d40 ffff880189cf3d28 ffffffff815f4383
> >> [1851513.254022]  ffff8839ee11a800 ffff8839ee11a800 ffff880189cf3d60 ffffffff815f53b8
> >> [1851513.254685]  0000000000000000 ffff883406788de0 0000000000000000 0000000000000000
> >> [1851513.255360] Call Trace:
> >> [1851513.255594]  [<ffffffff815f4383>] unix_detach_fds.isra.19+0x43/0x50
> >> [1851513.255851]  [<ffffffff815f53b8>] unix_destruct_scm+0x48/0x80
> >> [1851513.256090]  [<ffffffff815384af>] skb_release_head_state+0x4f/0xb0
> >> [1851513.256328]  [<ffffffff81538522>] skb_release_all+0x12/0x30
> >> [1851513.256564]  [<ffffffff81538592>] kfree_skb+0x32/0xa0
> >> [1851513.256810]  [<ffffffff815f6354>] unix_release_sock+0x1e4/0x2c0
> >> [1851513.257046]  [<ffffffff815f6450>] unix_release+0x20/0x30
> >> [1851513.257284]  [<ffffffff8152fbcf>] sock_release+0x1f/0x80
> >> [1851513.257521]  [<ffffffff8152fc42>] sock_close+0x12/0x20
> >> [1851513.257769]  [<ffffffff8119a8aa>] __fput+0xea/0x1f0
> >> [1851513.258005]  [<ffffffff8119a9ee>] ____fput+0xe/0x10
> >> [1851513.258244]  [<ffffffff8106fccf>] task_work_run+0x7f/0xb0
> >> [1851513.258488]  [<ffffffff81002210>] exit_to_usermode_loop+0xc0/0xd0
> >> [1851513.258728]  [<ffffffff81002a90>] syscall_return_slowpath+0x80/0xf0
> >> [1851513.258983]  [<ffffffff816147b4>] int_ret_from_sys_call+0x25/0x9f
> >> [1851513.259222] Code: 7e 5b 41 5c 5d c3 48 8b 8b e8 02 00 00 48 8b 93 f0 02 00 00 48 89 51 08 48 89 0a 48 89 83 e8 02 00 00 48 89 83 f0 02 00 00 eb b8 <0f> 0b 90 0f 1f 44 00 00 55 48 c7 c7 10 3f 07 82 48 89 e5 41 54
> >> [1851513.268473] RIP  [<ffffffff815f895d>] unix_notinflight+0x8d/0x90
> >> [1851513.268793]  RSP <ffff880189cf3cf8>
> >>
> >> That's essentially BUG_ON(list_empty(&u->link));
> >>
> >> I see that all the code involving the ->link member hasn't really been
> >> touched since it was introduced in 2007. So this must be a latent bug.
> >> This is the first time I've observed it. The state
> >> of the struct unix_sock can be found here http://sprunge.us/WCMW . Evidently,
> >> there are no inflight sockets.
>
> Weird.  If it was the BUG_ON(!list_empty(&u->link)) I'd understand,
> because the code in scan children looks fishy: what prevents "embryos"
> from fledging to full socket status and going in-flight?
>
> But this one, I cannot imagine any scenario.
>
> Can we have access to the crashdump?

crash> list -H gc_inflight_list unix_sock.link -s unix_sock.inflight |
grep counter | cut -d= -f2 | awk '{s+=$1} END {print s}'
130
crash> p unix_tot_inflight
unix_tot_inflight = $2 = 135

We've lost track of a total of five inflight sockets, so it's not a
one-off thing.  Really weird...  Now off to sleep, maybe I'll dream of
the solution.

Btw. I notice that this is a patched kernel.  Nothing in there that
could be relevant to this bug?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kernel BUG at net/unix/garbage.c:149!"
  2016-08-29 22:37     ` Miklos Szeredi
@ 2016-08-30  9:18       ` Miklos Szeredi
  2016-08-30  9:31         ` Nikolay Borisov
                           ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Miklos Szeredi @ 2016-08-30  9:18 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Nikolay Borisov, Linux-Kernel@Vger. Kernel. Org, netdev

[-- Attachment #1: Type: text/plain, Size: 1296 bytes --]

On Tue, Aug 30, 2016 at 12:37 AM, Miklos Szeredi <mszeredi@redhat.com> wrote:
> On Sat, Aug 27, 2016 at 11:55 AM, Miklos Szeredi <mszeredi@redhat.com> wrote:

> crash> list -H gc_inflight_list unix_sock.link -s unix_sock.inflight |
> grep counter | cut -d= -f2 | awk '{s+=$1} END {print s}'
> 130
> crash> p unix_tot_inflight
> unix_tot_inflight = $2 = 135
>
> We've lost track of a total of five inflight sockets, so it's not a
> one-off thing.  Really weird...  Now off to sleep, maybe I'll dream of
> the solution.

Okay, found one bug: gc assumes that in-flight sockets that don't have
an external ref can't gain one while unix_gc_lock is held.  That is
true because unix_notinflight() will be called before detaching fds,
which takes unix_gc_lock.  Only MSG_PEEK was somehow overlooked.  That
one also clones the fds, also keeping them in the skb.  But through
MSG_PEEK an external reference can definitely be gained without ever
touching unix_gc_lock.

Not sure whether the reported bug can be explained by this.  Can you
confirm the MSG_PEEK was used in the setup?

Does someone want to write a stress test for SCM_RIGHTS + MSG_PEEK?

Anyway, attaching a fix that works by acquiring unix_gc_lock in case
of MSG_PEEK also.  It is trivially correct, but I haven't tested it.

Thanks,
Miklos

[-- Attachment #2: af_unix-fix-garbage-collect-vs-msg_peek.patch --]
[-- Type: text/x-patch, Size: 2518 bytes --]

From: Miklos Szeredi <mszeredi@redhat.com>
Subject: af_unix: fix garbage collect vs. MSG_PEEK

Gc assumes that in-flight sockets that don't have an external ref can't
gain one while unix_gc_lock is held.  That is true because
unix_notinflight() will be called before detaching fds, which takes
unix_gc_lock.

Only MSG_PEEK was somehow overlooked.  That one also clones the fds, also
keeping them in the skb.  But through MSG_PEEK an external reference can
definitely be gained without ever touching unix_gc_lock.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
---
 include/net/af_unix.h |    1 +
 net/unix/af_unix.c    |   15 +++++++++++++--
 net/unix/garbage.c    |    6 ++++++
 3 files changed, 20 insertions(+), 2 deletions(-)

--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -10,6 +10,7 @@ void unix_inflight(struct user_struct *u
 void unix_notinflight(struct user_struct *user, struct file *fp);
 void unix_gc(void);
 void wait_for_unix_gc(void);
+void unix_gc_barrier(void);
 struct sock *unix_get_socket(struct file *filp);
 struct sock *unix_peer_get(struct sock *);
 
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1563,6 +1563,17 @@ static int unix_attach_fds(struct scm_co
 	return max_level;
 }
 
+static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb)
+{
+	scm->fp = scm_fp_dup(UNIXCB(skb).fp);
+	/*
+	 * During garbage collection it is assumed that in-flight sockets don't
+	 * get a new external reference.  So we need to wait until current run
+	 * finishes.
+	 */
+	unix_gc_barrier();
+}
+
 static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool send_fds)
 {
 	int err = 0;
@@ -2195,7 +2206,7 @@ static int unix_dgram_recvmsg(struct soc
 		sk_peek_offset_fwd(sk, size);
 
 		if (UNIXCB(skb).fp)
-			scm.fp = scm_fp_dup(UNIXCB(skb).fp);
+			unix_peek_fds(&scm, skb);
 	}
 	err = (flags & MSG_TRUNC) ? skb->len - skip : size;
 
@@ -2435,7 +2446,7 @@ static int unix_stream_read_generic(stru
 			/* It is questionable, see note in unix_dgram_recvmsg.
 			 */
 			if (UNIXCB(skb).fp)
-				scm.fp = scm_fp_dup(UNIXCB(skb).fp);
+				unix_peek_fds(&scm, skb);
 
 			sk_peek_offset_fwd(sk, chunk);
 
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -266,6 +266,12 @@ void wait_for_unix_gc(void)
 	wait_event(unix_gc_wait, gc_in_progress == false);
 }
 
+void unix_gc_barrier(void)
+{
+	spin_lock(&unix_gc_lock);
+	spin_unlock(&unix_gc_lock);
+}
+
 /* The external entry point: unix_gc() */
 void unix_gc(void)
 {

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kernel BUG at net/unix/garbage.c:149!"
  2016-08-30  9:18       ` Miklos Szeredi
@ 2016-08-30  9:31         ` Nikolay Borisov
  2016-08-30  9:39           ` Miklos Szeredi
  2016-09-01  9:13         ` Hannes Frederic Sowa
  2016-09-27 14:16         ` Nikolay Borisov
  2 siblings, 1 reply; 13+ messages in thread
From: Nikolay Borisov @ 2016-08-30  9:31 UTC (permalink / raw)
  To: Miklos Szeredi, Hannes Frederic Sowa
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev



On 08/30/2016 12:18 PM, Miklos Szeredi wrote:
> On Tue, Aug 30, 2016 at 12:37 AM, Miklos Szeredi <mszeredi@redhat.com> wrote:
>> On Sat, Aug 27, 2016 at 11:55 AM, Miklos Szeredi <mszeredi@redhat.com> wrote:
> 
>> crash> list -H gc_inflight_list unix_sock.link -s unix_sock.inflight |
>> grep counter | cut -d= -f2 | awk '{s+=$1} END {print s}'
>> 130
>> crash> p unix_tot_inflight
>> unix_tot_inflight = $2 = 135
>>
>> We've lost track of a total of five inflight sockets, so it's not a
>> one-off thing.  Really weird...  Now off to sleep, maybe I'll dream of
>> the solution.
> 
> Okay, found one bug: gc assumes that in-flight sockets that don't have
> an external ref can't gain one while unix_gc_lock is held.  That is
> true because unix_notinflight() will be called before detaching fds,
> which takes unix_gc_lock.  Only MSG_PEEK was somehow overlooked.  That
> one also clones the fds, also keeping them in the skb.  But through
> MSG_PEEK an external reference can definitely be gained without ever
> touching unix_gc_lock.
> 
> Not sure whether the reported bug can be explained by this.  Can you
> confirm the MSG_PEEK was used in the setup?
> 
> Does someone want to write a stress test for SCM_RIGHTS + MSG_PEEK?
> 
> Anyway, attaching a fix that works by acquiring unix_gc_lock in case
> of MSG_PEEK also.  It is trivially correct, but I haven't tested it.

I have no way of being 100% sure but looking through nginx's source code
it seems they do utilize MSG_PEEK on several occasions. This issue has
been apparently very hard to reproduce since I have 100s of servers
running a lot of  NGINX processes and this has been triggered only once.

On a different note - if I inspect a live node without this patch should
the discrepancy between the gc_inflight_list and the unix_tot_inflight
be present VS with this patch applied?

> 
> Thanks,
> Miklos
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kernel BUG at net/unix/garbage.c:149!"
  2016-08-30  9:31         ` Nikolay Borisov
@ 2016-08-30  9:39           ` Miklos Szeredi
  0 siblings, 0 replies; 13+ messages in thread
From: Miklos Szeredi @ 2016-08-30  9:39 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Hannes Frederic Sowa, Linux-Kernel@Vger. Kernel. Org, netdev

On Tue, Aug 30, 2016 at 11:31 AM, Nikolay Borisov <kernel@kyup.com> wrote:
>
>
> On 08/30/2016 12:18 PM, Miklos Szeredi wrote:
>> On Tue, Aug 30, 2016 at 12:37 AM, Miklos Szeredi <mszeredi@redhat.com> wrote:
>>> On Sat, Aug 27, 2016 at 11:55 AM, Miklos Szeredi <mszeredi@redhat.com> wrote:
>>
>>> crash> list -H gc_inflight_list unix_sock.link -s unix_sock.inflight |
>>> grep counter | cut -d= -f2 | awk '{s+=$1} END {print s}'
>>> 130
>>> crash> p unix_tot_inflight
>>> unix_tot_inflight = $2 = 135
>>>
>>> We've lost track of a total of five inflight sockets, so it's not a
>>> one-off thing.  Really weird...  Now off to sleep, maybe I'll dream of
>>> the solution.
>>
>> Okay, found one bug: gc assumes that in-flight sockets that don't have
>> an external ref can't gain one while unix_gc_lock is held.  That is
>> true because unix_notinflight() will be called before detaching fds,
>> which takes unix_gc_lock.  Only MSG_PEEK was somehow overlooked.  That
>> one also clones the fds, also keeping them in the skb.  But through
>> MSG_PEEK an external reference can definitely be gained without ever
>> touching unix_gc_lock.
>>
>> Not sure whether the reported bug can be explained by this.  Can you
>> confirm the MSG_PEEK was used in the setup?
>>
>> Does someone want to write a stress test for SCM_RIGHTS + MSG_PEEK?
>>
>> Anyway, attaching a fix that works by acquiring unix_gc_lock in case
>> of MSG_PEEK also.  It is trivially correct, but I haven't tested it.
>
> I have no way of being 100% sure but looking through nginx's source code
> it seems they do utilize MSG_PEEK on several occasions. This issue has
> been apparently very hard to reproduce since I have 100s of servers
> running a lot of  NGINX processes and this has been triggered only once.
>
> On a different note - if I inspect a live node without this patch should
> the discrepancy between the gc_inflight_list and the unix_tot_inflight
> be present VS with this patch applied?

May well be, since in the vmcore 4 in-flight sockets were "lost"
before triggering the bug.  I guess the best way to check is with a
systemtap script that walks the list with the gc lock.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kernel BUG at net/unix/garbage.c:149!"
  2016-08-30  9:18       ` Miklos Szeredi
  2016-08-30  9:31         ` Nikolay Borisov
@ 2016-09-01  9:13         ` Hannes Frederic Sowa
  2016-09-27 14:16         ` Nikolay Borisov
  2 siblings, 0 replies; 13+ messages in thread
From: Hannes Frederic Sowa @ 2016-09-01  9:13 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Nikolay Borisov, Linux-Kernel@Vger. Kernel. Org, netdev

On 30.08.2016 11:18, Miklos Szeredi wrote:
> On Tue, Aug 30, 2016 at 12:37 AM, Miklos Szeredi <mszeredi@redhat.com> wrote:
>> On Sat, Aug 27, 2016 at 11:55 AM, Miklos Szeredi <mszeredi@redhat.com> wrote:
> 
>> crash> list -H gc_inflight_list unix_sock.link -s unix_sock.inflight |
>> grep counter | cut -d= -f2 | awk '{s+=$1} END {print s}'
>> 130
>> crash> p unix_tot_inflight
>> unix_tot_inflight = $2 = 135
>>
>> We've lost track of a total of five inflight sockets, so it's not a
>> one-off thing.  Really weird...  Now off to sleep, maybe I'll dream of
>> the solution.
> 
> Okay, found one bug: gc assumes that in-flight sockets that don't have
> an external ref can't gain one while unix_gc_lock is held.  That is
> true because unix_notinflight() will be called before detaching fds,
> which takes unix_gc_lock.  Only MSG_PEEK was somehow overlooked.  That
> one also clones the fds, also keeping them in the skb.  But through
> MSG_PEEK an external reference can definitely be gained without ever
> touching unix_gc_lock.
> 
> Not sure whether the reported bug can be explained by this.  Can you
> confirm the MSG_PEEK was used in the setup?
> 
> Does someone want to write a stress test for SCM_RIGHTS + MSG_PEEK?
> 
> Anyway, attaching a fix that works by acquiring unix_gc_lock in case
> of MSG_PEEK also.  It is trivially correct, but I haven't tested it.

You can use spin_unlock_wait in unix_gc_barrier to make it a bit more
lightweight.

Anyway, all of the scans on the socket receive queues are actually
protected by the appropriate locks again, I didn't see a way were we
could result in such a crash because of concurrent modification of the
receive queue. Do you have any hints or looked into this more closely?

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kernel BUG at net/unix/garbage.c:149!"
  2016-08-30  9:18       ` Miklos Szeredi
  2016-08-30  9:31         ` Nikolay Borisov
  2016-09-01  9:13         ` Hannes Frederic Sowa
@ 2016-09-27 14:16         ` Nikolay Borisov
  2016-09-27 14:43           ` Hannes Frederic Sowa
  2016-09-28  2:05           ` David Miller
  2 siblings, 2 replies; 13+ messages in thread
From: Nikolay Borisov @ 2016-09-27 14:16 UTC (permalink / raw)
  To: David Miller
  Cc: Miklos Szeredi, Hannes Frederic Sowa,
	Linux-Kernel@Vger. Kernel. Org, netdev

[Added Dave Miller to see what's the status of this patch]

On 08/30/2016 12:18 PM, Miklos Szeredi wrote:
> On Tue, Aug 30, 2016 at 12:37 AM, Miklos Szeredi <mszeredi@redhat.com> wrote:
>> On Sat, Aug 27, 2016 at 11:55 AM, Miklos Szeredi <mszeredi@redhat.com> wrote:
> 
>> crash> list -H gc_inflight_list unix_sock.link -s unix_sock.inflight |
>> grep counter | cut -d= -f2 | awk '{s+=$1} END {print s}'
>> 130
>> crash> p unix_tot_inflight
>> unix_tot_inflight = $2 = 135
>>
>> We've lost track of a total of five inflight sockets, so it's not a
>> one-off thing.  Really weird...  Now off to sleep, maybe I'll dream of
>> the solution.
> 
> Okay, found one bug: gc assumes that in-flight sockets that don't have
> an external ref can't gain one while unix_gc_lock is held.  That is
> true because unix_notinflight() will be called before detaching fds,
> which takes unix_gc_lock.  Only MSG_PEEK was somehow overlooked.  That
> one also clones the fds, also keeping them in the skb.  But through
> MSG_PEEK an external reference can definitely be gained without ever
> touching unix_gc_lock.
> 
> Not sure whether the reported bug can be explained by this.  Can you
> confirm the MSG_PEEK was used in the setup?
> 
> Does someone want to write a stress test for SCM_RIGHTS + MSG_PEEK?
> 
> Anyway, attaching a fix that works by acquiring unix_gc_lock in case
> of MSG_PEEK also.  It is trivially correct, but I haven't tested it.
> 
> Thanks,
> Miklos
> 


Dave,

What's the status of https://patchwork.ozlabs.org/patch/664062/ , is
this going to be picked up ?

Regards,
Nikolay

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kernel BUG at net/unix/garbage.c:149!"
  2016-09-27 14:16         ` Nikolay Borisov
@ 2016-09-27 14:43           ` Hannes Frederic Sowa
  2016-09-28  2:05           ` David Miller
  1 sibling, 0 replies; 13+ messages in thread
From: Hannes Frederic Sowa @ 2016-09-27 14:43 UTC (permalink / raw)
  To: Nikolay Borisov, David Miller
  Cc: Miklos Szeredi, Linux-Kernel@Vger. Kernel. Org, netdev

On Tue, Sep 27, 2016, at 16:16, Nikolay Borisov wrote:
> Dave,
> 
> What's the status of https://patchwork.ozlabs.org/patch/664062/ , is
> this going to be picked up ?

Not sure if we actually fix a bug with this. Miklos could you maybe
enhance the changelog then?

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kernel BUG at net/unix/garbage.c:149!"
  2016-09-27 14:16         ` Nikolay Borisov
  2016-09-27 14:43           ` Hannes Frederic Sowa
@ 2016-09-28  2:05           ` David Miller
  1 sibling, 0 replies; 13+ messages in thread
From: David Miller @ 2016-09-28  2:05 UTC (permalink / raw)
  To: kernel; +Cc: mszeredi, hannes, linux-kernel, netdev

From: Nikolay Borisov <kernel@kyup.com>
Date: Tue, 27 Sep 2016 17:16:27 +0300

> What's the status of https://patchwork.ozlabs.org/patch/664062/ , is
> this going to be picked up ?

Why would I apply a patch that's an RFC, doesn't have a proper commit
message, lacks a proper signoff, and also lacks ACK's and feedback
from other knowledgable developers?

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-09-28  2:05 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-24 14:24 kernel BUG at net/unix/garbage.c:149!" Nikolay Borisov
2016-08-24 21:40 ` Hannes Frederic Sowa
2016-08-24 23:30   ` Nikolay Borisov
2016-08-26 20:24     ` Hannes Frederic Sowa
2016-08-27  9:55   ` Miklos Szeredi
2016-08-29 22:37     ` Miklos Szeredi
2016-08-30  9:18       ` Miklos Szeredi
2016-08-30  9:31         ` Nikolay Borisov
2016-08-30  9:39           ` Miklos Szeredi
2016-09-01  9:13         ` Hannes Frederic Sowa
2016-09-27 14:16         ` Nikolay Borisov
2016-09-27 14:43           ` Hannes Frederic Sowa
2016-09-28  2:05           ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).