* kernel BUG at net/unix/garbage.c:149!" @ 2016-08-24 14:24 Nikolay Borisov 2016-08-24 21:40 ` Hannes Frederic Sowa 0 siblings, 1 reply; 13+ messages in thread From: Nikolay Borisov @ 2016-08-24 14:24 UTC (permalink / raw) To: mszeredi; +Cc: Linux-Kernel@Vger. Kernel. Org, netdev Hello, I hit the following BUG: [1851513.239831] ------------[ cut here ]------------ [1851513.240079] kernel BUG at net/unix/garbage.c:149! [1851513.240313] invalid opcode: 0000 [#1] SMP [1851513.248320] CPU: 37 PID: 11683 Comm: nginx Tainted: G O 4.4.14-clouder3 #26 [1851513.248719] Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015 [1851513.248966] task: ffff883b0f6f0000 ti: ffff880189cf0000 task.ti: ffff880189cf0000 [1851513.249361] RIP: 0010:[<ffffffff815f895d>] [<ffffffff815f895d>] unix_notinflight+0x8d/0x90 [1851513.249846] RSP: 0018:ffff880189cf3cf8 EFLAGS: 00010246 [1851513.250082] RAX: ffff883b05491968 RBX: ffff883b05491680 RCX: ffff8807f9967330 [1851513.250476] RDX: 0000000000000001 RSI: ffff882e6d8bae00 RDI: ffffffff82073f10 [1851513.250886] RBP: ffff880189cf3d08 R08: ffff880cbc70e200 R09: 0000000180200001 [1851513.251280] R10: ffff883fff3b9dc0 R11: ffffea0032f1c380 R12: ffff883fbaf50000 [1851513.251674] R13: ffffffff815f6354 R14: ffff881a7c77b140 R15: ffff881a7c7792c0 [1851513.252083] FS: 00007f4f19573720(0000) GS:ffff883fff3a0000(0000) knlGS:0000000000000000 [1851513.252481] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1851513.252717] CR2: 00000000013062d8 CR3: 0000001712f32000 CR4: 00000000001406e0 [1851513.253116] Stack: [1851513.253345] 00000000ffffffff ffff880189cf3d40 ffff880189cf3d28 ffffffff815f4383 [1851513.254022] ffff8839ee11a800 ffff8839ee11a800 ffff880189cf3d60 ffffffff815f53b8 [1851513.254685] 0000000000000000 ffff883406788de0 0000000000000000 0000000000000000 [1851513.255360] Call Trace: [1851513.255594] [<ffffffff815f4383>] unix_detach_fds.isra.19+0x43/0x50 [1851513.255851] [<ffffffff815f53b8>] unix_destruct_scm+0x48/0x80 [1851513.256090] [<ffffffff815384af>] skb_release_head_state+0x4f/0xb0 [1851513.256328] [<ffffffff81538522>] skb_release_all+0x12/0x30 [1851513.256564] [<ffffffff81538592>] kfree_skb+0x32/0xa0 [1851513.256810] [<ffffffff815f6354>] unix_release_sock+0x1e4/0x2c0 [1851513.257046] [<ffffffff815f6450>] unix_release+0x20/0x30 [1851513.257284] [<ffffffff8152fbcf>] sock_release+0x1f/0x80 [1851513.257521] [<ffffffff8152fc42>] sock_close+0x12/0x20 [1851513.257769] [<ffffffff8119a8aa>] __fput+0xea/0x1f0 [1851513.258005] [<ffffffff8119a9ee>] ____fput+0xe/0x10 [1851513.258244] [<ffffffff8106fccf>] task_work_run+0x7f/0xb0 [1851513.258488] [<ffffffff81002210>] exit_to_usermode_loop+0xc0/0xd0 [1851513.258728] [<ffffffff81002a90>] syscall_return_slowpath+0x80/0xf0 [1851513.258983] [<ffffffff816147b4>] int_ret_from_sys_call+0x25/0x9f [1851513.259222] Code: 7e 5b 41 5c 5d c3 48 8b 8b e8 02 00 00 48 8b 93 f0 02 00 00 48 89 51 08 48 89 0a 48 89 83 e8 02 00 00 48 89 83 f0 02 00 00 eb b8 <0f> 0b 90 0f 1f 44 00 00 55 48 c7 c7 10 3f 07 82 48 89 e5 41 54 [1851513.268473] RIP [<ffffffff815f895d>] unix_notinflight+0x8d/0x90 [1851513.268793] RSP <ffff880189cf3cf8> That's essentially BUG_ON(list_empty(&u->link)); I see that all the code involving the ->link member hasn't really been touched since it was introduced in 2007. So this must be a latent bug. This is the first time I've observed it. The state of the struct unix_sock can be found here http://sprunge.us/WCMW . Evidently, there are no inflight sockets. Regards, Nikolay ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: kernel BUG at net/unix/garbage.c:149!" 2016-08-24 14:24 kernel BUG at net/unix/garbage.c:149!" Nikolay Borisov @ 2016-08-24 21:40 ` Hannes Frederic Sowa 2016-08-24 23:30 ` Nikolay Borisov 2016-08-27 9:55 ` Miklos Szeredi 0 siblings, 2 replies; 13+ messages in thread From: Hannes Frederic Sowa @ 2016-08-24 21:40 UTC (permalink / raw) To: Nikolay Borisov, mszeredi; +Cc: Linux-Kernel@Vger. Kernel. Org, netdev On 24.08.2016 16:24, Nikolay Borisov wrote: > Hello, > > I hit the following BUG: > > [1851513.239831] ------------[ cut here ]------------ > [1851513.240079] kernel BUG at net/unix/garbage.c:149! > [1851513.240313] invalid opcode: 0000 [#1] SMP > [1851513.248320] CPU: 37 PID: 11683 Comm: nginx Tainted: G O 4.4.14-clouder3 #26 > [1851513.248719] Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015 > [1851513.248966] task: ffff883b0f6f0000 ti: ffff880189cf0000 task.ti: ffff880189cf0000 > [1851513.249361] RIP: 0010:[<ffffffff815f895d>] [<ffffffff815f895d>] unix_notinflight+0x8d/0x90 > [1851513.249846] RSP: 0018:ffff880189cf3cf8 EFLAGS: 00010246 > [1851513.250082] RAX: ffff883b05491968 RBX: ffff883b05491680 RCX: ffff8807f9967330 > [1851513.250476] RDX: 0000000000000001 RSI: ffff882e6d8bae00 RDI: ffffffff82073f10 > [1851513.250886] RBP: ffff880189cf3d08 R08: ffff880cbc70e200 R09: 0000000180200001 > [1851513.251280] R10: ffff883fff3b9dc0 R11: ffffea0032f1c380 R12: ffff883fbaf50000 > [1851513.251674] R13: ffffffff815f6354 R14: ffff881a7c77b140 R15: ffff881a7c7792c0 > [1851513.252083] FS: 00007f4f19573720(0000) GS:ffff883fff3a0000(0000) knlGS:0000000000000000 > [1851513.252481] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [1851513.252717] CR2: 00000000013062d8 CR3: 0000001712f32000 CR4: 00000000001406e0 > [1851513.253116] Stack: > [1851513.253345] 00000000ffffffff ffff880189cf3d40 ffff880189cf3d28 ffffffff815f4383 > [1851513.254022] ffff8839ee11a800 ffff8839ee11a800 ffff880189cf3d60 ffffffff815f53b8 > [1851513.254685] 0000000000000000 ffff883406788de0 0000000000000000 0000000000000000 > [1851513.255360] Call Trace: > [1851513.255594] [<ffffffff815f4383>] unix_detach_fds.isra.19+0x43/0x50 > [1851513.255851] [<ffffffff815f53b8>] unix_destruct_scm+0x48/0x80 > [1851513.256090] [<ffffffff815384af>] skb_release_head_state+0x4f/0xb0 > [1851513.256328] [<ffffffff81538522>] skb_release_all+0x12/0x30 > [1851513.256564] [<ffffffff81538592>] kfree_skb+0x32/0xa0 > [1851513.256810] [<ffffffff815f6354>] unix_release_sock+0x1e4/0x2c0 > [1851513.257046] [<ffffffff815f6450>] unix_release+0x20/0x30 > [1851513.257284] [<ffffffff8152fbcf>] sock_release+0x1f/0x80 > [1851513.257521] [<ffffffff8152fc42>] sock_close+0x12/0x20 > [1851513.257769] [<ffffffff8119a8aa>] __fput+0xea/0x1f0 > [1851513.258005] [<ffffffff8119a9ee>] ____fput+0xe/0x10 > [1851513.258244] [<ffffffff8106fccf>] task_work_run+0x7f/0xb0 > [1851513.258488] [<ffffffff81002210>] exit_to_usermode_loop+0xc0/0xd0 > [1851513.258728] [<ffffffff81002a90>] syscall_return_slowpath+0x80/0xf0 > [1851513.258983] [<ffffffff816147b4>] int_ret_from_sys_call+0x25/0x9f > [1851513.259222] Code: 7e 5b 41 5c 5d c3 48 8b 8b e8 02 00 00 48 8b 93 f0 02 00 00 48 89 51 08 48 89 0a 48 89 83 e8 02 00 00 48 89 83 f0 02 00 00 eb b8 <0f> 0b 90 0f 1f 44 00 00 55 48 c7 c7 10 3f 07 82 48 89 e5 41 54 > [1851513.268473] RIP [<ffffffff815f895d>] unix_notinflight+0x8d/0x90 > [1851513.268793] RSP <ffff880189cf3cf8> > > That's essentially BUG_ON(list_empty(&u->link)); > > I see that all the code involving the ->link member hasn't really been > touched since it was introduced in 2007. So this must be a latent bug. > This is the first time I've observed it. The state > of the struct unix_sock can be found here http://sprunge.us/WCMW . Evidently, > there are no inflight sockets. One commit which could have to do with that is commit fc64869c48494a401b1fb627c9ecc4e6c1d74b0d Author: Andrey Ryabinin <aryabinin@virtuozzo.com> Date: Wed May 18 19:19:27 2016 +0300 net: sock: move ->sk_shutdown out of bitfields. but that is only a wild guess. Which unix_sock did you extract specifically in the url you provided? In unix_notinflight we are specifically checking an unix domain socket that is itself being transferred over another af_unix domain socket and not the unix domain socket being released at this point. Can you reproduce this and maybe also with a newer kernel? Thanks for the report, Hannes ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: kernel BUG at net/unix/garbage.c:149!" 2016-08-24 21:40 ` Hannes Frederic Sowa @ 2016-08-24 23:30 ` Nikolay Borisov 2016-08-26 20:24 ` Hannes Frederic Sowa 2016-08-27 9:55 ` Miklos Szeredi 1 sibling, 1 reply; 13+ messages in thread From: Nikolay Borisov @ 2016-08-24 23:30 UTC (permalink / raw) To: Hannes Frederic Sowa Cc: Nikolay Borisov, mszeredi, Linux-Kernel@Vger. Kernel. Org, netdev On Thu, Aug 25, 2016 at 12:40 AM, Hannes Frederic Sowa <hannes@stressinduktion.org> wrote: > On 24.08.2016 16:24, Nikolay Borisov wrote: [SNIP] > > One commit which could have to do with that is > > commit fc64869c48494a401b1fb627c9ecc4e6c1d74b0d > Author: Andrey Ryabinin <aryabinin@virtuozzo.com> > Date: Wed May 18 19:19:27 2016 +0300 > > net: sock: move ->sk_shutdown out of bitfields. > > but that is only a wild guess. > > Which unix_sock did you extract specifically in the url you provided? In > unix_notinflight we are specifically checking an unix domain socket that > is itself being transferred over another af_unix domain socket and not > the unix domain socket being released at this point. So this is the state of the socket that is being passed to unix_notinflight. I have a complete crashdump so if you need more info to diagnose it I'm happy to provide it. I'm not too familiar with the code in question so I will need a bit of time to grasp what actually is happening. > > Can you reproduce this and maybe also with a newer kernel? Unfortunately I cannot reproduce this since it happened on a production server nor can I change the kernel. But clearly there is something wrong, and given that this is a stable kernel and no relevant changes have gone in latest stable I believe the problem (albeit hardly reproducible) would still persist. > > Thanks for the report, > Hannes > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: kernel BUG at net/unix/garbage.c:149!" 2016-08-24 23:30 ` Nikolay Borisov @ 2016-08-26 20:24 ` Hannes Frederic Sowa 0 siblings, 0 replies; 13+ messages in thread From: Hannes Frederic Sowa @ 2016-08-26 20:24 UTC (permalink / raw) To: Nikolay Borisov; +Cc: mszeredi, Linux-Kernel@Vger. Kernel. Org, netdev On 25.08.2016 01:30, Nikolay Borisov wrote: > On Thu, Aug 25, 2016 at 12:40 AM, Hannes Frederic Sowa > <hannes@stressinduktion.org> wrote: >> On 24.08.2016 16:24, Nikolay Borisov wrote: > [SNIP] >> >> One commit which could have to do with that is >> >> commit fc64869c48494a401b1fb627c9ecc4e6c1d74b0d >> Author: Andrey Ryabinin <aryabinin@virtuozzo.com> >> Date: Wed May 18 19:19:27 2016 +0300 >> >> net: sock: move ->sk_shutdown out of bitfields. >> >> but that is only a wild guess. >> >> Which unix_sock did you extract specifically in the url you provided? In >> unix_notinflight we are specifically checking an unix domain socket that >> is itself being transferred over another af_unix domain socket and not >> the unix domain socket being released at this point. > > So this is the state of the socket that is being passed to > unix_notinflight. I have a complete crashdump so if you need more info > to diagnose it I'm happy to provide it. I'm not too familiar with the > code in question so I will need a bit of time to grasp what actually > is happening. Would probably be helpful. I couldn't spot anything so far unfortunately. Also moving sockets over sockets which get closed inflight seemed to work fine. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: kernel BUG at net/unix/garbage.c:149!" 2016-08-24 21:40 ` Hannes Frederic Sowa 2016-08-24 23:30 ` Nikolay Borisov @ 2016-08-27 9:55 ` Miklos Szeredi 2016-08-29 22:37 ` Miklos Szeredi 1 sibling, 1 reply; 13+ messages in thread From: Miklos Szeredi @ 2016-08-27 9:55 UTC (permalink / raw) To: Hannes Frederic Sowa Cc: Nikolay Borisov, Linux-Kernel@Vger. Kernel. Org, netdev On Wed, Aug 24, 2016 at 11:40 PM, Hannes Frederic Sowa <hannes@stressinduktion.org> wrote: > On 24.08.2016 16:24, Nikolay Borisov wrote: >> Hello, >> >> I hit the following BUG: >> >> [1851513.239831] ------------[ cut here ]------------ >> [1851513.240079] kernel BUG at net/unix/garbage.c:149! >> [1851513.240313] invalid opcode: 0000 [#1] SMP >> [1851513.248320] CPU: 37 PID: 11683 Comm: nginx Tainted: G O 4.4.14-clouder3 #26 >> [1851513.248719] Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015 >> [1851513.248966] task: ffff883b0f6f0000 ti: ffff880189cf0000 task.ti: ffff880189cf0000 >> [1851513.249361] RIP: 0010:[<ffffffff815f895d>] [<ffffffff815f895d>] unix_notinflight+0x8d/0x90 >> [1851513.249846] RSP: 0018:ffff880189cf3cf8 EFLAGS: 00010246 >> [1851513.250082] RAX: ffff883b05491968 RBX: ffff883b05491680 RCX: ffff8807f9967330 >> [1851513.250476] RDX: 0000000000000001 RSI: ffff882e6d8bae00 RDI: ffffffff82073f10 >> [1851513.250886] RBP: ffff880189cf3d08 R08: ffff880cbc70e200 R09: 0000000180200001 >> [1851513.251280] R10: ffff883fff3b9dc0 R11: ffffea0032f1c380 R12: ffff883fbaf50000 >> [1851513.251674] R13: ffffffff815f6354 R14: ffff881a7c77b140 R15: ffff881a7c7792c0 >> [1851513.252083] FS: 00007f4f19573720(0000) GS:ffff883fff3a0000(0000) knlGS:0000000000000000 >> [1851513.252481] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [1851513.252717] CR2: 00000000013062d8 CR3: 0000001712f32000 CR4: 00000000001406e0 >> [1851513.253116] Stack: >> [1851513.253345] 00000000ffffffff ffff880189cf3d40 ffff880189cf3d28 ffffffff815f4383 >> [1851513.254022] ffff8839ee11a800 ffff8839ee11a800 ffff880189cf3d60 ffffffff815f53b8 >> [1851513.254685] 0000000000000000 ffff883406788de0 0000000000000000 0000000000000000 >> [1851513.255360] Call Trace: >> [1851513.255594] [<ffffffff815f4383>] unix_detach_fds.isra.19+0x43/0x50 >> [1851513.255851] [<ffffffff815f53b8>] unix_destruct_scm+0x48/0x80 >> [1851513.256090] [<ffffffff815384af>] skb_release_head_state+0x4f/0xb0 >> [1851513.256328] [<ffffffff81538522>] skb_release_all+0x12/0x30 >> [1851513.256564] [<ffffffff81538592>] kfree_skb+0x32/0xa0 >> [1851513.256810] [<ffffffff815f6354>] unix_release_sock+0x1e4/0x2c0 >> [1851513.257046] [<ffffffff815f6450>] unix_release+0x20/0x30 >> [1851513.257284] [<ffffffff8152fbcf>] sock_release+0x1f/0x80 >> [1851513.257521] [<ffffffff8152fc42>] sock_close+0x12/0x20 >> [1851513.257769] [<ffffffff8119a8aa>] __fput+0xea/0x1f0 >> [1851513.258005] [<ffffffff8119a9ee>] ____fput+0xe/0x10 >> [1851513.258244] [<ffffffff8106fccf>] task_work_run+0x7f/0xb0 >> [1851513.258488] [<ffffffff81002210>] exit_to_usermode_loop+0xc0/0xd0 >> [1851513.258728] [<ffffffff81002a90>] syscall_return_slowpath+0x80/0xf0 >> [1851513.258983] [<ffffffff816147b4>] int_ret_from_sys_call+0x25/0x9f >> [1851513.259222] Code: 7e 5b 41 5c 5d c3 48 8b 8b e8 02 00 00 48 8b 93 f0 02 00 00 48 89 51 08 48 89 0a 48 89 83 e8 02 00 00 48 89 83 f0 02 00 00 eb b8 <0f> 0b 90 0f 1f 44 00 00 55 48 c7 c7 10 3f 07 82 48 89 e5 41 54 >> [1851513.268473] RIP [<ffffffff815f895d>] unix_notinflight+0x8d/0x90 >> [1851513.268793] RSP <ffff880189cf3cf8> >> >> That's essentially BUG_ON(list_empty(&u->link)); >> >> I see that all the code involving the ->link member hasn't really been >> touched since it was introduced in 2007. So this must be a latent bug. >> This is the first time I've observed it. The state >> of the struct unix_sock can be found here http://sprunge.us/WCMW . Evidently, >> there are no inflight sockets. Weird. If it was the BUG_ON(!list_empty(&u->link)) I'd understand, because the code in scan children looks fishy: what prevents "embryos" from fledging to full socket status and going in-flight? But this one, I cannot imagine any scenario. Can we have access to the crashdump? Thanks, Miklos ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: kernel BUG at net/unix/garbage.c:149!" 2016-08-27 9:55 ` Miklos Szeredi @ 2016-08-29 22:37 ` Miklos Szeredi 2016-08-30 9:18 ` Miklos Szeredi 0 siblings, 1 reply; 13+ messages in thread From: Miklos Szeredi @ 2016-08-29 22:37 UTC (permalink / raw) To: Hannes Frederic Sowa Cc: Nikolay Borisov, Linux-Kernel@Vger. Kernel. Org, netdev On Sat, Aug 27, 2016 at 11:55 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: > > On Wed, Aug 24, 2016 at 11:40 PM, Hannes Frederic Sowa > <hannes@stressinduktion.org> wrote: > > On 24.08.2016 16:24, Nikolay Borisov wrote: > >> Hello, > >> > >> I hit the following BUG: > >> > >> [1851513.239831] ------------[ cut here ]------------ > >> [1851513.240079] kernel BUG at net/unix/garbage.c:149! > >> [1851513.240313] invalid opcode: 0000 [#1] SMP > >> [1851513.248320] CPU: 37 PID: 11683 Comm: nginx Tainted: G O 4.4.14-clouder3 #26 > >> [1851513.248719] Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015 > >> [1851513.248966] task: ffff883b0f6f0000 ti: ffff880189cf0000 task.ti: ffff880189cf0000 > >> [1851513.249361] RIP: 0010:[<ffffffff815f895d>] [<ffffffff815f895d>] unix_notinflight+0x8d/0x90 > >> [1851513.249846] RSP: 0018:ffff880189cf3cf8 EFLAGS: 00010246 > >> [1851513.250082] RAX: ffff883b05491968 RBX: ffff883b05491680 RCX: ffff8807f9967330 > >> [1851513.250476] RDX: 0000000000000001 RSI: ffff882e6d8bae00 RDI: ffffffff82073f10 > >> [1851513.250886] RBP: ffff880189cf3d08 R08: ffff880cbc70e200 R09: 0000000180200001 > >> [1851513.251280] R10: ffff883fff3b9dc0 R11: ffffea0032f1c380 R12: ffff883fbaf50000 > >> [1851513.251674] R13: ffffffff815f6354 R14: ffff881a7c77b140 R15: ffff881a7c7792c0 > >> [1851513.252083] FS: 00007f4f19573720(0000) GS:ffff883fff3a0000(0000) knlGS:0000000000000000 > >> [1851513.252481] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >> [1851513.252717] CR2: 00000000013062d8 CR3: 0000001712f32000 CR4: 00000000001406e0 > >> [1851513.253116] Stack: > >> [1851513.253345] 00000000ffffffff ffff880189cf3d40 ffff880189cf3d28 ffffffff815f4383 > >> [1851513.254022] ffff8839ee11a800 ffff8839ee11a800 ffff880189cf3d60 ffffffff815f53b8 > >> [1851513.254685] 0000000000000000 ffff883406788de0 0000000000000000 0000000000000000 > >> [1851513.255360] Call Trace: > >> [1851513.255594] [<ffffffff815f4383>] unix_detach_fds.isra.19+0x43/0x50 > >> [1851513.255851] [<ffffffff815f53b8>] unix_destruct_scm+0x48/0x80 > >> [1851513.256090] [<ffffffff815384af>] skb_release_head_state+0x4f/0xb0 > >> [1851513.256328] [<ffffffff81538522>] skb_release_all+0x12/0x30 > >> [1851513.256564] [<ffffffff81538592>] kfree_skb+0x32/0xa0 > >> [1851513.256810] [<ffffffff815f6354>] unix_release_sock+0x1e4/0x2c0 > >> [1851513.257046] [<ffffffff815f6450>] unix_release+0x20/0x30 > >> [1851513.257284] [<ffffffff8152fbcf>] sock_release+0x1f/0x80 > >> [1851513.257521] [<ffffffff8152fc42>] sock_close+0x12/0x20 > >> [1851513.257769] [<ffffffff8119a8aa>] __fput+0xea/0x1f0 > >> [1851513.258005] [<ffffffff8119a9ee>] ____fput+0xe/0x10 > >> [1851513.258244] [<ffffffff8106fccf>] task_work_run+0x7f/0xb0 > >> [1851513.258488] [<ffffffff81002210>] exit_to_usermode_loop+0xc0/0xd0 > >> [1851513.258728] [<ffffffff81002a90>] syscall_return_slowpath+0x80/0xf0 > >> [1851513.258983] [<ffffffff816147b4>] int_ret_from_sys_call+0x25/0x9f > >> [1851513.259222] Code: 7e 5b 41 5c 5d c3 48 8b 8b e8 02 00 00 48 8b 93 f0 02 00 00 48 89 51 08 48 89 0a 48 89 83 e8 02 00 00 48 89 83 f0 02 00 00 eb b8 <0f> 0b 90 0f 1f 44 00 00 55 48 c7 c7 10 3f 07 82 48 89 e5 41 54 > >> [1851513.268473] RIP [<ffffffff815f895d>] unix_notinflight+0x8d/0x90 > >> [1851513.268793] RSP <ffff880189cf3cf8> > >> > >> That's essentially BUG_ON(list_empty(&u->link)); > >> > >> I see that all the code involving the ->link member hasn't really been > >> touched since it was introduced in 2007. So this must be a latent bug. > >> This is the first time I've observed it. The state > >> of the struct unix_sock can be found here http://sprunge.us/WCMW . Evidently, > >> there are no inflight sockets. > > Weird. If it was the BUG_ON(!list_empty(&u->link)) I'd understand, > because the code in scan children looks fishy: what prevents "embryos" > from fledging to full socket status and going in-flight? > > But this one, I cannot imagine any scenario. > > Can we have access to the crashdump? crash> list -H gc_inflight_list unix_sock.link -s unix_sock.inflight | grep counter | cut -d= -f2 | awk '{s+=$1} END {print s}' 130 crash> p unix_tot_inflight unix_tot_inflight = $2 = 135 We've lost track of a total of five inflight sockets, so it's not a one-off thing. Really weird... Now off to sleep, maybe I'll dream of the solution. Btw. I notice that this is a patched kernel. Nothing in there that could be relevant to this bug? Thanks, Miklos ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: kernel BUG at net/unix/garbage.c:149!" 2016-08-29 22:37 ` Miklos Szeredi @ 2016-08-30 9:18 ` Miklos Szeredi 2016-08-30 9:31 ` Nikolay Borisov ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Miklos Szeredi @ 2016-08-30 9:18 UTC (permalink / raw) To: Hannes Frederic Sowa Cc: Nikolay Borisov, Linux-Kernel@Vger. Kernel. Org, netdev [-- Attachment #1: Type: text/plain, Size: 1296 bytes --] On Tue, Aug 30, 2016 at 12:37 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: > On Sat, Aug 27, 2016 at 11:55 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: > crash> list -H gc_inflight_list unix_sock.link -s unix_sock.inflight | > grep counter | cut -d= -f2 | awk '{s+=$1} END {print s}' > 130 > crash> p unix_tot_inflight > unix_tot_inflight = $2 = 135 > > We've lost track of a total of five inflight sockets, so it's not a > one-off thing. Really weird... Now off to sleep, maybe I'll dream of > the solution. Okay, found one bug: gc assumes that in-flight sockets that don't have an external ref can't gain one while unix_gc_lock is held. That is true because unix_notinflight() will be called before detaching fds, which takes unix_gc_lock. Only MSG_PEEK was somehow overlooked. That one also clones the fds, also keeping them in the skb. But through MSG_PEEK an external reference can definitely be gained without ever touching unix_gc_lock. Not sure whether the reported bug can be explained by this. Can you confirm the MSG_PEEK was used in the setup? Does someone want to write a stress test for SCM_RIGHTS + MSG_PEEK? Anyway, attaching a fix that works by acquiring unix_gc_lock in case of MSG_PEEK also. It is trivially correct, but I haven't tested it. Thanks, Miklos [-- Attachment #2: af_unix-fix-garbage-collect-vs-msg_peek.patch --] [-- Type: text/x-patch, Size: 2518 bytes --] From: Miklos Szeredi <mszeredi@redhat.com> Subject: af_unix: fix garbage collect vs. MSG_PEEK Gc assumes that in-flight sockets that don't have an external ref can't gain one while unix_gc_lock is held. That is true because unix_notinflight() will be called before detaching fds, which takes unix_gc_lock. Only MSG_PEEK was somehow overlooked. That one also clones the fds, also keeping them in the skb. But through MSG_PEEK an external reference can definitely be gained without ever touching unix_gc_lock. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> --- include/net/af_unix.h | 1 + net/unix/af_unix.c | 15 +++++++++++++-- net/unix/garbage.c | 6 ++++++ 3 files changed, 20 insertions(+), 2 deletions(-) --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -10,6 +10,7 @@ void unix_inflight(struct user_struct *u void unix_notinflight(struct user_struct *user, struct file *fp); void unix_gc(void); void wait_for_unix_gc(void); +void unix_gc_barrier(void); struct sock *unix_get_socket(struct file *filp); struct sock *unix_peer_get(struct sock *); --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1563,6 +1563,17 @@ static int unix_attach_fds(struct scm_co return max_level; } +static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb) +{ + scm->fp = scm_fp_dup(UNIXCB(skb).fp); + /* + * During garbage collection it is assumed that in-flight sockets don't + * get a new external reference. So we need to wait until current run + * finishes. + */ + unix_gc_barrier(); +} + static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool send_fds) { int err = 0; @@ -2195,7 +2206,7 @@ static int unix_dgram_recvmsg(struct soc sk_peek_offset_fwd(sk, size); if (UNIXCB(skb).fp) - scm.fp = scm_fp_dup(UNIXCB(skb).fp); + unix_peek_fds(&scm, skb); } err = (flags & MSG_TRUNC) ? skb->len - skip : size; @@ -2435,7 +2446,7 @@ static int unix_stream_read_generic(stru /* It is questionable, see note in unix_dgram_recvmsg. */ if (UNIXCB(skb).fp) - scm.fp = scm_fp_dup(UNIXCB(skb).fp); + unix_peek_fds(&scm, skb); sk_peek_offset_fwd(sk, chunk); --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -266,6 +266,12 @@ void wait_for_unix_gc(void) wait_event(unix_gc_wait, gc_in_progress == false); } +void unix_gc_barrier(void) +{ + spin_lock(&unix_gc_lock); + spin_unlock(&unix_gc_lock); +} + /* The external entry point: unix_gc() */ void unix_gc(void) { ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: kernel BUG at net/unix/garbage.c:149!" 2016-08-30 9:18 ` Miklos Szeredi @ 2016-08-30 9:31 ` Nikolay Borisov 2016-08-30 9:39 ` Miklos Szeredi 2016-09-01 9:13 ` Hannes Frederic Sowa 2016-09-27 14:16 ` Nikolay Borisov 2 siblings, 1 reply; 13+ messages in thread From: Nikolay Borisov @ 2016-08-30 9:31 UTC (permalink / raw) To: Miklos Szeredi, Hannes Frederic Sowa Cc: Linux-Kernel@Vger. Kernel. Org, netdev On 08/30/2016 12:18 PM, Miklos Szeredi wrote: > On Tue, Aug 30, 2016 at 12:37 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: >> On Sat, Aug 27, 2016 at 11:55 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: > >> crash> list -H gc_inflight_list unix_sock.link -s unix_sock.inflight | >> grep counter | cut -d= -f2 | awk '{s+=$1} END {print s}' >> 130 >> crash> p unix_tot_inflight >> unix_tot_inflight = $2 = 135 >> >> We've lost track of a total of five inflight sockets, so it's not a >> one-off thing. Really weird... Now off to sleep, maybe I'll dream of >> the solution. > > Okay, found one bug: gc assumes that in-flight sockets that don't have > an external ref can't gain one while unix_gc_lock is held. That is > true because unix_notinflight() will be called before detaching fds, > which takes unix_gc_lock. Only MSG_PEEK was somehow overlooked. That > one also clones the fds, also keeping them in the skb. But through > MSG_PEEK an external reference can definitely be gained without ever > touching unix_gc_lock. > > Not sure whether the reported bug can be explained by this. Can you > confirm the MSG_PEEK was used in the setup? > > Does someone want to write a stress test for SCM_RIGHTS + MSG_PEEK? > > Anyway, attaching a fix that works by acquiring unix_gc_lock in case > of MSG_PEEK also. It is trivially correct, but I haven't tested it. I have no way of being 100% sure but looking through nginx's source code it seems they do utilize MSG_PEEK on several occasions. This issue has been apparently very hard to reproduce since I have 100s of servers running a lot of NGINX processes and this has been triggered only once. On a different note - if I inspect a live node without this patch should the discrepancy between the gc_inflight_list and the unix_tot_inflight be present VS with this patch applied? > > Thanks, > Miklos > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: kernel BUG at net/unix/garbage.c:149!" 2016-08-30 9:31 ` Nikolay Borisov @ 2016-08-30 9:39 ` Miklos Szeredi 0 siblings, 0 replies; 13+ messages in thread From: Miklos Szeredi @ 2016-08-30 9:39 UTC (permalink / raw) To: Nikolay Borisov Cc: Hannes Frederic Sowa, Linux-Kernel@Vger. Kernel. Org, netdev On Tue, Aug 30, 2016 at 11:31 AM, Nikolay Borisov <kernel@kyup.com> wrote: > > > On 08/30/2016 12:18 PM, Miklos Szeredi wrote: >> On Tue, Aug 30, 2016 at 12:37 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: >>> On Sat, Aug 27, 2016 at 11:55 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: >> >>> crash> list -H gc_inflight_list unix_sock.link -s unix_sock.inflight | >>> grep counter | cut -d= -f2 | awk '{s+=$1} END {print s}' >>> 130 >>> crash> p unix_tot_inflight >>> unix_tot_inflight = $2 = 135 >>> >>> We've lost track of a total of five inflight sockets, so it's not a >>> one-off thing. Really weird... Now off to sleep, maybe I'll dream of >>> the solution. >> >> Okay, found one bug: gc assumes that in-flight sockets that don't have >> an external ref can't gain one while unix_gc_lock is held. That is >> true because unix_notinflight() will be called before detaching fds, >> which takes unix_gc_lock. Only MSG_PEEK was somehow overlooked. That >> one also clones the fds, also keeping them in the skb. But through >> MSG_PEEK an external reference can definitely be gained without ever >> touching unix_gc_lock. >> >> Not sure whether the reported bug can be explained by this. Can you >> confirm the MSG_PEEK was used in the setup? >> >> Does someone want to write a stress test for SCM_RIGHTS + MSG_PEEK? >> >> Anyway, attaching a fix that works by acquiring unix_gc_lock in case >> of MSG_PEEK also. It is trivially correct, but I haven't tested it. > > I have no way of being 100% sure but looking through nginx's source code > it seems they do utilize MSG_PEEK on several occasions. This issue has > been apparently very hard to reproduce since I have 100s of servers > running a lot of NGINX processes and this has been triggered only once. > > On a different note - if I inspect a live node without this patch should > the discrepancy between the gc_inflight_list and the unix_tot_inflight > be present VS with this patch applied? May well be, since in the vmcore 4 in-flight sockets were "lost" before triggering the bug. I guess the best way to check is with a systemtap script that walks the list with the gc lock. Thanks, Miklos ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: kernel BUG at net/unix/garbage.c:149!" 2016-08-30 9:18 ` Miklos Szeredi 2016-08-30 9:31 ` Nikolay Borisov @ 2016-09-01 9:13 ` Hannes Frederic Sowa 2016-09-27 14:16 ` Nikolay Borisov 2 siblings, 0 replies; 13+ messages in thread From: Hannes Frederic Sowa @ 2016-09-01 9:13 UTC (permalink / raw) To: Miklos Szeredi; +Cc: Nikolay Borisov, Linux-Kernel@Vger. Kernel. Org, netdev On 30.08.2016 11:18, Miklos Szeredi wrote: > On Tue, Aug 30, 2016 at 12:37 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: >> On Sat, Aug 27, 2016 at 11:55 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: > >> crash> list -H gc_inflight_list unix_sock.link -s unix_sock.inflight | >> grep counter | cut -d= -f2 | awk '{s+=$1} END {print s}' >> 130 >> crash> p unix_tot_inflight >> unix_tot_inflight = $2 = 135 >> >> We've lost track of a total of five inflight sockets, so it's not a >> one-off thing. Really weird... Now off to sleep, maybe I'll dream of >> the solution. > > Okay, found one bug: gc assumes that in-flight sockets that don't have > an external ref can't gain one while unix_gc_lock is held. That is > true because unix_notinflight() will be called before detaching fds, > which takes unix_gc_lock. Only MSG_PEEK was somehow overlooked. That > one also clones the fds, also keeping them in the skb. But through > MSG_PEEK an external reference can definitely be gained without ever > touching unix_gc_lock. > > Not sure whether the reported bug can be explained by this. Can you > confirm the MSG_PEEK was used in the setup? > > Does someone want to write a stress test for SCM_RIGHTS + MSG_PEEK? > > Anyway, attaching a fix that works by acquiring unix_gc_lock in case > of MSG_PEEK also. It is trivially correct, but I haven't tested it. You can use spin_unlock_wait in unix_gc_barrier to make it a bit more lightweight. Anyway, all of the scans on the socket receive queues are actually protected by the appropriate locks again, I didn't see a way were we could result in such a crash because of concurrent modification of the receive queue. Do you have any hints or looked into this more closely? Thanks, Hannes ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: kernel BUG at net/unix/garbage.c:149!" 2016-08-30 9:18 ` Miklos Szeredi 2016-08-30 9:31 ` Nikolay Borisov 2016-09-01 9:13 ` Hannes Frederic Sowa @ 2016-09-27 14:16 ` Nikolay Borisov 2016-09-27 14:43 ` Hannes Frederic Sowa 2016-09-28 2:05 ` David Miller 2 siblings, 2 replies; 13+ messages in thread From: Nikolay Borisov @ 2016-09-27 14:16 UTC (permalink / raw) To: David Miller Cc: Miklos Szeredi, Hannes Frederic Sowa, Linux-Kernel@Vger. Kernel. Org, netdev [Added Dave Miller to see what's the status of this patch] On 08/30/2016 12:18 PM, Miklos Szeredi wrote: > On Tue, Aug 30, 2016 at 12:37 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: >> On Sat, Aug 27, 2016 at 11:55 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: > >> crash> list -H gc_inflight_list unix_sock.link -s unix_sock.inflight | >> grep counter | cut -d= -f2 | awk '{s+=$1} END {print s}' >> 130 >> crash> p unix_tot_inflight >> unix_tot_inflight = $2 = 135 >> >> We've lost track of a total of five inflight sockets, so it's not a >> one-off thing. Really weird... Now off to sleep, maybe I'll dream of >> the solution. > > Okay, found one bug: gc assumes that in-flight sockets that don't have > an external ref can't gain one while unix_gc_lock is held. That is > true because unix_notinflight() will be called before detaching fds, > which takes unix_gc_lock. Only MSG_PEEK was somehow overlooked. That > one also clones the fds, also keeping them in the skb. But through > MSG_PEEK an external reference can definitely be gained without ever > touching unix_gc_lock. > > Not sure whether the reported bug can be explained by this. Can you > confirm the MSG_PEEK was used in the setup? > > Does someone want to write a stress test for SCM_RIGHTS + MSG_PEEK? > > Anyway, attaching a fix that works by acquiring unix_gc_lock in case > of MSG_PEEK also. It is trivially correct, but I haven't tested it. > > Thanks, > Miklos > Dave, What's the status of https://patchwork.ozlabs.org/patch/664062/ , is this going to be picked up ? Regards, Nikolay ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: kernel BUG at net/unix/garbage.c:149!" 2016-09-27 14:16 ` Nikolay Borisov @ 2016-09-27 14:43 ` Hannes Frederic Sowa 2016-09-28 2:05 ` David Miller 1 sibling, 0 replies; 13+ messages in thread From: Hannes Frederic Sowa @ 2016-09-27 14:43 UTC (permalink / raw) To: Nikolay Borisov, David Miller Cc: Miklos Szeredi, Linux-Kernel@Vger. Kernel. Org, netdev On Tue, Sep 27, 2016, at 16:16, Nikolay Borisov wrote: > Dave, > > What's the status of https://patchwork.ozlabs.org/patch/664062/ , is > this going to be picked up ? Not sure if we actually fix a bug with this. Miklos could you maybe enhance the changelog then? Thanks, Hannes ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: kernel BUG at net/unix/garbage.c:149!" 2016-09-27 14:16 ` Nikolay Borisov 2016-09-27 14:43 ` Hannes Frederic Sowa @ 2016-09-28 2:05 ` David Miller 1 sibling, 0 replies; 13+ messages in thread From: David Miller @ 2016-09-28 2:05 UTC (permalink / raw) To: kernel; +Cc: mszeredi, hannes, linux-kernel, netdev From: Nikolay Borisov <kernel@kyup.com> Date: Tue, 27 Sep 2016 17:16:27 +0300 > What's the status of https://patchwork.ozlabs.org/patch/664062/ , is > this going to be picked up ? Why would I apply a patch that's an RFC, doesn't have a proper commit message, lacks a proper signoff, and also lacks ACK's and feedback from other knowledgable developers? ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2016-09-28 2:05 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-08-24 14:24 kernel BUG at net/unix/garbage.c:149!" Nikolay Borisov 2016-08-24 21:40 ` Hannes Frederic Sowa 2016-08-24 23:30 ` Nikolay Borisov 2016-08-26 20:24 ` Hannes Frederic Sowa 2016-08-27 9:55 ` Miklos Szeredi 2016-08-29 22:37 ` Miklos Szeredi 2016-08-30 9:18 ` Miklos Szeredi 2016-08-30 9:31 ` Nikolay Borisov 2016-08-30 9:39 ` Miklos Szeredi 2016-09-01 9:13 ` Hannes Frederic Sowa 2016-09-27 14:16 ` Nikolay Borisov 2016-09-27 14:43 ` Hannes Frederic Sowa 2016-09-28 2:05 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).