* Re: Glibc recvmsg from kernel netlink socket hangs forever
       [not found] ` <20150925043653.GA29111@roeck-us.net>
@ 2015-09-25  4:58   ` Herbert Xu
  2015-09-25  5:34     ` Guenter Roeck
  0 siblings, 1 reply; 10+ messages in thread
From: Herbert Xu @ 2015-09-25  4:58 UTC
  To: Guenter Roeck; +Cc: Steven Schlansker, linux-kernel, Eric Dumazet, netdev

On Thu, Sep 24, 2015 at 09:36:53PM -0700, Guenter Roeck wrote:
>
> http://comments.gmane.org/gmane.linux.network/363085
> 
> might explain your problem.
> 
> I thought this was resolved in 4.1, but it looks like the problem still persists
> there. At least I have reports from my workplace that 4.1.6 and 4.1.7 are still
> affected. I don't know if there have been any relevant changes in 4.2.
> 
> Copying Herbert and Eric for additional input.

There was a separate bug discovered by Tejun recently.  You need
to apply the patches

https://patchwork.ozlabs.org/patch/519245/
https://patchwork.ozlabs.org/patch/520824/

There is another follow-up but it shouldn't make any difference
in practice.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: Glibc recvmsg from kernel netlink socket hangs forever
  2015-09-25  4:58   ` Glibc recvmsg from kernel netlink socket hangs forever Herbert Xu
@ 2015-09-25  5:34     ` Guenter Roeck
  2015-09-25 15:55       ` Herbert Xu
  2015-09-25 21:37       ` Steven Schlansker
  0 siblings, 2 replies; 10+ messages in thread
From: Guenter Roeck @ 2015-09-25  5:34 UTC
  To: Herbert Xu; +Cc: Steven Schlansker, linux-kernel, Eric Dumazet, netdev

Herbert,

On 09/24/2015 09:58 PM, Herbert Xu wrote:
> On Thu, Sep 24, 2015 at 09:36:53PM -0700, Guenter Roeck wrote:
>>
>> http://comments.gmane.org/gmane.linux.network/363085
>>
>> might explain your problem.
>>
>> I thought this was resolved in 4.1, but it looks like the problem still persists
>> there. At least I have reports from my workplace that 4.1.6 and 4.1.7 are still
>> affected. I don't know if there have been any relevant changes in 4.2.
>>
>> Copying Herbert and Eric for additional input.
>
> There was a separate bug discovered by Tejun recently.  You need
> to apply the patches
>
> https://patchwork.ozlabs.org/patch/519245/
> https://patchwork.ozlabs.org/patch/520824/
>
I assume this is on top of mainline ?

> There is another follow-up but it shouldn't make any difference
> in practice.
>

Any idea what may be needed for 4.1 ?
I am currently trying https://patchwork.ozlabs.org/patch/473041/,
but I have no idea if that will help with the problem we are seeing there.

Thanks,
Guenter


* Re: Glibc recvmsg from kernel netlink socket hangs forever
  2015-09-25  5:34     ` Guenter Roeck
@ 2015-09-25 15:55       ` Herbert Xu
  2015-09-25 16:14         ` Guenter Roeck
  2015-09-26  3:45         ` Guenter Roeck
  2015-09-25 21:37       ` Steven Schlansker
  1 sibling, 2 replies; 10+ messages in thread
From: Herbert Xu @ 2015-09-25 15:55 UTC
  To: Guenter Roeck; +Cc: Steven Schlansker, linux-kernel, Eric Dumazet, netdev

On Thu, Sep 24, 2015 at 10:34:10PM -0700, Guenter Roeck wrote:
>
> Any idea what may be needed for 4.1 ?
> I am currently trying https://patchwork.ozlabs.org/patch/473041/,

This patch should not make any difference on 4.1 and later because
4.1 is where I rewrote rhashtable resizing and it should work (or
if it is broken then the latest kernel should be broken too).

> but I have no idea if that will help with the problem we are seeing there.

Having looked at your message again I don't think the issue I
alluded to is relevant since the symptom there ought to be a
straight kernel lock-up as opposed to just a user-space one because
you will end up with the kernel sending a message to itself.

And the fact that 4.2 works is more indicative as the bug is
present in both 4.1 and 4.2.

I'll try to reproduce this in 4.1 as time permits but no promises.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: Glibc recvmsg from kernel netlink socket hangs forever
  2015-09-25 15:55       ` Herbert Xu
@ 2015-09-25 16:14         ` Guenter Roeck
  2015-09-26  3:45         ` Guenter Roeck
  1 sibling, 0 replies; 10+ messages in thread
From: Guenter Roeck @ 2015-09-25 16:14 UTC
  To: Herbert Xu; +Cc: Steven Schlansker, linux-kernel, Eric Dumazet, netdev

On 09/25/2015 08:55 AM, Herbert Xu wrote:
> On Thu, Sep 24, 2015 at 10:34:10PM -0700, Guenter Roeck wrote:
>>
>> Any idea what may be needed for 4.1 ?
>> I am currently trying https://patchwork.ozlabs.org/patch/473041/,
>
> This patch should not make any difference on 4.1 and later because
> 4.1 is where I rewrote rhashtable resizing and it should work (or
> if it is broken then the latest kernel should be broken too).
>
Yes, applying (only) the above patch to 4.1 didn't help.

>> but I have no idea if that will help with the problem we are seeing there.
>
> Having looked at your message again I don't think the issue I
> alluded to is relevant since the symptom there ought to be a
> straight kernel lock-up as opposed to just a user-space one because
> you will end up with the kernel sending a message to itself.
>
> And the fact that 4.2 works is more indicative as the bug is
> present in both 4.1 and 4.2.
>
> I'll try to reproduce this in 4.1 as time permits but no promises.
>

I applied your patches (and a few additional netlink changes from 4.2)
to our 4.1 branch. I'll let you know if it makes a difference for us.

Thanks,
Guenter


* Re: Glibc recvmsg from kernel netlink socket hangs forever
  2015-09-25  5:34     ` Guenter Roeck
  2015-09-25 15:55       ` Herbert Xu
@ 2015-09-25 21:37       ` Steven Schlansker
  2015-09-25 21:54         ` Steven Schlansker
  2015-09-26  2:58         ` Guenter Roeck
  1 sibling, 2 replies; 10+ messages in thread
From: Steven Schlansker @ 2015-09-25 21:37 UTC
  To: Guenter Roeck; +Cc: Herbert Xu, linux-kernel, Eric Dumazet, netdev


On Sep 24, 2015, at 10:34 PM, Guenter Roeck <linux@roeck-us.net> wrote:

> Herbert,
> 
> On 09/24/2015 09:58 PM, Herbert Xu wrote:
>> On Thu, Sep 24, 2015 at 09:36:53PM -0700, Guenter Roeck wrote:
>>> 
>>> http://comments.gmane.org/gmane.linux.network/363085
>>> 
>>> might explain your problem.
>>> 
>>> I thought this was resolved in 4.1, but it looks like the problem still persists
>>> there. At least I have reports from my workplace that 4.1.6 and 4.1.7 are still
>>> affected. I don't know if there have been any relevant changes in 4.2.
>>> 
>>> Copying Herbert and Eric for additional input.
>> 
>> There was a separate bug discovered by Tejun recently.  You need
>> to apply the patches
>> 
>> https://patchwork.ozlabs.org/patch/519245/
>> https://patchwork.ozlabs.org/patch/520824/
>> 
> I assume this is on top of mainline ?
> 
>> There is another follow-up but it shouldn't make any difference
>> in practice.
>> 
> 
> Any idea what may be needed for 4.1 ?
> I am currently trying https://patchwork.ozlabs.org/patch/473041/,
> but I have no idea if that will help with the problem we are seeing there.

Thank you for the patches to try, I'll build a kernel with them early next week
and report back.  It sounds like it may not match my problem exactly so we'll
see.

In the meantime, I also observed the following oops:

[ 1709.620092] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 1709.624058] BUG: unable to handle kernel paging request at ffffea001dbef3c0
[ 1709.624058] IP: [<ffffea001dbef3c0>] 0xffffea001dbef3c0
[ 1709.624058] PGD 78f7dc067 PUD 78f7db067 PMD 800000078ec001e3 
[ 1709.624058] Oops: 0011 [#1] SMP 
[ 1709.624058] Modules linked in: i2c_piix4(E) btrfs(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) floppy(E)
[ 1709.624058] CPU: 4 PID: 19714 Comm: pf_dump Tainted: G            E   4.0.4 #1
[ 1709.624058] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/06/2015
[ 1709.624058] task: ffff880605a18000 ti: ffff8805f9358000 task.ti: ffff8805f9358000
[ 1709.624058] RIP: 0010:[<ffffea001dbef3c0>]  [<ffffea001dbef3c0>] 0xffffea001dbef3c0
[ 1709.624058] RSP: 0018:ffff8805f935bbc0  EFLAGS: 00010246
[ 1709.624058] RAX: ffffea001dbef3c0 RBX: 0000000000000007 RCX: 0000000000000000
[ 1709.624058] RDX: 0000000000002100 RSI: ffff8805f992f308 RDI: ffff8806622f6b00
[ 1709.624058] RBP: ffff8805f935bc08 R08: 0000000000001ec0 R09: 0000000000002100
[ 1709.624058] R10: 0000000000000000 R11: ffff880771003200 R12: ffff8806622f6b00
[ 1709.624058] R13: 0000000000000002 R14: ffffffff8239e238 R15: ffff8805f992f308
[ 1709.624058] FS:  00007f0735f29700(0000) GS:ffff88078fc80000(0000) knlGS:0000000000000000
[ 1709.624058] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1709.624058] CR2: ffffea001dbef3c0 CR3: 00000005f7e88000 CR4: 00000000001407e0
[ 1709.624058] Stack:
[ 1709.624058]  ffffffff81735ca2 0000000000000000 ffff8805f992f348 ffff88076b491400
[ 1709.624058]  ffff8805f992f000 ffff8806622f6b00 0000000000000ec0 ffff8805f992f308
[ 1709.624058]  ffff88065ffb0000 ffff8805f935bc38 ffffffff8176028a ffff8805f992f000
[ 1709.624058] Call Trace:
[ 1709.624058]  [<ffffffff81735ca2>] ? rtnl_dump_all+0x122/0x1a0
[ 1709.624058]  [<ffffffff8176028a>] netlink_dump+0x11a/0x2d0
[ 1709.624058]  [<ffffffff81760625>] netlink_recvmsg+0x1e5/0x360
[ 1709.624058]  [<ffffffff811b97c9>] ? kmem_cache_free+0x1b9/0x1d0
[ 1709.624058]  [<ffffffff8170b33f>] sock_recvmsg+0x6f/0xa0
[ 1709.624058]  [<ffffffff8170c1a4>] ___sys_recvmsg+0xe4/0x200
[ 1709.624058]  [<ffffffff811f5305>] ? __fget_light+0x25/0x70
[ 1709.624058]  [<ffffffff8170cbe2>] __sys_recvmsg+0x42/0x80
[ 1709.624058]  [<ffffffff81961010>] ? int_check_syscall_exit_work+0x34/0x3d
[ 1709.624058]  [<ffffffff8170cc32>] SyS_recvmsg+0x12/0x20
[ 1709.624058]  [<ffffffff81960dcd>] system_call_fastpath+0x16/0x1b
[ 1709.624058] Code: 00 00 00 ff ff ff ff 01 00 00 00 00 01 10 00 00 00 ad de 00 02 20 00 00 00 ad de 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 ff ff 02 00 00 00 00 00 00 00 00 00 00 00 00 00 
[ 1709.798299] RIP  [<ffffea001dbef3c0>] 0xffffea001dbef3c0
[ 1709.798299]  RSP <ffff8805f935bbc0>
[ 1709.798299] CR2: ffffea001dbef3c0
[ 1709.798299] ---[ end trace 2e069ceceed3d61a ]---

It has only been noticed once so far.  I don't know if it is the same issue; it
certainly doesn't always happen when this problem occurs, but it looks curious
all the same...
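
For readers decoding the trace above: the hex value after "Oops:" is the x86
page-fault error code. The following is a minimal sketch, assuming the standard
x86 bit layout (bit 0 protection violation, bit 1 write, bit 2 user mode, bit 3
reserved bit, bit 4 instruction fetch). The names PF_BITS and decode_oops_code
are hypothetical helpers for illustration, not kernel or library APIs.

```python
# x86 page-fault error-code bits, as printed in hex after "Oops:".
# PF_BITS and decode_oops_code are illustrative names, not a real API.
PF_BITS = {
    0: "protection_violation",  # 0 = page not present, 1 = protection fault
    1: "write",                 # 0 = read access, 1 = write access
    2: "user_mode",             # 0 = kernel mode, 1 = user mode
    3: "reserved_bit",          # reserved bit set in a page-table entry
    4: "instruction_fetch",     # fault happened while fetching an instruction
}


def decode_oops_code(hex_text):
    """Return the set of flag names set in an 'Oops: XXXX' hex code."""
    code = int(hex_text, 16)
    return {name for bit, name in PF_BITS.items() if code & (1 << bit)}


print(sorted(decode_oops_code("0011")))
# ['instruction_fetch', 'protection_violation']
```

Decoded this way, 0011 is a kernel-mode instruction fetch from a present but
non-executable page, which agrees with the "kernel tried to execute
NX-protected page" line at the top of the trace.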


* Re: Glibc recvmsg from kernel netlink socket hangs forever
  2015-09-25 21:37       ` Steven Schlansker
@ 2015-09-25 21:54         ` Steven Schlansker
  2015-09-26  2:58         ` Guenter Roeck
  1 sibling, 0 replies; 10+ messages in thread
From: Steven Schlansker @ 2015-09-25 21:54 UTC
  To: Guenter Roeck; +Cc: Herbert Xu, linux-kernel, Eric Dumazet, netdev


On Sep 25, 2015, at 2:37 PM, Steven Schlansker <stevenschlansker@gmail.com> wrote:

> 
> On Sep 24, 2015, at 10:34 PM, Guenter Roeck <linux@roeck-us.net> wrote:
> 
>> Herbert,
>> 
>> On 09/24/2015 09:58 PM, Herbert Xu wrote:
>>> On Thu, Sep 24, 2015 at 09:36:53PM -0700, Guenter Roeck wrote:
>>>> 
>>>> http://comments.gmane.org/gmane.linux.network/363085
>>>> 
>>>> might explain your problem.
>>>> 
>>>> I thought this was resolved in 4.1, but it looks like the problem still persists
>>>> there. At least I have reports from my workplace that 4.1.6 and 4.1.7 are still
>>>> affected. I don't know if there have been any relevant changes in 4.2.
>>>> 
>>>> Copying Herbert and Eric for additional input.
>>> 
>>> There was a separate bug discovered by Tejun recently.  You need
>>> to apply the patches
>>> 
>>> https://patchwork.ozlabs.org/patch/519245/
>>> https://patchwork.ozlabs.org/patch/520824/
>>> 
>> I assume this is on top of mainline ?
>> 
>>> There is another follow-up but it shouldn't make any difference
>>> in practice.
>>> 
>> 
>> Any idea what may be needed for 4.1 ?
>> I am currently trying https://patchwork.ozlabs.org/patch/473041/,
>> but I have no idea if that will help with the problem we are seeing there.
> 
> Thank you for the patches to try, I'll build a kernel with them early next week
> and report back.  It sounds like it may not match my problem exactly so we'll
> see.
Huh, when it rains, it pours... now I have a legit panic too!

[ 1675.228701] BUG: unable to handle kernel paging request at fffffffffffffe70
[ 1675.232058] IP: [<ffffffff8175dcea>] netlink_compare+0xa/0x30
[ 1675.232058] PGD 2015067 PUD 2017067 PMD 0 
[ 1675.232058] Oops: 0000 [#1] SMP 
[ 1675.232058] Modules linked in: i2c_piix4(E) btrfs(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) floppy(E)
[ 1675.232058] CPU: 2 PID: 11152 Comm: pf_dump Tainted: G            E   4.0.4 #1
[ 1675.232058] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/06/2015
[ 1675.232058] task: ffff880150fa6480 ti: ffff880150fb4000 task.ti: ffff880150fb4000
[ 1675.232058] RIP: 0010:[<ffffffff8175dcea>]  [<ffffffff8175dcea>] netlink_compare+0xa/0x30
[ 1675.232058] RSP: 0018:ffff880150fb7d10  EFLAGS: 00010246
[ 1675.232058] RAX: 0000000000000000 RBX: 00000000023e503b RCX: 000000000561f992
[ 1675.232058] RDX: 00000000fffc27e4 RSI: ffff880150fb7db8 RDI: fffffffffffffbb8
[ 1675.232058] RBP: ffff880150fb7d58 R08: ffff8805a82f5ab8 R09: 000000000000000c
[ 1675.232058] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[ 1675.232058] R13: ffffffff8175dce0 R14: ffff88008b37e800 R15: ffff88076db40000
[ 1675.232058] FS:  00007feec2440700(0000) GS:ffff88078fc40000(0000) knlGS:0000000000000000
[ 1675.232058] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1675.232058] CR2: fffffffffffffe70 CR3: 000000053bd17000 CR4: 00000000001407e0
[ 1675.232058] Stack:
[ 1675.232058]  ffffffff81434dae ffff88076d864400 ffff880150fb7db8 ffff8801559ee8b8
[ 1675.232058]  ffff88076db40000 ffff8805a82f5c48 ffff88008b37e800 ffff88076d864400
[ 1675.232058]  0000000000000000 ffff880150fb7da8 ffffffff81435476 ffff880150fb7db8
[ 1675.232058] Call Trace:
[ 1675.232058]  [<ffffffff81434dae>] ? rhashtable_lookup_compare+0x5e/0xb0
[ 1675.232058]  [<ffffffff81435476>] rhashtable_lookup_compare_insert+0x66/0xc0
[ 1675.232058]  [<ffffffff8175eb63>] netlink_insert+0x83/0xe0
[ 1675.232058]  [<ffffffff8175f11d>] netlink_autobind.isra.34+0xad/0xd0
[ 1675.232058]  [<ffffffff817614b1>] netlink_bind+0x1b1/0x240
[ 1675.232058]  [<ffffffff8170b8b8>] SYSC_bind+0xb8/0xf0
[ 1675.232058]  [<ffffffff81110784>] ? __audit_syscall_entry+0xb4/0x110
[ 1675.232058]  [<ffffffff81022e2c>] ? do_audit_syscall_entry+0x6c/0x70
[ 1675.232058]  [<ffffffff81024553>] ? syscall_trace_enter_phase1+0x123/0x180
[ 1675.232058]  [<ffffffff810248b6>] ? syscall_trace_leave+0xc6/0x120
[ 1675.232058]  [<ffffffff811f5a35>] ? fd_install+0x25/0x30
[ 1675.232058]  [<ffffffff8170c5de>] SyS_bind+0xe/0x10
[ 1675.232058]  [<ffffffff81960dcd>] system_call_fastpath+0x16/0x1b
[ 1675.232058] Code: 00 8b 77 08 39 77 14 8d 4e 01 41 0f 44 c9 41 39 c8 89 4f 08 74 09 48 8b 08 83 3c 11 04 74 e2 5d c3 0f 1f 44 00 00 31 c0 8b 56 08 <39> 97 b8 02 00 00 55 48 89 e5 74 0a 5d c3 0f 1f 84 00 00 00 00 
[ 1675.232058] RIP  [<ffffffff8175dcea>] netlink_compare+0xa/0x30
[ 1675.232058]  RSP <ffff880150fb7d10>
[ 1675.232058] CR2: fffffffffffffe70
[ 1675.232058] ---[ end trace 963ff50a058120d0 ]---
[ 1675.232058] Kernel panic - not syncing: Fatal exception in interrupt
[ 1675.232058] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
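
The register dump and the Code: line in this trace can be cross-checked with a
little arithmetic. The marked bytes "<39> 97 b8 02 00 00" encode a 32-bit
compare of %edx against memory at 0x2b8(%rdi), so the faulting address in CR2
should equal RDI plus 0x2b8. The sketch below checks that under 64-bit
wraparound; the helper name to_signed64 is hypothetical.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


rdi = 0xfffffffffffffbb8  # RDI from the register dump
cr2 = 0xfffffffffffffe70  # faulting address from the BUG: line

# The four bytes after "39 97" are a little-endian 32-bit displacement.
disp = int.from_bytes(bytes.fromhex("b8020000"), "little")

print(hex(disp))                       # 0x2b8
print(to_signed64(rdi))                # -1096
print(((rdi + disp) & MASK64) == cr2)  # True
```

RDI is therefore a small negative number rather than a valid kernel pointer.
That would be consistent with netlink_compare being handed an object pointer
computed from an invalid hash-table entry, although the trace alone does not
prove where the bad value came from.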


* Re: Glibc recvmsg from kernel netlink socket hangs forever
  2015-09-25 21:37       ` Steven Schlansker
  2015-09-25 21:54         ` Steven Schlansker
@ 2015-09-26  2:58         ` Guenter Roeck
  2015-10-05 23:26           ` Steven Schlansker
  1 sibling, 1 reply; 10+ messages in thread
From: Guenter Roeck @ 2015-09-26  2:58 UTC
  To: Steven Schlansker; +Cc: Herbert Xu, linux-kernel, Eric Dumazet, netdev

On 09/25/2015 02:37 PM, Steven Schlansker wrote:
>
> On Sep 24, 2015, at 10:34 PM, Guenter Roeck <linux@roeck-us.net> wrote:
>
>> Herbert,
>>
>> On 09/24/2015 09:58 PM, Herbert Xu wrote:
>>> On Thu, Sep 24, 2015 at 09:36:53PM -0700, Guenter Roeck wrote:
>>>>
>>>> http://comments.gmane.org/gmane.linux.network/363085
>>>>
>>>> might explain your problem.
>>>>
>>>> I thought this was resolved in 4.1, but it looks like the problem still persists
>>>> there. At least I have reports from my workplace that 4.1.6 and 4.1.7 are still
>>>> affected. I don't know if there have been any relevant changes in 4.2.
>>>>
>>>> Copying Herbert and Eric for additional input.
>>>
>>> There was a separate bug discovered by Tejun recently.  You need
>>> to apply the patches
>>>
>>> https://patchwork.ozlabs.org/patch/519245/
>>> https://patchwork.ozlabs.org/patch/520824/
>>>
>> I assume this is on top of mainline ?
>>
>>> There is another follow-up but it shouldn't make any difference
>>> in practice.
>>>
>>
>> Any idea what may be needed for 4.1 ?
>> I am currently trying https://patchwork.ozlabs.org/patch/473041/,
>> but I have no idea if that will help with the problem we are seeing there.
>
> Thank you for the patches to try, I'll build a kernel with them early next week
> and report back.  It sounds like it may not match my problem exactly so we'll
> see.
>
> In the meantime, I also observed the following oops:
>
> [ 1709.620092] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> [ 1709.624058] BUG: unable to handle kernel paging request at ffffea001dbef3c0
> [ 1709.624058] IP: [<ffffea001dbef3c0>] 0xffffea001dbef3c0
> [ 1709.624058] PGD 78f7dc067 PUD 78f7db067 PMD 800000078ec001e3
> [ 1709.624058] Oops: 0011 [#1] SMP
> [ 1709.624058] Modules linked in: i2c_piix4(E) btrfs(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) floppy(E)
> [ 1709.624058] CPU: 4 PID: 19714 Comm: pf_dump Tainted: G            E   4.0.4 #1

For 4.0.x, you _really_ need to update to 4.0.9 to get the following two patches.

cf8befcc1a55 netlink: Disable insertions/removals during rehash
18889a4315a5 netlink: Reset portid after netlink_insert failure

Guenter
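
To check a fleet of machines against the advice above, a version comparison is
enough. This is a minimal sketch, assuming only what this message states: on
the 4.0.x series, releases before 4.0.9 lack the two netlink fixes. The names
FIXED_4_0, parse_release and needs_4_0_update are hypothetical, and the check
says nothing about other series such as 4.1.x.

```python
import re

# First 4.0.x release said in this thread to carry both netlink fixes.
FIXED_4_0 = (4, 0, 9)


def parse_release(release):
    """Parse a release string such as '4.0.4' or '4.0.9-rc1' into ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError("unrecognised kernel release: %r" % release)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def needs_4_0_update(release):
    """True if release is a 4.0.x kernel older than 4.0.9."""
    version = parse_release(release)
    return version[:2] == (4, 0) and version < FIXED_4_0


print(needs_4_0_update("4.0.4"))  # True
print(needs_4_0_update("4.0.9"))  # False
```

On a live system the string returned by platform.release() can be passed in
directly.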


* Re: Glibc recvmsg from kernel netlink socket hangs forever
  2015-09-25 15:55       ` Herbert Xu
  2015-09-25 16:14         ` Guenter Roeck
@ 2015-09-26  3:45         ` Guenter Roeck
  1 sibling, 0 replies; 10+ messages in thread
From: Guenter Roeck @ 2015-09-26  3:45 UTC
  To: Herbert Xu; +Cc: Steven Schlansker, linux-kernel, Eric Dumazet, netdev

Herbert,

On 09/25/2015 08:55 AM, Herbert Xu wrote:
> On Thu, Sep 24, 2015 at 10:34:10PM -0700, Guenter Roeck wrote:
>>
>> Any idea what may be needed for 4.1 ?
>> I am currently trying https://patchwork.ozlabs.org/patch/473041/,
>
> This patch should not make any difference on 4.1 and later because
> 4.1 is where I rewrote rhashtable resizing and it should work (or
> if it is broken then the latest kernel should be broken too).
>
>> but I have no idea if that will help with the problem we are seeing there.
>
> Having looked at your message again I don't think the issue I
> alluded to is relevant since the symptom there ought to be a
> straight kernel lock-up as opposed to just a user-space one because
> you will end up with the kernel sending a message to itself.
>
> And the fact that 4.2 works is more indicative as the bug is
> present in both 4.1 and 4.2.
>
> I'll try to reproduce this in 4.1 as time permits but no promises.
>

After applying your two patches, I don't see the problem in 4.1 anymore.
We'll run the system through regression; the complete cycle may take
a couple of weeks. I'll let you know if we find any further problems.

If you submit additional patches in that area, it would be great if you
can Cc: me.

Thanks,
Guenter


* Re: Glibc recvmsg from kernel netlink socket hangs forever
  2015-09-26  2:58         ` Guenter Roeck
@ 2015-10-05 23:26           ` Steven Schlansker
  2015-10-05 23:30             ` Guenter Roeck
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Schlansker @ 2015-10-05 23:26 UTC
  To: Guenter Roeck; +Cc: Herbert Xu, linux-kernel, Eric Dumazet, netdev


On Sep 25, 2015, at 7:58 PM, Guenter Roeck <linux@roeck-us.net> wrote:

> On 09/25/2015 02:37 PM, Steven Schlansker wrote:
>> 
>> 
>> Thank you for the patches to try, I'll build a kernel with them early next week
>> and report back.  It sounds like it may not match my problem exactly so we'll
>> see.
>> 
> 
> For 4.0.x, you _really_ need to update to 4.0.9 to get the following two patches.
> 
> cf8befcc1a55 netlink: Disable insertions/removals during rehash
> 18889a4315a5 netlink: Reset portid after netlink_insert failure

Hi Guenter,

Thank you very much for the information.  We upgraded to 4.0.9 and all indications are that
the issue is gone.  I will follow up if that is not the case.

Thank you everyone for your guidance.


* Re: Glibc recvmsg from kernel netlink socket hangs forever
  2015-10-05 23:26           ` Steven Schlansker
@ 2015-10-05 23:30             ` Guenter Roeck
  0 siblings, 0 replies; 10+ messages in thread
From: Guenter Roeck @ 2015-10-05 23:30 UTC
  To: Steven Schlansker; +Cc: Herbert Xu, linux-kernel, Eric Dumazet, netdev

On 10/05/2015 04:26 PM, Steven Schlansker wrote:
>
> On Sep 25, 2015, at 7:58 PM, Guenter Roeck <linux@roeck-us.net> wrote:
>
>> On 09/25/2015 02:37 PM, Steven Schlansker wrote:
>>>
>>>
>>> Thank you for the patches to try, I'll build a kernel with them early next week
>>> and report back.  It sounds like it may not match my problem exactly so we'll
>>> see.
>>>
>>
>> For 4.0.x, you _really_ need to update to 4.0.9 to get the following two patches.
>>
>> cf8befcc1a55 netlink: Disable insertions/removals during rehash
>> 18889a4315a5 netlink: Reset portid after netlink_insert failure
>
> Hi Guenter,
>
> Thank you very much for the information.  We upgraded to 4.0.9 and all indications are that
> the issue is gone.  I will follow up if that is not the case.
>
> Thank you everyone for your guidance.
>

My pleasure.

Guenter

