* Kernel panic 3.18 - 4.0.1
@ 2015-05-26 21:14 Alexandr Morozov
  2015-05-26 21:30 ` Greg KH
  0 siblings, 1 reply; 4+ messages in thread
From: Alexandr Morozov @ 2015-05-26 21:14 UTC (permalink / raw)
  To: stable

We encountered a kernel panic in our tests. We think it is because we
introduced bind-mounted network namespaces: for each container we do
unshare(CLONE_NEWNET), bind-mount the namespace to a path, then
configure it and setns() into it. Here is the trace I get on 4.0.1:
May 26 13:37:26 minigrind kernel: BUG: unable to handle kernel NULL
pointer dereference at 0000000000000016
May 26 13:37:26 minigrind kernel: IP: [<ffffffff811d4683>]
__detach_mounts+0x33/0x80
May 26 13:37:26 minigrind kernel: PGD 31aef9067 PUD 2b5ed8067 PMD 0
May 26 13:37:26 minigrind kernel: Oops: 0000 [#1] PREEMPT SMP
May 26 13:37:26 minigrind kernel: Modules linked in: ipt_MASQUERADE
nf_nat_masquerade_ipv4 bridge stp llc overlay ip6t_REJECT
nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ebtable_nat ebtab
May 26 13:37:26 minigrind kernel: CPU: 0 PID: 4078 Comm: docker Not
tainted 4.0.1-gentoo #1
May 26 13:37:26 minigrind kernel: Hardware name: LENOVO
20AQ006HUS/20AQ006HUS, BIOS GJET77WW (2.27 ) 05/20/2014
May 26 13:37:26 minigrind kernel: task: ffff8802b5e39980 ti:
ffff88008bfbc000 task.ti: ffff88008bfbc000
May 26 13:37:26 minigrind kernel: RIP: 0010:[<ffffffff811d4683>]
[<ffffffff811d4683>] __detach_mounts+0x33/0x80
May 26 13:37:26 minigrind kernel: RSP: 0018:ffff88008bfbfe38  EFLAGS: 00010202
May 26 13:37:26 minigrind kernel: RAX: 000000000000b9b9 RBX:
fffffffffffffffe RCX: 00000000000000b9
May 26 13:37:26 minigrind kernel: RDX: ffff8802b5e39980 RSI:
ffffffff819a10cd RDI: 0000000000000000
May 26 13:37:26 minigrind kernel: RBP: ffff880327fbe480 R08:
0000000000000000 R09: 0000000000000000
May 26 13:37:26 minigrind kernel: R10: ffff88033e2197e0 R11:
0000000000000000 R12: ffff88007dde8a78
May 26 13:37:26 minigrind kernel: R13: ffff88007dde8ea8 R14:
ffff88008bfbfea0 R15: ffff88007dde8f40
May 26 13:37:26 minigrind kernel: FS:  00007f7421b0a700(0000)
GS:ffff88033e200000(0000) knlGS:0000000000000000
May 26 13:37:26 minigrind kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
May 26 13:37:26 minigrind kernel: CR2: 0000000000000016 CR3:
000000031702b000 CR4: 00000000001406f0
May 26 13:37:26 minigrind kernel: DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
May 26 13:37:26 minigrind kernel: DR3: 0000000000000000 DR6:
00000000fffe0ff0 DR7: 0000000000000400
May 26 13:37:26 minigrind kernel: Stack:
May 26 13:37:26 minigrind kernel:  ffff880327fbe4d8 ffffffff811bfc82
00000000014007f0 00000000fffffffe
May 26 13:37:26 minigrind kernel:  ffff88031724d000 0000000000000000
ffff88008bfbfeb8 ffff88007dde8ea8
May 26 13:37:26 minigrind kernel:  00000000ffffff9c ffffffff811c4ec8
000000c20858d5f0 ffff880327fbe480
May 26 13:37:26 minigrind kernel: Call Trace:
May 26 13:37:26 minigrind kernel:  [<ffffffff811bfc82>] ? vfs_unlink+0x172/0x180
May 26 13:37:26 minigrind kernel:  [<ffffffff811c4ec8>] ?
do_unlinkat+0x268/0x2d0
May 26 13:37:26 minigrind kernel:  [<ffffffff8104bdb5>] ?
syscall_trace_enter_phase1+0x195/0x1a0
May 26 13:37:26 minigrind kernel:  [<ffffffff81746216>] ?
int_check_syscall_exit_work+0x34/0x3d
May 26 13:37:26 minigrind kernel:  [<ffffffff81745ff6>] ?
system_call_fastpath+0x16/0x1b
May 26 13:37:26 minigrind kernel: Code: 62 c3 81 e8 b0 fc 56 00 48 89
df e8 18 da ff ff 48 85 c0 48 89 c3 74 55 48 c7 c7 84 b4 c0 81 e8 a4
0f 57 00 83 05 fd 6d a3 00 01 <48> 8b 53 18 48 85 d2
May 26 13:37:26 minigrind kernel: RIP  [<ffffffff811d4683>]
__detach_mounts+0x33/0x80
May 26 13:37:26 minigrind kernel:  RSP <ffff88008bfbfe38>
May 26 13:37:26 minigrind kernel: CR2: 0000000000000016
May 26 13:37:26 minigrind kernel: ---[ end trace 399f937a2cba4abb ]---

On 4.0.2 everything works fine for me.
My colleagues got different errors, like RCU stalls and a plain
deadlock where you can't create new namespaces. I think all these
errors were fixed somewhere in 4.0.2, but I'm not sure where exactly.
The test which produces the panic (or hang) basically starts 16
containers in parallel, so it is 16 unshares + bind mounts, then
unmounting those namespaces.
Also, here is info from one of my coworkers about the deadlock:

mrjana [10:38 PM]
docker thread:

root@jenkins-prs-7:/proc/8895/task/8931# cat stack
[<ffffffff81466465>] copy_net_ns+0x75/0x150
[<ffffffff8108c3bd>] create_new_namespaces+0xfd/0x1a0
[<ffffffff8108c5ea>] unshare_nsproxy_namespaces+0x5a/0xc0
[<ffffffff8106d1c3>] SyS_unshare+0x183/0x330
[<ffffffff8156df4d>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

mrjana [10:38 PM]
This docker thread is waiting on net_mutex

mrjana [10:38 PM]
which is held by the kworker thread and is not returning:

mrjana [10:39 PM]
here’s the stack trace of kernel thread:

mrjana [10:39 PM]
root@jenkins-prs-7:/proc# cat /proc/6/stack
[<ffffffff810aec15>] mutex_optimistic_spin+0x185/0x1e0
[<ffffffff8147d5c5>] rtnl_lock+0x15/0x20
[<ffffffff8146c7a2>] default_device_exit_batch+0x72/0x160
[<ffffffff81465a83>] ops_exit_list.isra.1+0x53/0x60
[<ffffffff81466320>] cleanup_net+0x100/0x1d0
[<ffffffff81086064>] process_one_work+0x154/0x400
[<ffffffff81086a0b>] worker_thread+0x6b/0x490
[<ffffffff8108b8fb>] kthread+0xdb/0x100
[<ffffffff8156de98>] ret_from_fork+0x58/0x90
[<ffffffffffffffff>] 0xffffffffffffffff

mrjana [10:41 PM]
If you look at 3.18 code this thread acquires net_mutex at cleanup_net

mrjana [10:41 PM]
but this kworker thread has never released the net_mutex

mrjana [10:41 PM]
instead it is spinning on rtnl_lock

We tried versions 3.18, 3.19 and 4.0.1 on our CI.

Feel free to ask if you need additional info or a machine where you
can reproduce it easily.

Thanks!


* Re: Kernel panic 3.18 - 4.0.1
  2015-05-26 21:14 Kernel panic 3.18 - 4.0.1 Alexandr Morozov
@ 2015-05-26 21:30 ` Greg KH
  2015-05-26 21:37   ` Alexandr Morozov
  0 siblings, 1 reply; 4+ messages in thread
From: Greg KH @ 2015-05-26 21:30 UTC (permalink / raw)
  To: Alexandr Morozov; +Cc: stable

On Tue, May 26, 2015 at 02:14:24PM -0700, Alexandr Morozov wrote:
> We encountered a kernel panic in our tests. We think it is because we
> introduced bind-mounted network namespaces: for each container we do
> unshare(CLONE_NEWNET), bind-mount the namespace to a path, then
> configure it and setns() into it. Here is the trace I get on 4.0.1:
> [oops trace snipped]
> 
> On 4.0.2 everything works fine for me.

Great!  What's the problem then?  :)

> My colleagues got different errors, like RCU stalls and a plain
> deadlock where you can't create new namespaces. I think all these
> errors were fixed somewhere in 4.0.2, but I'm not sure where exactly.
> The test which produces the panic (or hang) basically starts 16
> containers in parallel, so it is 16 unshares + bind mounts, then
> unmounting those namespaces.
> Also, here is info from one of my coworkers about the deadlock:
> [deadlock stack traces snipped]
> 
> We tried versions 3.18, 3.19 and 4.0.1 on our CI.
> 
> Feel free to ask if you need additional info or a machine where you
> can reproduce it easily.

I don't understand, 4.0.2 is working, so what is there left for us to do
here?

thanks,

greg k-h


* Re: Kernel panic 3.18 - 4.0.1
  2015-05-26 21:30 ` Greg KH
@ 2015-05-26 21:37   ` Alexandr Morozov
  2015-05-26 21:42     ` Willy Tarreau
  0 siblings, 1 reply; 4+ messages in thread
From: Alexandr Morozov @ 2015-05-26 21:37 UTC (permalink / raw)
  To: Greg KH; +Cc: stable

Ah, sorry. I thought it would be possible to backport the fix to other
stable branches like 3.18 and 3.19. We use 3.19.6 and 3.18.14; 3.18.14
seems to be the latest longterm on kernel.org, and we can try 3.19.8
too. Sorry if I hit the wrong mailing list for this problem, let me
know. We'll try 3.19.8 in the meantime.

On Tue, May 26, 2015 at 2:30 PM, Greg KH <greg@kroah.com> wrote:
> On Tue, May 26, 2015 at 02:14:24PM -0700, Alexandr Morozov wrote:
>> We encountered a kernel panic in our tests. We think it is because we
>> introduced bind-mounted network namespaces: for each container we do
>> unshare(CLONE_NEWNET), bind-mount the namespace to a path, then
>> configure it and setns() into it. Here is the trace I get on 4.0.1:
>> [oops trace snipped]
>>
>> On 4.0.2 everything works fine for me.
>
> Great!  What's the problem then?  :)
>
>> My colleagues got different errors, like RCU stalls and a plain
>> deadlock where you can't create new namespaces. I think all these
>> errors were fixed somewhere in 4.0.2, but I'm not sure where exactly.
>> The test which produces the panic (or hang) basically starts 16
>> containers in parallel, so it is 16 unshares + bind mounts, then
>> unmounting those namespaces.
>> Also, here is info from one of my coworkers about the deadlock:
>> [deadlock stack traces snipped]
>>
>> We tried versions 3.18, 3.19 and 4.0.1 on our CI.
>>
>> Feel free to ask if you need additional info or a machine where you
>> can reproduce it easily.
>
> I don't understand, 4.0.2 is working, so what is there left for us to do
> here?
>
> thanks,
>
> greg k-h


* Re: Kernel panic 3.18 - 4.0.1
  2015-05-26 21:37   ` Alexandr Morozov
@ 2015-05-26 21:42     ` Willy Tarreau
  0 siblings, 0 replies; 4+ messages in thread
From: Willy Tarreau @ 2015-05-26 21:42 UTC (permalink / raw)
  To: Alexandr Morozov; +Cc: Greg KH, stable

On Tue, May 26, 2015 at 02:37:02PM -0700, Alexandr Morozov wrote:
> Ah, sorry. I thought it would be possible to backport the fix to other
> stable branches like 3.18 and 3.19. We use 3.19.6 and 3.18.14; 3.18.14
> seems to be the latest longterm on kernel.org, and we can try 3.19.8
> too. Sorry if I hit the wrong mailing list for this problem, let me
> know. We'll try 3.19.8 in the meantime.

No, that's the right list if you want to contact the stable kernel
maintainers (including Sasha, who maintains 3.18). Since you seem to
reliably reproduce the issue, what you may have to do is check which
patch between 4.0.1 and 4.0.2 fixed the bug, and make sure it gets
backported into 3.18. It is possible that it was not identified as
needing to be backported. Maybe first wait for the next 3.18 release in
case the fix is already queued, and bisect after that if the next 3.18
doesn't fix your issue.
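For example, against a linux-stable checkout, something like the
following (a sketch only; the --term-old/--term-new options assume a
reasonably recent git, and the test step is your 16-container run):

```shell
# List the candidate fixes between the two releases:
git log --oneline v4.0.1..v4.0.2

# Bisect for the *fix*: mark kernels that still crash as "broken"
# and kernels that survive the container test as "fixed".
git bisect start --term-old=broken --term-new=fixed
git bisect fixed v4.0.2
git bisect broken v4.0.1
# Build and boot each kernel git checks out, run the test, then report:
#   git bisect fixed     (test passed)
#   git bisect broken    (test crashed or hung)
```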

Hoping this helps,
Willy


