* Re: reproducable panic eviction work queue
       [not found] <F8D94413-90A2-4F80-AAA2-7A6AB57DF314@transip.nl>
@ 2015-07-18  8:56 ` Eric Dumazet
  2015-07-18  9:01   ` Johan Schuijt
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2015-07-18  8:56 UTC (permalink / raw)
  To: Johan Schuijt
  Cc: nikolay, davem, fw, chutzpah, Robin Geuze, Frank Schreuder, netdev

On Fri, 2015-07-17 at 21:18 +0000, Johan Schuijt wrote:
> Hey guys, 
> 
> 
> We’re currently running into a reproducible panic in the eviction work
> queue code when we pin all our eth* IRQs to different CPU cores (in
> order to scale our networking performance for our virtual servers).
> This only occurs in kernels >= 3.17 and is a result of the following
> change:
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24
> 
> 
> The race/panic we see seems to be the same as, or similar to:
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=65ba1f1ec0eff1c25933468e1d238201c0c2cb29
> 
> 
> We can confirm that this is directly exposed by the IRQ pinning since
> disabling this stops us from being able to reproduce this case :)
> 
> 
> How to reproduce: in our test setup we have 4 machines generating UDP
> packets which are sent to the vulnerable host. These all have an MTU of
> 100 (for test purposes) and send UDP packets of 256 bytes in size.
> Within half an hour you will see the following panic:
> 
> 
> crash> bt
> PID: 56     TASK: ffff885f3d9fc210  CPU: 9   COMMAND: "kworker/9:0"
>  #0 [ffff885f3da03b60] machine_kexec at ffffffff8104a1f7
>  #1 [ffff885f3da03bb0] crash_kexec at ffffffff810db187
>  #2 [ffff885f3da03c80] oops_end at ffffffff81015140
>  #3 [ffff885f3da03ca0] general_protection at ffffffff814f6c88
>     [exception RIP: inet_evict_bucket+281]
>     RIP: ffffffff81480699  RSP: ffff885f3da03d58  RFLAGS: 00010292
>     RAX: ffff885f3da03d08  RBX: dead0000001000a8  RCX:
> ffff885f3da03d08
>     RDX: 0000000000000006  RSI: ffff885f3da03ce8  RDI:
> dead0000001000a8
>     RBP: 0000000000000002   R8: 0000000000000286   R9:
> ffff88302f401640
>     R10: 0000000080000000  R11: ffff88602ec0c138  R12:
> ffffffff81a8d8c0
>     R13: ffff885f3da03d70  R14: 0000000000000000  R15:
> ffff881d6efe1a00
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #4 [ffff885f3da03db0] inet_frag_worker at ffffffff8148075a
>  #5 [ffff885f3da03e10] process_one_work at ffffffff8107be19
>  #6 [ffff885f3da03e60] worker_thread at ffffffff8107c6e3
>  #7 [ffff885f3da03ed0] kthread at ffffffff8108103e
>  #8 [ffff885f3da03f50] ret_from_fork at ffffffff814f4d7c
> 
> 
> We would love to receive your input on this matter.
> 
> 
> Thx in advance,
> 
> 
> - Johan

Check commits 65ba1f1ec0eff1c25933468e1d238201c0c2cb29 &
d70127e8a942364de8dd140fe73893efda363293

Also please send your mails in text format, not html, and CC netdev ( I
did here)

> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread
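
For reference, the reproduction setup described in the report above (several sender machines with an interface MTU of 100 transmitting 256-byte UDP datagrams, so each datagram leaves the sender as multiple IP fragments) can be approximated with a minimal sender along the following lines. The target address, port and send loop are illustrative assumptions; configuring the MTU of 100 on the outgoing interface is assumed to be done separately.

/* Minimal UDP sender sketch: sends 256-byte datagrams at a target host.
 * With the sender's interface MTU set to 100, each datagram is emitted as
 * several IP fragments, exercising the receiver's fragment queues.
 * 192.0.2.1 and port 9 are placeholders.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	struct sockaddr_in dst;
	char payload[256];

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(9);			/* placeholder port */
	inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);	/* placeholder target */
	memset(payload, 'A', sizeof(payload));

	for (;;) {
		if (sendto(fd, payload, sizeof(payload), 0,
			   (struct sockaddr *)&dst, sizeof(dst)) < 0)
			perror("sendto");
	}

	close(fd);	/* not reached */
	return 0;
}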

* Re: reproducable panic eviction work queue
  2015-07-18  8:56 ` reproducable panic eviction work queue Eric Dumazet
@ 2015-07-18  9:01   ` Johan Schuijt
  2015-07-18 10:02     ` Nikolay Aleksandrov
  0 siblings, 1 reply; 20+ messages in thread
From: Johan Schuijt @ 2015-07-18  9:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: nikolay, davem, fw, chutzpah, Robin Geuze, Frank Schreuder, netdev

Yes, we already found these and they are included in our kernel, but even with these patches we still hit the panic.

- Johan


> On 18 Jul 2015, at 10:56, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> On Fri, 2015-07-17 at 21:18 +0000, Johan Schuijt wrote:
>> Hey guys, 
>> 
>> 
>> We’re currently running into a reproducible panic in the eviction work
>> queue code when we pin all our eth* IRQs to different CPU cores (in
>> order to scale our networking performance for our virtual servers).
>> This only occurs in kernels >= 3.17 and is a result of the following
>> change:
>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24
>> 
>> 
>> The race/panic we see seems to be the same as, or similar to:
>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=65ba1f1ec0eff1c25933468e1d238201c0c2cb29
>> 
>> 
>> We can confirm that this is directly exposed by the IRQ pinning since
>> disabling this stops us from being able to reproduce this case :)
>> 
>> 
>> How to reproduce: in our test setup we have 4 machines generating UDP
>> packets which are sent to the vulnerable host. These all have an MTU of
>> 100 (for test purposes) and send UDP packets of 256 bytes in size.
>> Within half an hour you will see the following panic:
>> 
>> 
>> crash> bt
>> PID: 56     TASK: ffff885f3d9fc210  CPU: 9   COMMAND: "kworker/9:0"
>> #0 [ffff885f3da03b60] machine_kexec at ffffffff8104a1f7
>> #1 [ffff885f3da03bb0] crash_kexec at ffffffff810db187
>> #2 [ffff885f3da03c80] oops_end at ffffffff81015140
>> #3 [ffff885f3da03ca0] general_protection at ffffffff814f6c88
>>    [exception RIP: inet_evict_bucket+281]
>>    RIP: ffffffff81480699  RSP: ffff885f3da03d58  RFLAGS: 00010292
>>    RAX: ffff885f3da03d08  RBX: dead0000001000a8  RCX:
>> ffff885f3da03d08
>>    RDX: 0000000000000006  RSI: ffff885f3da03ce8  RDI:
>> dead0000001000a8
>>    RBP: 0000000000000002   R8: 0000000000000286   R9:
>> ffff88302f401640
>>    R10: 0000000080000000  R11: ffff88602ec0c138  R12:
>> ffffffff81a8d8c0
>>    R13: ffff885f3da03d70  R14: 0000000000000000  R15:
>> ffff881d6efe1a00
>>    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>> #4 [ffff885f3da03db0] inet_frag_worker at ffffffff8148075a
>> #5 [ffff885f3da03e10] process_one_work at ffffffff8107be19
>> #6 [ffff885f3da03e60] worker_thread at ffffffff8107c6e3
>> #7 [ffff885f3da03ed0] kthread at ffffffff8108103e
>> #8 [ffff885f3da03f50] ret_from_fork at ffffffff814f4d7c
>> 
>> 
>> We would love to receive your input on this matter.
>> 
>> 
>> Thx in advance,
>> 
>> 
>> - Johan
> 
> Check commits 65ba1f1ec0eff1c25933468e1d238201c0c2cb29 &
> d70127e8a942364de8dd140fe73893efda363293
> 
> Also please send your mails in text format, not html, and CC netdev ( I
> did here)
> 
>> 
>> 
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reproducable panic eviction work queue
  2015-07-18  9:01   ` Johan Schuijt
@ 2015-07-18 10:02     ` Nikolay Aleksandrov
  2015-07-18 13:31       ` Nikolay Aleksandrov
  2015-07-18 15:28       ` Johan Schuijt
  0 siblings, 2 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2015-07-18 10:02 UTC (permalink / raw)
  To: Johan Schuijt, Eric Dumazet
  Cc: nikolay, davem, fw, chutzpah, Robin Geuze, Frank Schreuder, netdev

On 07/18/2015 11:01 AM, Johan Schuijt wrote:
> Yes, we already found these and they are included in our kernel, but even with these patches we still hit the panic.
> 
> - Johan
> 
> 
>> On 18 Jul 2015, at 10:56, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> On Fri, 2015-07-17 at 21:18 +0000, Johan Schuijt wrote:
>>> Hey guys, 
>>>
>>>
>>> We’re currently running into a reproducible panic in the eviction work
>>> queue code when we pin all our eth* IRQs to different CPU cores (in
>>> order to scale our networking performance for our virtual servers).
>>> This only occurs in kernels >= 3.17 and is a result of the following
>>> change:
>>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24
>>>
>>>
>>> The race/panic we see seems to be the same as, or similar to:
>>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=65ba1f1ec0eff1c25933468e1d238201c0c2cb29
>>>
>>>
>>> We can confirm that this is directly exposed by the IRQ pinning since
>>> disabling this stops us from being able to reproduce this case :)
>>>
>>>
>>> How to reproduce: in our test setup we have 4 machines generating UDP
>>> packets which are sent to the vulnerable host. These all have an MTU of
>>> 100 (for test purposes) and send UDP packets of 256 bytes in size.
>>> Within half an hour you will see the following panic:
>>>
>>>
>>> crash> bt
>>> PID: 56     TASK: ffff885f3d9fc210  CPU: 9   COMMAND: "kworker/9:0"
>>> #0 [ffff885f3da03b60] machine_kexec at ffffffff8104a1f7
>>> #1 [ffff885f3da03bb0] crash_kexec at ffffffff810db187
>>> #2 [ffff885f3da03c80] oops_end at ffffffff81015140
>>> #3 [ffff885f3da03ca0] general_protection at ffffffff814f6c88
>>>    [exception RIP: inet_evict_bucket+281]
>>>    RIP: ffffffff81480699  RSP: ffff885f3da03d58  RFLAGS: 00010292
>>>    RAX: ffff885f3da03d08  RBX: dead0000001000a8  RCX:
>>> ffff885f3da03d08
>>>    RDX: 0000000000000006  RSI: ffff885f3da03ce8  RDI:
>>> dead0000001000a8
>>>    RBP: 0000000000000002   R8: 0000000000000286   R9:
>>> ffff88302f401640
>>>    R10: 0000000080000000  R11: ffff88602ec0c138  R12:
>>> ffffffff81a8d8c0
>>>    R13: ffff885f3da03d70  R14: 0000000000000000  R15:
>>> ffff881d6efe1a00
>>>    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>> #4 [ffff885f3da03db0] inet_frag_worker at ffffffff8148075a
>>> #5 [ffff885f3da03e10] process_one_work at ffffffff8107be19
>>> #6 [ffff885f3da03e60] worker_thread at ffffffff8107c6e3
>>> #7 [ffff885f3da03ed0] kthread at ffffffff8108103e
>>> #8 [ffff885f3da03f50] ret_from_fork at ffffffff814f4d7c
>>>
>>>
>>> We would love to receive your input on this matter.
>>>
>>>
>>> Thx in advance,
>>>
>>>
>>> - Johan
>>
>> Check commits 65ba1f1ec0eff1c25933468e1d238201c0c2cb29 &
>> d70127e8a942364de8dd140fe73893efda363293
>>
>> Also please send your mails in text format, not html, and CC netdev ( I
>> did here)
>>
>>>
>>>
>>
>>
> 
> 

Thank you for the report, I will try to reproduce this locally.
Could you please post the full crash log? Also, could you test
with a clean current kernel from Linus' tree or Dave's -net?
These are available at:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
respectively.

One last question: how many IRQs do you pin, i.e. how many cores
do you actively use for receive?

^ permalink raw reply	[flat|nested] 20+ messages in thread
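
For context on the IRQ pinning discussed above: pinning an eth* interrupt to a particular CPU is normally done by writing a hexadecimal CPU mask to /proc/irq/<N>/smp_affinity. A minimal sketch follows; the IRQ and CPU numbers come from the command line and are illustrative, and the simple single-word mask assumes CPU numbers below 32.

/* Pin one IRQ to one CPU by writing a hex CPU mask to
 * /proc/irq/<irq>/smp_affinity (requires root).
 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	char path[64];
	unsigned int irq, cpu;
	FILE *f;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <irq> <cpu>\n", argv[0]);
		return 1;
	}
	irq = atoi(argv[1]);
	cpu = atoi(argv[2]);
	if (cpu >= 32) {
		fprintf(stderr, "this simple mask only handles cpu < 32\n");
		return 1;
	}

	snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return 1;
	}
	fprintf(f, "%x\n", 1u << cpu);	/* one bit per CPU, hex mask */
	fclose(f);
	return 0;
}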

* Re: reproducable panic eviction work queue
  2015-07-18 10:02     ` Nikolay Aleksandrov
@ 2015-07-18 13:31       ` Nikolay Aleksandrov
  2015-07-18 15:28       ` Johan Schuijt
  1 sibling, 0 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2015-07-18 13:31 UTC (permalink / raw)
  To: Johan Schuijt, Eric Dumazet
  Cc: nikolay, davem, fw, chutzpah, Robin Geuze, Frank Schreuder, netdev

On 07/18/2015 12:02 PM, Nikolay Aleksandrov wrote:
> On 07/18/2015 11:01 AM, Johan Schuijt wrote:
>> Yes, we already found these and they are included in our kernel, but even with these patches we still hit the panic.
>>
>> - Johan
>>
>>
>>> On 18 Jul 2015, at 10:56, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>> On Fri, 2015-07-17 at 21:18 +0000, Johan Schuijt wrote:
>>>> Hey guys, 
>>>>
>>>>
>>>> We’re currently running into a reproducible panic in the eviction work
>>>> queue code when we pin all our eth* IRQs to different CPU cores (in
>>>> order to scale our networking performance for our virtual servers).
>>>> This only occurs in kernels >= 3.17 and is a result of the following
>>>> change:
>>>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24
>>>>
>>>>
>>>> The race/panic we see seems to be the same as, or similar to:
>>>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=65ba1f1ec0eff1c25933468e1d238201c0c2cb29
>>>>
>>>>
>>>> We can confirm that this is directly exposed by the IRQ pinning since
>>>> disabling this stops us from being able to reproduce this case :)
>>>>
>>>>
>>>> How to reproduce: in our test setup we have 4 machines generating UDP
>>>> packets which are sent to the vulnerable host. These all have an MTU of
>>>> 100 (for test purposes) and send UDP packets of 256 bytes in size.
>>>> Within half an hour you will see the following panic:
>>>>
>>>>
>>>> crash> bt
>>>> PID: 56     TASK: ffff885f3d9fc210  CPU: 9   COMMAND: "kworker/9:0"
>>>> #0 [ffff885f3da03b60] machine_kexec at ffffffff8104a1f7
>>>> #1 [ffff885f3da03bb0] crash_kexec at ffffffff810db187
>>>> #2 [ffff885f3da03c80] oops_end at ffffffff81015140
>>>> #3 [ffff885f3da03ca0] general_protection at ffffffff814f6c88
>>>>    [exception RIP: inet_evict_bucket+281]
>>>>    RIP: ffffffff81480699  RSP: ffff885f3da03d58  RFLAGS: 00010292
>>>>    RAX: ffff885f3da03d08  RBX: dead0000001000a8  RCX:
>>>> ffff885f3da03d08
>>>>    RDX: 0000000000000006  RSI: ffff885f3da03ce8  RDI:
>>>> dead0000001000a8
>>>>    RBP: 0000000000000002   R8: 0000000000000286   R9:
>>>> ffff88302f401640
>>>>    R10: 0000000080000000  R11: ffff88602ec0c138  R12:
>>>> ffffffff81a8d8c0
>>>>    R13: ffff885f3da03d70  R14: 0000000000000000  R15:
>>>> ffff881d6efe1a00
>>>>    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>> #4 [ffff885f3da03db0] inet_frag_worker at ffffffff8148075a
>>>> #5 [ffff885f3da03e10] process_one_work at ffffffff8107be19
>>>> #6 [ffff885f3da03e60] worker_thread at ffffffff8107c6e3
>>>> #7 [ffff885f3da03ed0] kthread at ffffffff8108103e
>>>> #8 [ffff885f3da03f50] ret_from_fork at ffffffff814f4d7c
>>>>
>>>>
>>>> We would love to receive your input on this matter.
>>>>
>>>>
>>>> Thx in advance,
>>>>
>>>>
>>>> - Johan
>>>
>>> Check commits 65ba1f1ec0eff1c25933468e1d238201c0c2cb29 &
>>> d70127e8a942364de8dd140fe73893efda363293
>>>
>>> Also please send your mails in text format, not html, and CC netdev ( I
>>> did here)
>>>
>>>>
>>>>
>>>
>>>
>>
>>
> 
> Thank you for the report, I will try to reproduce this locally
> Could you please post the full crash log ? Also could you test
> with a clean current kernel from Linus' tree or Dave's -net ?
> These are available at:
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
> respectively.
> 
> One last question how many IRQs do you pin i.e. how many cores
> do you actively use for receive ?
> 

The flags seem to be modified while the queue is still linked, so we may get
the following (theoretical) situation:
CPU 1						CPU 2
inet_frag_evictor (wait for chainlock)		spin_lock(chainlock)
						unlock(chainlock)
get lock, set EVICT flag, hlist_del etc.
						change flags again while
						qp is in the evict list

So could you please try the following patch which sets the flag while
holding the chain lock:


diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a082e5f..2521ed9c1b52 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -354,8 +354,8 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 	hlist_for_each_entry(qp, &hb->chain, list) {
 		if (qp->net == nf && f->match(qp, arg)) {
 			atomic_inc(&qp->refcnt);
-			spin_unlock(&hb->chain_lock);
 			qp_in->flags |= INET_FRAG_COMPLETE;
+			spin_unlock(&hb->chain_lock);
 			inet_frag_put(qp_in, f);
 			return qp;
 		}

^ permalink raw reply related	[flat|nested] 20+ messages in thread
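
In userspace terms, the point of the patch above is that a flags field which other contexts test while holding the chain lock must also be written while that lock is held. Below is a minimal pthread sketch of the two patterns; the names and the flag value are illustrative, not the kernel definitions.

/* Sketch only: a pthread mutex standing in for the bucket's spinlock. */
#include <pthread.h>
#include <stdio.h>

#define FRAG_COMPLETE 0x4		/* illustrative flag bit */

struct frag_queue {
	unsigned int flags;
};

static pthread_mutex_t chain_lock = PTHREAD_MUTEX_INITIALIZER;

/* Pre-patch shape: the flag is set after chain_lock is dropped, so another
 * context that takes the lock in between can still observe stale flags.
 */
static void mark_complete_unlocked(struct frag_queue *q)
{
	pthread_mutex_lock(&chain_lock);
	/* ... match lookup, refcount ... */
	pthread_mutex_unlock(&chain_lock);
	q->flags |= FRAG_COMPLETE;	/* written outside the lock */
}

/* Patched shape: the flag is set before the lock is released, so every
 * reader that serializes on chain_lock sees a consistent value.
 */
static void mark_complete_locked(struct frag_queue *q)
{
	pthread_mutex_lock(&chain_lock);
	/* ... match lookup, refcount ... */
	q->flags |= FRAG_COMPLETE;
	pthread_mutex_unlock(&chain_lock);
}

int main(void)
{
	struct frag_queue q = { 0 };

	mark_complete_unlocked(&q);
	mark_complete_locked(&q);
	printf("flags: 0x%x\n", q.flags);
	return 0;
}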

* Re: reproducable panic eviction work queue
  2015-07-18 10:02     ` Nikolay Aleksandrov
  2015-07-18 13:31       ` Nikolay Aleksandrov
@ 2015-07-18 15:28       ` Johan Schuijt
  2015-07-18 15:30         ` Johan Schuijt
  2015-07-18 15:32         ` Nikolay Aleksandrov
  1 sibling, 2 replies; 20+ messages in thread
From: Johan Schuijt @ 2015-07-18 15:28 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Eric Dumazet, nikolay, davem, fw, chutzpah, Robin Geuze,
	Frank Schreuder, netdev

Thx for looking into this!

> 
> Thank you for the report, I will try to reproduce this locally
> Could you please post the full crash log ?

Of course, please see attached file.

> Also could you test
> with a clean current kernel from Linus' tree or Dave's -net ?

Will do.

> These are available at:
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
> respectively.
> 
> One last question how many IRQs do you pin i.e. how many cores
> do you actively use for receive ?

This varies a bit across our systems, but we’ve managed to reproduce this with IRQs pinned to 2, 4, 8 or 20 cores.

I won’t have access to our test setup again till Monday, so I’ll be testing 3 scenarios then:
- Your patch
- Linux tree
- Dave’s -net tree

I’ll make sure to keep you posted on all the results then. We have a kernel dump of the panic, so if you need me to extract any data from there just let me know! (Some instructions might be needed)

- Johan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reproducable panic eviction work queue
  2015-07-18 15:28       ` Johan Schuijt
@ 2015-07-18 15:30         ` Johan Schuijt
  2015-07-18 15:32         ` Nikolay Aleksandrov
  1 sibling, 0 replies; 20+ messages in thread
From: Johan Schuijt @ 2015-07-18 15:30 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Eric Dumazet, nikolay, davem, fw, chutzpah, Robin Geuze,
	Frank Schreuder, netdev

[-- Attachment #1: Type: text/plain, Size: 1338 bytes --]

With attachment this time; also not sure whether this is what you were referring to, so let me know if anything else is needed!

- Johan


> On 18 Jul 2015, at 17:28, Johan Schuijt-Li <johan@transip.nl> wrote:
> 
> Thx for your looking into this!
> 
>> 
>> Thank you for the report, I will try to reproduce this locally
>> Could you please post the full crash log ?
> 
> Of course, please see attached file.
> 
>> Also could you test
>> with a clean current kernel from Linus' tree or Dave's -net ?
> 
> Will do.
> 
>> These are available at:
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
>> respectively.
>> 
>> One last question how many IRQs do you pin i.e. how many cores
>> do you actively use for receive ?
> 
> This varies a bit across our systems, but we’ve managed to reproduce this with IRQs pinned on as many as 2,4,8 or 20 cores.
> 
> I won’t have access to our test-setup till Monday again, so I’ll be testing 3 scenario’s then:
> - Your patch
> - Linux tree
> - Dave’s -net tree
> 
> I’ll make sure to keep you posted on all the results then. We have a kernel dump of the panic, so if you need me to extract any data from there just let me know! (Some instructions might be needed)
> 
> - Johan


[-- Attachment #2: log.txt --]
[-- Type: text/plain, Size: 3772 bytes --]

[28732.285611] general protection fault: 0000 [#1] SMP 
[28732.285665] Modules linked in: vhost_net vhost macvtap macvlan act_police cls_u32 sch_ingress cls_fw sch_sfq sch_htb nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack xt_physdev br_netfilter ebt_arp ebt_ip6 ebt_ip ebtable_nat tun rpcsec_gss_krb5 nfsv4 dns_resolver ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_filter ip6_tables nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc bridge 8021q garp mrp stp llc bonding xt_CT xt_DSCP iptable_mangle ipt_REJECT nf_reject_ipv4 xt_pkttype xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_owner iptable_filter iptable_raw ip_tables x_tables loop joydev hid_generic usbhid hid x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ttm crct10dif_pclmul crc32_pclmul
[28732.286421]  ghash_clmulni_intel aesni_intel drm_kms_helper drm i2c_algo_bit aes_x86_64 lrw gf128mul dcdbas ipmi_si i2c_core evdev glue_helper ablk_helper tpm_tis mei_me tpm ehci_pci ehci_hcd mei cryptd usbcore iTCO_wdt iTCO_vendor_support ipmi_msghandler lpc_ich mfd_core wmi pcspkr usb_common shpchp sb_edac edac_core acpi_power_meter acpi_pad button processor thermal_sys ext4 crc16 mbcache jbd2 dm_mod sg sd_mod ahci libahci bnx2x libata ptp pps_core mdio crc32c_generic megaraid_sas crc32c_intel scsi_mod libcrc32c
[28732.286955] CPU: 9 PID: 56 Comm: kworker/9:0 Not tainted 3.18.7-transip-2.0 #1
[28732.287023] Hardware name: Dell Inc. PowerEdge M620/0VHRN7, BIOS 2.5.2 02/03/2015
[28732.287096] Workqueue: events inet_frag_worker
[28732.287139] task: ffff885f3d9fc210 ti: ffff885f3da00000 task.ti: ffff885f3da00000
[28732.287205] RIP: 0010:[<ffffffff81480699>]  [<ffffffff81480699>] inet_evict_bucket+0x119/0x180
[28732.287278] RSP: 0018:ffff885f3da03d58  EFLAGS: 00010292
[28732.287318] RAX: ffff885f3da03d08 RBX: dead0000001000a8 RCX: ffff885f3da03d08
[28732.287362] RDX: 0000000000000006 RSI: ffff885f3da03ce8 RDI: dead0000001000a8
[28732.287406] RBP: 0000000000000002 R08: 0000000000000286 R09: ffff88302f401640
[28732.287450] R10: 0000000080000000 R11: ffff88602ec0c138 R12: ffffffff81a8d8c0
[28732.287494] R13: ffff885f3da03d70 R14: 0000000000000000 R15: ffff881d6efe1a00
[28732.287538] FS:  0000000000000000(0000) GS:ffff88602f280000(0000) knlGS:0000000000000000
[28732.287606] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[28732.287647] CR2: 0000000000b11000 CR3: 0000004f05b24000 CR4: 00000000000427e0
[28732.287691] Stack:
[28732.287722]  ffffffff81a905e0 ffffffff81a905e8 ffffffff814f4599 ffff881d6efe1a58
[28732.287807]  0000000000000246 000000000000002e ffffffff81a8d8c0 ffffffff81a918c0
[28732.287891]  00000000000002d3 0000000000000019 0000000000000240 ffffffff8148075a
[28732.287975] Call Trace:
[28732.288013]  [<ffffffff814f4599>] ? _raw_spin_unlock_irqrestore+0x9/0x10
[28732.288056]  [<ffffffff8148075a>] ? inet_frag_worker+0x5a/0x250
[28732.288103]  [<ffffffff8107be19>] ? process_one_work+0x149/0x3f0
[28732.288146]  [<ffffffff8107c6e3>] ? worker_thread+0x63/0x490
[28732.288187]  [<ffffffff8107c680>] ? rescuer_thread+0x290/0x290
[28732.288229]  [<ffffffff8108103e>] ? kthread+0xce/0xf0
[28732.288269]  [<ffffffff81080f70>] ? kthread_create_on_node+0x180/0x180
[28732.288313]  [<ffffffff814f4d7c>] ? ret_from_fork+0x7c/0xb0
[28732.288353]  [<ffffffff81080f70>] ? kthread_create_on_node+0x180/0x180
[28732.288396] Code: 8b 04 24 66 83 40 08 01 48 8b 7c 24 18 48 85 ff 74 2a 48 83 ef 58 75 13 eb 22 0f 1f 84 00 00 00 00 00 48 83 eb 58 48 89 df 74 11 <48> 8b 5f 58 41 ff 94 24 70 40 00 00 48 85 db 75 e6 48 83 c4 28 
[28732.288827] RIP  [<ffffffff81480699>] inet_evict_bucket+0x119/0x180
[28732.288873]  RSP <ffff885f3da03d58>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reproducable panic eviction work queue
  2015-07-18 15:28       ` Johan Schuijt
  2015-07-18 15:30         ` Johan Schuijt
@ 2015-07-18 15:32         ` Nikolay Aleksandrov
  2015-07-20 12:47           ` Frank Schreuder
  1 sibling, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2015-07-18 15:32 UTC (permalink / raw)
  To: Johan Schuijt
  Cc: Eric Dumazet, nikolay, davem, fw, chutzpah, Robin Geuze,
	Frank Schreuder, netdev

On 07/18/2015 05:28 PM, Johan Schuijt wrote:
> Thx for your looking into this!
> 
>>
>> Thank you for the report, I will try to reproduce this locally
>> Could you please post the full crash log ?
> 
> Of course, please see attached file.
> 
>> Also could you test
>> with a clean current kernel from Linus' tree or Dave's -net ?
> 
> Will do.
> 
>> These are available at:
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
>> respectively.
>>
>> One last question how many IRQs do you pin i.e. how many cores
>> do you actively use for receive ?
> 
> This varies a bit across our systems, but we’ve managed to reproduce this with IRQs pinned on as many as 2,4,8 or 20 cores.
> 
> I won’t have access to our test-setup till Monday again, so I’ll be testing 3 scenario’s then:
> - Your patch
-----
> - Linux tree
> - Dave’s -net tree
Just one of these two would be enough. I couldn't reproduce it here but
I don't have as many machines to test right now and had to improvise with VMs. :-)

> 
> I’ll make sure to keep you posted on all the results then. We have a kernel dump of the panic, so if you need me to extract any data from there just let me know! (Some instructions might be needed)
> 
> - Johan
> 
Great, thank you!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reproducable panic eviction work queue
  2015-07-18 15:32         ` Nikolay Aleksandrov
@ 2015-07-20 12:47           ` Frank Schreuder
  2015-07-20 14:02             ` Nikolay Aleksandrov
  2015-07-20 14:30             ` Florian Westphal
  0 siblings, 2 replies; 20+ messages in thread
From: Frank Schreuder @ 2015-07-20 12:47 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Johan Schuijt
  Cc: Eric Dumazet, nikolay, davem, fw, chutzpah, Robin Geuze, netdev


On 7/18/2015  05:32 PM, Nikolay Aleksandrov wrote:
> On 07/18/2015 05:28 PM, Johan Schuijt wrote:
>> Thx for your looking into this!
>>
>>> Thank you for the report, I will try to reproduce this locally
>>> Could you please post the full crash log ?
>> Of course, please see attached file.
>>
>>> Also could you test
>>> with a clean current kernel from Linus' tree or Dave's -net ?
>> Will do.
>>
>>> These are available at:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
>>> respectively.
>>>
>>> One last question how many IRQs do you pin i.e. how many cores
>>> do you actively use for receive ?
>> This varies a bit across our systems, but we’ve managed to reproduce this with IRQs pinned on as many as 2,4,8 or 20 cores.
>>
>> I won’t have access to our test-setup till Monday again, so I’ll be testing 3 scenario’s then:
>> - Your patch
> -----
>> - Linux tree
>> - Dave’s -net tree
> Just one of these two would be enough. I couldn't reproduce it here but
> I don't have as many machines to test right now and had to improvise with VMs. :-)
>
>> I’ll make sure to keep you posted on all the results then. We have a kernel dump of the panic, so if you need me to extract any data from there just let me know! (Some instructions might be needed)
>>
>> - Johan
>>
> Great, thank you!
>
I'm able to reproduce this panic on the following kernel builds:
- 3.18.7
- 3.18.18
- 3.18.18 + patch from Nikolay Aleksandrov
- 4.1.0

Would you happen to have any more suggestions we can try?

Thanks,
Frank

-- 

TransIP BV

Schipholweg 11E
2316XB Leiden
E: fschreuder@transip.nl
I: https://www.transip.nl

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reproducable panic eviction work queue
  2015-07-20 12:47           ` Frank Schreuder
@ 2015-07-20 14:02             ` Nikolay Aleksandrov
  2015-07-20 14:30             ` Florian Westphal
  1 sibling, 0 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2015-07-20 14:02 UTC (permalink / raw)
  To: Frank Schreuder, Johan Schuijt
  Cc: Eric Dumazet, nikolay, davem, fw, chutzpah, Robin Geuze, netdev

On 07/20/2015 02:47 PM, Frank Schreuder wrote:
> 
> On 7/18/2015  05:32 PM, Nikolay Aleksandrov wrote:
>> On 07/18/2015 05:28 PM, Johan Schuijt wrote:
>>> Thx for your looking into this!
>>>
>>>> Thank you for the report, I will try to reproduce this locally
>>>> Could you please post the full crash log ?
>>> Of course, please see attached file.
>>>
>>>> Also could you test
>>>> with a clean current kernel from Linus' tree or Dave's -net ?
>>> Will do.
>>>
>>>> These are available at:
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
>>>> respectively.
>>>>
>>>> One last question how many IRQs do you pin i.e. how many cores
>>>> do you actively use for receive ?
>>> This varies a bit across our systems, but we’ve managed to reproduce this with IRQs pinned on as many as 2,4,8 or 20 cores.
>>>
>>> I won’t have access to our test-setup till Monday again, so I’ll be testing 3 scenario’s then:
>>> - Your patch
>> -----
>>> - Linux tree
>>> - Dave’s -net tree
>> Just one of these two would be enough. I couldn't reproduce it here but
>> I don't have as many machines to test right now and had to improvise with VMs. :-)
>>
>>> I’ll make sure to keep you posted on all the results then. We have a kernel dump of the panic, so if you need me to extract any data from there just let me know! (Some instructions might be needed)
>>>
>>> - Johan
>>>
>> Great, thank you!
>>
> I'm able to reproduce this panic on the following kernel builds:
> - 3.18.7
> - 3.18.18
> - 3.18.18 + patch from Nikolay Aleksandrov
> - 4.1.0
> 
> Would you happen to have any more suggestions we can try?
> 
> Thanks,
> Frank
> 

Unfortunately I was wrong about my theory because I mixed up qp and qp_in: the new frag
doesn't make it onto the chain list if that code path is hit, so it couldn't mix the flags.
I'm still trying (unsuccessfully) to reproduce this; I've tried with up to 4 cores
and 4 different pinned IRQs but no luck so far.
Anyway, I'll keep looking into this and will let you know if I get anywhere.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reproducable panic eviction work queue
  2015-07-20 12:47           ` Frank Schreuder
  2015-07-20 14:02             ` Nikolay Aleksandrov
@ 2015-07-20 14:30             ` Florian Westphal
  2015-07-21 11:50               ` Frank Schreuder
  1 sibling, 1 reply; 20+ messages in thread
From: Florian Westphal @ 2015-07-20 14:30 UTC (permalink / raw)
  To: Frank Schreuder
  Cc: Nikolay Aleksandrov, Johan Schuijt, Eric Dumazet, nikolay, davem,
	fw, chutzpah, Robin Geuze, netdev

Frank Schreuder <fschreuder@transip.nl> wrote:
> 
> On 7/18/2015  05:32 PM, Nikolay Aleksandrov wrote:
> >On 07/18/2015 05:28 PM, Johan Schuijt wrote:
> >>Thx for your looking into this!
> >>
> >>>Thank you for the report, I will try to reproduce this locally
> >>>Could you please post the full crash log ?
> >>Of course, please see attached file.
> >>
> >>>Also could you test
> >>>with a clean current kernel from Linus' tree or Dave's -net ?
> >>Will do.
> >>
> >>>These are available at:
> >>>git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> >>>git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
> >>>respectively.
> >>>
> >>>One last question how many IRQs do you pin i.e. how many cores
> >>>do you actively use for receive ?
> >>This varies a bit across our systems, but we’ve managed to reproduce this with IRQs pinned on as many as 2,4,8 or 20 cores.
> >>
> >>I won’t have access to our test-setup till Monday again, so I’ll be testing 3 scenario’s then:
> >>- Your patch
> >-----
> >>- Linux tree
> >>- Dave’s -net tree
> >Just one of these two would be enough. I couldn't reproduce it here but
> >I don't have as many machines to test right now and had to improvise with VMs. :-)
> >
> >>I’ll make sure to keep you posted on all the results then. We have a kernel dump of the panic, so if you need me to extract any data from there just let me know! (Some instructions might be needed)
> >>
> >>- Johan
> >>
> >Great, thank you!
> >
> I'm able to reproduce this panic on the following kernel builds:
> - 3.18.7
> - 3.18.18
> - 3.18.18 + patch from Nikolay Aleksandrov
> - 4.1.0
> 
> Would you happen to have any more suggestions we can try?

Yes, although I admit it's clutching at straws.

The problem is that I don't see how we can race with the timer, but OTOH
I don't see why this needs to play refcnt tricks if we can just skip
the entry completely ...

The other issue is parallel completion on another CPU, but I don't
see how we could trip there either.

Do you always get this one crash backtrace from the evictor wq?

I'll set up a bigger test machine soon and will also try to reproduce
this.

Thanks for reporting!

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -131,24 +131,14 @@ inet_evict_bucket(struct inet_frags *f, struct inet_frag_bucket *hb)
 	unsigned int evicted = 0;
 	HLIST_HEAD(expired);
 
-evict_again:
 	spin_lock(&hb->chain_lock);
 
 	hlist_for_each_entry_safe(fq, n, &hb->chain, list) {
 		if (!inet_fragq_should_evict(fq))
 			continue;
 
-		if (!del_timer(&fq->timer)) {
-			/* q expiring right now thus increment its refcount so
-			 * it won't be freed under us and wait until the timer
-			 * has finished executing then destroy it
-			 */
-			atomic_inc(&fq->refcnt);
-			spin_unlock(&hb->chain_lock);
-			del_timer_sync(&fq->timer);
-			inet_frag_put(fq, f);
-			goto evict_again;
-		}
+		if (!del_timer(&fq->timer))
+			continue;
 
 		fq->flags |= INET_FRAG_EVICTED;
 		hlist_del(&fq->list);
@@ -240,18 +230,20 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
 	int i;
 
 	nf->low_thresh = 0;
-	local_bh_disable();
 
 evict_again:
+	local_bh_disable();
 	seq = read_seqbegin(&f->rnd_seqlock);
 
 	for (i = 0; i < INETFRAGS_HASHSZ ; i++)
 		inet_evict_bucket(f, &f->hash[i]);
 
-	if (read_seqretry(&f->rnd_seqlock, seq))
-		goto evict_again;
-
 	local_bh_enable();
+	cond_resched();
+
+	if (read_seqretry(&f->rnd_seqlock, seq) ||
+	    percpu_counter_sum(&nf->mem))
+		goto evict_again;
 
 	percpu_counter_destroy(&nf->mem);
 }
@@ -286,6 +278,8 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	hb = get_frag_bucket_locked(fq, f);
 	if (!(fq->flags & INET_FRAG_EVICTED))
 		hlist_del(&fq->list);
+
+	fq->flags |= INET_FRAG_COMPLETE;
 	spin_unlock(&hb->chain_lock);
 }
 
@@ -297,7 +291,6 @@ void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
 	if (!(fq->flags & INET_FRAG_COMPLETE)) {
 		fq_unlink(fq, f);
 		atomic_dec(&fq->refcnt);
-		fq->flags |= INET_FRAG_COMPLETE;
 	}
 }
 EXPORT_SYMBOL(inet_frag_kill);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reproducable panic eviction work queue
  2015-07-20 14:30             ` Florian Westphal
@ 2015-07-21 11:50               ` Frank Schreuder
  2015-07-21 18:34                 ` Florian Westphal
  0 siblings, 1 reply; 20+ messages in thread
From: Frank Schreuder @ 2015-07-21 11:50 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Nikolay Aleksandrov, Johan Schuijt, Eric Dumazet, nikolay, davem,
	chutzpah, Robin Geuze, netdev



On 7/20/2015 04:30 PM Florian Westphal wrote:
> Frank Schreuder <fschreuder@transip.nl> wrote:
>> On 7/18/2015  05:32 PM, Nikolay Aleksandrov wrote:
>>> On 07/18/2015 05:28 PM, Johan Schuijt wrote:
>>>> Thx for your looking into this!
>>>>
>>>>> Thank you for the report, I will try to reproduce this locally
>>>>> Could you please post the full crash log ?
>>>> Of course, please see attached file.
>>>>
>>>>> Also could you test
>>>>> with a clean current kernel from Linus' tree or Dave's -net ?
>>>> Will do.
>>>>
>>>>> These are available at:
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
>>>>> respectively.
>>>>>
>>>>> One last question how many IRQs do you pin i.e. how many cores
>>>>> do you actively use for receive ?
>>>> This varies a bit across our systems, but we’ve managed to reproduce this with IRQs pinned on as many as 2,4,8 or 20 cores.
>>>>
>>>> I won’t have access to our test-setup till Monday again, so I’ll be testing 3 scenario’s then:
>>>> - Your patch
>>> -----
>>>> - Linux tree
>>>> - Dave’s -net tree
>>> Just one of these two would be enough. I couldn't reproduce it here but
>>> I don't have as many machines to test right now and had to improvise with VMs. :-)
>>>
>>>> I’ll make sure to keep you posted on all the results then. We have a kernel dump of the panic, so if you need me to extract any data from there just let me know! (Some instructions might be needed)
>>>>
>>>> - Johan
>>>>
>>> Great, thank you!
>>>
>> I'm able to reproduce this panic on the following kernel builds:
>> - 3.18.7
>> - 3.18.18
>> - 3.18.18 + patch from Nikolay Aleksandrov
>> - 4.1.0
>>
>> Would you happen to have any more suggestions we can try?
> Yes, although I admit its clutching at straws.
>
> Problem is that I don't see how we can race with timer, but OTOH
> I don't see why this needs to play refcnt tricks if we can just skip
> the entry completely ...
>
> The other issue is parallel completion on other cpu, but don't
> see how we could trip there either.
>
> Do you always get this one crash backtrace from evictor wq?
>
> I'll set up a bigger test machine soon and will also try to reproduce
> this.
>
> Thanks for reporting!
>
> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -131,24 +131,14 @@ inet_evict_bucket(struct inet_frags *f, struct inet_frag_bucket *hb)
>   	unsigned int evicted = 0;
>   	HLIST_HEAD(expired);
>   
> -evict_again:
>   	spin_lock(&hb->chain_lock);
>   
>   	hlist_for_each_entry_safe(fq, n, &hb->chain, list) {
>   		if (!inet_fragq_should_evict(fq))
>   			continue;
>   
> -		if (!del_timer(&fq->timer)) {
> -			/* q expiring right now thus increment its refcount so
> -			 * it won't be freed under us and wait until the timer
> -			 * has finished executing then destroy it
> -			 */
> -			atomic_inc(&fq->refcnt);
> -			spin_unlock(&hb->chain_lock);
> -			del_timer_sync(&fq->timer);
> -			inet_frag_put(fq, f);
> -			goto evict_again;
> -		}
> +		if (!del_timer(&fq->timer))
> +			continue;
>   
>   		fq->flags |= INET_FRAG_EVICTED;
>   		hlist_del(&fq->list);
> @@ -240,18 +230,20 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
>   	int i;
>   
>   	nf->low_thresh = 0;
> -	local_bh_disable();
>   
>   evict_again:
> +	local_bh_disable();
>   	seq = read_seqbegin(&f->rnd_seqlock);
>   
>   	for (i = 0; i < INETFRAGS_HASHSZ ; i++)
>   		inet_evict_bucket(f, &f->hash[i]);
>   
> -	if (read_seqretry(&f->rnd_seqlock, seq))
> -		goto evict_again;
> -
>   	local_bh_enable();
> +	cond_resched();
> +
> +	if (read_seqretry(&f->rnd_seqlock, seq) ||
> +	    percpu_counter_sum(&nf->mem))
> +		goto evict_again;
>   
>   	percpu_counter_destroy(&nf->mem);
>   }
> @@ -286,6 +278,8 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
>   	hb = get_frag_bucket_locked(fq, f);
>   	if (!(fq->flags & INET_FRAG_EVICTED))
>   		hlist_del(&fq->list);
> +
> +	fq->flags |= INET_FRAG_COMPLETE;
>   	spin_unlock(&hb->chain_lock);
>   }
>   
> @@ -297,7 +291,6 @@ void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
>   	if (!(fq->flags & INET_FRAG_COMPLETE)) {
>   		fq_unlink(fq, f);
>   		atomic_dec(&fq->refcnt);
> -		fq->flags |= INET_FRAG_COMPLETE;
>   	}
>   }
>   EXPORT_SYMBOL(inet_frag_kill);
Thanks a lot for your time and the patch. Unfortunately we are still
able to reproduce the panic on kernel 3.18.18 with this patch included.
In all previous tests, the same backtrace occurs. If there is any way
we can provide you with more debug information, please let me know.

Thanks a lot,
Frank

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reproducable panic eviction work queue
  2015-07-21 11:50               ` Frank Schreuder
@ 2015-07-21 18:34                 ` Florian Westphal
  2015-07-22  8:09                   ` Frank Schreuder
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Westphal @ 2015-07-21 18:34 UTC (permalink / raw)
  To: Frank Schreuder
  Cc: Florian Westphal, Nikolay Aleksandrov, Johan Schuijt,
	Eric Dumazet, nikolay, davem, chutzpah, Robin Geuze, netdev

Frank Schreuder <fschreuder@transip.nl> wrote:

[ inet frag evictor crash ]

We believe we found the bug.  This patch should fix it.

We cannot share the list linkage between the buckets and the evictor; the
flags member is subject to race conditions, so the flags & INET_FRAG_EVICTED
test is not reliable.

It would be great if you could confirm that this fixes the problem
for you; we'll then make a formal patch submission.

Please apply this on a kernel without the previous test patches; whether you
use an affected -stable or a net-next kernel shouldn't matter since those are
similar enough.

Many thanks!

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -45,6 +45,7 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
+ * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
  */
 struct inet_frag_queue {
 	spinlock_t		lock;
@@ -59,6 +60,7 @@ struct inet_frag_queue {
 	__u8			flags;
 	u16			max_size;
 	struct netns_frags	*net;
+	struct hlist_node	list_evictor;
 };
 
 #define INETFRAGS_HASHSZ	1024
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a0..1722348 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -151,14 +151,13 @@ evict_again:
 		}
 
 		fq->flags |= INET_FRAG_EVICTED;
-		hlist_del(&fq->list);
-		hlist_add_head(&fq->list, &expired);
+		hlist_add_head(&fq->list_evictor, &expired);
 		++evicted;
 	}
 
 	spin_unlock(&hb->chain_lock);
 
-	hlist_for_each_entry_safe(fq, n, &expired, list)
+	hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
 		f->frag_expire((unsigned long) fq);
 
 	return evicted;
@@ -284,8 +283,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	struct inet_frag_bucket *hb;
 
 	hb = get_frag_bucket_locked(fq, f);
-	if (!(fq->flags & INET_FRAG_EVICTED))
-		hlist_del(&fq->list);
+	hlist_del(&fq->list);
 	spin_unlock(&hb->chain_lock);
 }
 

^ permalink raw reply related	[flat|nested] 20+ messages in thread
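
A userspace sketch of the idea behind the fix above: give each queue a second, independent list membership, so the evictor collects its victims on a private "expired" list without reusing the hash-chain linkage, and the normal kill path can then unlink from the chain unconditionally instead of testing a racy flag. The BSD <sys/queue.h> macros are used purely for illustration; the field names mirror the patch, but none of this is the kernel API.

/* Sketch: two independent intrusive list memberships per queue. */
#include <stdio.h>
#include <sys/queue.h>

struct frag_queue {
	int id;
	LIST_ENTRY(frag_queue) chain;		/* hash-bucket chain linkage */
	LIST_ENTRY(frag_queue) list_evictor;	/* evictor's private linkage */
};

LIST_HEAD(fq_list, frag_queue);

int main(void)
{
	struct frag_queue a = { .id = 1 }, b = { .id = 2 };
	struct fq_list bucket = LIST_HEAD_INITIALIZER(bucket);
	struct fq_list expired = LIST_HEAD_INITIALIZER(expired);
	struct frag_queue *fq;

	LIST_INSERT_HEAD(&bucket, &a, chain);
	LIST_INSERT_HEAD(&bucket, &b, chain);

	/* Evictor: collect victims on its own list; each queue's chain
	 * linkage is left untouched here.
	 */
	LIST_FOREACH(fq, &bucket, chain)
		LIST_INSERT_HEAD(&expired, fq, list_evictor);

	/* Expire pass: the kill path would now LIST_REMOVE(fq, chain)
	 * unconditionally, with no flag test needed.
	 */
	LIST_FOREACH(fq, &expired, list_evictor)
		printf("expiring queue %d\n", fq->id);

	return 0;
}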

* Re: reproducable panic eviction work queue
  2015-07-21 18:34                 ` Florian Westphal
@ 2015-07-22  8:09                   ` Frank Schreuder
  2015-07-22  8:17                     ` Frank Schreuder
  0 siblings, 1 reply; 20+ messages in thread
From: Frank Schreuder @ 2015-07-22  8:09 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Nikolay Aleksandrov, Johan Schuijt, Eric Dumazet, nikolay, davem,
	chutzpah, Robin Geuze, netdev



On 7/21/2015 at 8:34 PM, Florian Westphal wrote:
> Frank Schreuder <fschreuder@transip.nl> wrote:
>
> [ inet frag evictor crash ]
>
> We believe we found the bug.  This patch should fix it.
>
> We cannot share the list linkage between the buckets and the evictor; the
> flags member is subject to race conditions, so the flags & INET_FRAG_EVICTED
> test is not reliable.
>
> It would be great if you could confirm that this fixes the problem
> for you; we'll then make a formal patch submission.
>
> Please apply this on a kernel without the previous test patches; whether you
> use an affected -stable or a net-next kernel shouldn't matter since those are
> similar enough.
>
> Many thanks!
>
> diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
> --- a/include/net/inet_frag.h
> +++ b/include/net/inet_frag.h
> @@ -45,6 +45,7 @@ enum {
>    * @flags: fragment queue flags
>    * @max_size: maximum received fragment size
>    * @net: namespace that this frag belongs to
> + * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
>    */
>   struct inet_frag_queue {
>   	spinlock_t		lock;
> @@ -59,6 +60,7 @@ struct inet_frag_queue {
>   	__u8			flags;
>   	u16			max_size;
>   	struct netns_frags	*net;
> +	struct hlist_node	list_evictor;
>   };
>   
>   #define INETFRAGS_HASHSZ	1024
> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> index 5e346a0..1722348 100644
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -151,14 +151,13 @@ evict_again:
>   		}
>   
>   		fq->flags |= INET_FRAG_EVICTED;
> -		hlist_del(&fq->list);
> -		hlist_add_head(&fq->list, &expired);
> +		hlist_add_head(&fq->list_evictor, &expired);
>   		++evicted;
>   	}
>   
>   	spin_unlock(&hb->chain_lock);
>   
> -	hlist_for_each_entry_safe(fq, n, &expired, list)
> +	hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
>   		f->frag_expire((unsigned long) fq);
>   
>   	return evicted;
> @@ -284,8 +283,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
>   	struct inet_frag_bucket *hb;
>   
>   	hb = get_frag_bucket_locked(fq, f);
> -	if (!(fq->flags & INET_FRAG_EVICTED))
> -		hlist_del(&fq->list);
> +	hlist_del(&fq->list);
>   	spin_unlock(&hb->chain_lock);
>   }
>   
Hi Florian,

Thanks for the patch!

After implementing the patch in our setup we are no longer able to
reproduce the kernel panic.
Unfortunately the server load increases after 5-10 minutes and the logs
get spammed with stack traces.
I included a snippet below.

Do you have any insights on why this happens, and how we can resolve this?

Thanks,
Frank


Jul 22 09:44:17 dommy0 kernel: [  360.121516] Modules linked in: 
parport_pc ppdev lp parport bnep rfcomm bluetooth rfkill uinput nfsd 
auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop 
coretemp kvm ttm drm_kms_helper iTCO_wdt drm psmouse ipmi_si 
iTCO_vendor_support tpm_tis tpm ipmi_msghandler i2c_algo_bit i2c_core 
i7core_edac dcdbas serio_raw pcspkr wmi lpc_ich edac_core mfd_core evdev 
button acpi_power_meter processor thermal_sys ext4 crc16 mbcache jbd2 
sd_mod sg sr_mod cdrom hid_generic usbhid ata_generic hid crc32c_intel 
ata_piix mptsas scsi_transport_sas mptscsih libata mptbase ehci_pci 
scsi_mod uhci_hcd ehci_hcd usbcore usb_common ixgbe dca ptp bnx2 
pps_core mdio
Jul 22 09:44:17 dommy0 kernel: [  360.121560] CPU: 3 PID: 42 Comm: 
kworker/3:1 Tainted: G        W    L 3.18.18-transip-1.6 #1
Jul 22 09:44:17 dommy0 kernel: [  360.121562] Hardware name: Dell Inc. 
PowerEdge R410/01V648, BIOS 1.12.0 07/30/2013
Jul 22 09:44:17 dommy0 kernel: [  360.121567] Workqueue: events 
inet_frag_worker
Jul 22 09:44:17 dommy0 kernel: [  360.121568] task: ffff880224574490 ti: 
ffff8802240a0000 task.ti: ffff8802240a0000
Jul 22 09:44:17 dommy0 kernel: [  360.121570] RIP: 
0010:[<ffffffff810c0872>]  [<ffffffff810c0872>] del_timer_sync+0x42/0x60
Jul 22 09:44:17 dommy0 kernel: [  360.121575] RSP: 
0018:ffff8802240a3d48  EFLAGS: 00000246
Jul 22 09:44:17 dommy0 kernel: [  360.121576] RAX: 0000000000000200 RBX: 
0000000000000000 RCX: 0000000000000000
Jul 22 09:44:17 dommy0 kernel: [  360.121578] RDX: ffff88022215ce40 RSI: 
0000000000300000 RDI: ffff88022215cdf0
Jul 22 09:44:17 dommy0 kernel: [  360.121579] RBP: 0000000000000003 R08: 
ffff880222343c00 R09: 0000000000000101
Jul 22 09:44:17 dommy0 kernel: [  360.121581] R10: 0000000000000000 R11: 
0000000000000027 R12: ffff880222343c00
Jul 22 09:44:17 dommy0 kernel: [  360.121582] R13: 0000000000000101 R14: 
0000000000000000 R15: 0000000000000027
Jul 22 09:44:17 dommy0 kernel: [  360.121584] FS: 0000000000000000(0000) 
GS:ffff88022f260000(0000) knlGS:0000000000000000
Jul 22 09:44:17 dommy0 kernel: [  360.121585] CS:  0010 DS: 0000 ES: 
0000 CR0: 000000008005003b
Jul 22 09:44:17 dommy0 kernel: [  360.121587] CR2: 00007fb1e9884095 CR3: 
000000021c084000 CR4: 00000000000007e0
Jul 22 09:44:17 dommy0 kernel: [  360.121588] Stack:
Jul 22 09:44:17 dommy0 kernel: [  360.121589]  ffff88022215cdf0 
ffffffff8149289e ffffffff81a8aa30 ffffffff81a8aa38
Jul 22 09:44:17 dommy0 kernel: [  360.121592]  0000000000000286 
ffff88022215ce88 ffffffff8149287f 0000000000000394
Jul 22 09:44:17 dommy0 kernel: [  360.121594]  ffffffff81a87100 
0000000000000001 000000000000007c 0000000000000000
Jul 22 09:44:17 dommy0 kernel: [  360.121596] Call Trace:
Jul 22 09:44:17 dommy0 kernel: [  360.121600] [<ffffffff8149289e>] ? 
inet_evict_bucket+0x11e/0x140
Jul 22 09:44:17 dommy0 kernel: [  360.121602] [<ffffffff8149287f>] ? 
inet_evict_bucket+0xff/0x140
Jul 22 09:44:17 dommy0 kernel: [  360.121605] [<ffffffff814929b0>] ? 
inet_frag_worker+0x60/0x210
Jul 22 09:44:17 dommy0 kernel: [  360.121609] [<ffffffff8107e3a2>] ? 
process_one_work+0x142/0x3b0
Jul 22 09:44:17 dommy0 kernel: [  360.121612] [<ffffffff815078ed>] ? 
schedule+0x1d/0x70
Jul 22 09:44:17 dommy0 kernel: [  360.121614] [<ffffffff8107eb94>] ? 
worker_thread+0x114/0x440
Jul 22 09:44:17 dommy0 kernel: [  360.121617] [<ffffffff815073ad>] ? 
__schedule+0x2cd/0x7b0
Jul 22 09:44:17 dommy0 kernel: [  360.121619] [<ffffffff8107ea80>] ? 
create_worker+0x1a0/0x1a0
Jul 22 09:44:17 dommy0 kernel: [  360.121622] [<ffffffff81083dfc>] ? 
kthread+0xbc/0xe0
Jul 22 09:44:17 dommy0 kernel: [  360.121624] [<ffffffff81083d40>] ? 
kthread_create_on_node+0x1c0/0x1c0
Jul 22 09:44:17 dommy0 kernel: [  360.121627] [<ffffffff8150b218>] ? 
ret_from_fork+0x58/0x90
Jul 22 09:44:17 dommy0 kernel: [  360.121629] [<ffffffff81083d40>] ? 
kthread_create_on_node+0x1c0/0x1c0
Jul 22 09:44:17 dommy0 kernel: [  360.121631] Code: 75 29 be 3c 04 00 00 
48 c7 c7 0c 73 71 81 e8 26 72 fa ff 48 89 df e8 6e ff ff ff 85 c0 79 18 
66 2e 0f 1f 84 00 00 00 00 00 f3 90 <48> 89 df e8 56 ff ff ff 85 c0 78 
f2 5b 90 c3 66 66 66 66 66 66

Jul 22 09:44:27 dommy0 kernel: [  370.097476] Task dump for CPU 3:
Jul 22 09:44:27 dommy0 kernel: [  370.097478] kworker/3:1     R running 
task        0    42      2 0x00000008
Jul 22 09:44:27 dommy0 kernel: [  370.097482] Workqueue: events 
inet_frag_worker
Jul 22 09:44:27 dommy0 kernel: [  370.097483]  0000000000000004 
ffffffff81849240 ffffffff810b9464 00000000000003dc
Jul 22 09:44:27 dommy0 kernel: [  370.097485]  ffff88022f26d4c0 
ffffffff81849180 ffffffff81849240 ffffffff818b4e40
Jul 22 09:44:27 dommy0 kernel: [  370.097488]  ffffffff810bc797 
0000000000000000 ffffffff810c6dc9 0000000000000092
Jul 22 09:44:27 dommy0 kernel: [  370.097490] Call Trace:
Jul 22 09:44:27 dommy0 kernel: [  370.097491]  <IRQ> 
[<ffffffff810b9464>] ? rcu_dump_cpu_stacks+0x84/0xc0
Jul 22 09:44:27 dommy0 kernel: [  370.097499] [<ffffffff810bc797>] ? 
rcu_check_callbacks+0x407/0x650
Jul 22 09:44:27 dommy0 kernel: [  370.097501] [<ffffffff810c6dc9>] ? 
timekeeping_update.constprop.8+0x89/0x1b0
Jul 22 09:44:27 dommy0 kernel: [  370.097504] [<ffffffff810c7ec5>] ? 
update_wall_time+0x225/0x5c0
Jul 22 09:44:27 dommy0 kernel: [  370.097507] [<ffffffff810cfcb0>] ? 
tick_sched_do_timer+0x30/0x30
Jul 22 09:44:27 dommy0 kernel: [  370.097510] [<ffffffff810c14df>] ? 
update_process_times+0x3f/0x80
Jul 22 09:44:27 dommy0 kernel: [  370.097513] [<ffffffff810cfb27>] ? 
tick_sched_handle.isra.12+0x27/0x70
Jul 22 09:44:27 dommy0 kernel: [  370.097515] [<ffffffff810cfcf5>] ? 
tick_sched_timer+0x45/0x80
Jul 22 09:44:27 dommy0 kernel: [  370.097518] [<ffffffff810c1d76>] ? 
__run_hrtimer+0x66/0x1b0
Jul 22 09:44:27 dommy0 kernel: [  370.097522] [<ffffffff8101c5c5>] ? 
read_tsc+0x5/0x10
Jul 22 09:44:27 dommy0 kernel: [  370.097524] [<ffffffff810c2519>] ? 
hrtimer_interrupt+0xf9/0x230
Jul 22 09:44:27 dommy0 kernel: [  370.097528] [<ffffffff81046d86>] ? 
smp_apic_timer_interrupt+0x36/0x50
Jul 22 09:44:27 dommy0 kernel: [  370.097531] [<ffffffff8150c0bd>] ? 
apic_timer_interrupt+0x6d/0x80
Jul 22 09:44:27 dommy0 kernel: [  370.097532]  <EOI> 
[<ffffffff8150ad89>] ? _raw_spin_lock+0x9/0x30
Jul 22 09:44:27 dommy0 kernel: [  370.097537] [<ffffffff814927bb>] ? 
inet_evict_bucket+0x3b/0x140
Jul 22 09:44:27 dommy0 kernel: [  370.097539] [<ffffffff8149287f>] ? 
inet_evict_bucket+0xff/0x140
Jul 22 09:44:27 dommy0 kernel: [  370.097542] [<ffffffff814929b0>] ? 
inet_frag_worker+0x60/0x210
Jul 22 09:44:27 dommy0 kernel: [  370.097545] [<ffffffff8107e3a2>] ? 
process_one_work+0x142/0x3b0
Jul 22 09:44:27 dommy0 kernel: [  370.097547] [<ffffffff815078ed>] ? 
schedule+0x1d/0x70
Jul 22 09:44:27 dommy0 kernel: [  370.097550] [<ffffffff8107eb94>] ? 
worker_thread+0x114/0x440
Jul 22 09:44:27 dommy0 kernel: [  370.097552] [<ffffffff815073ad>] ? 
__schedule+0x2cd/0x7b0
Jul 22 09:44:27 dommy0 kernel: [  370.097554] [<ffffffff8107ea80>] ? 
create_worker+0x1a0/0x1a0
Jul 22 09:44:27 dommy0 kernel: [  370.097557] [<ffffffff81083dfc>] ? 
kthread+0xbc/0xe0
Jul 22 09:44:27 dommy0 kernel: [  370.097559] [<ffffffff81083d40>] ? 
kthread_create_on_node+0x1c0/0x1c0
Jul 22 09:44:27 dommy0 kernel: [  370.097562] [<ffffffff8150b218>] ? 
ret_from_fork+0x58/0x90
Jul 22 09:44:27 dommy0 kernel: [  370.097564] [<ffffffff81083d40>] ? 
kthread_create_on_node+0x1c0/0x1c0

Jul 22 09:44:53 dommy0 kernel: [  396.106303] Modules linked in: 
parport_pc ppdev lp parport bnep rfcomm bluetooth rfkill uinput nfsd 
auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop 
coretemp kvm ttm drm_kms_helper iTCO_wdt drm psmouse ipmi_si 
iTCO_vendor_support tpm_tis tpm ipmi_msghandler i2c_algo_bit i2c_core 
i7core_edac dcdbas serio_raw pcspkr wmi lpc_ich edac_core mfd_core evdev 
button acpi_power_meter processor thermal_sys ext4 crc16 mbcache jbd2 
sd_mod sg sr_mod cdrom hid_generic usbhid ata_generic hid crc32c_intel 
ata_piix mptsas scsi_transport_sas mptscsih libata mptbase ehci_pci 
scsi_mod uhci_hcd ehci_hcd usbcore usb_common ixgbe dca ptp bnx2 
pps_core mdio
Jul 22 09:44:53 dommy0 kernel: [  396.106347] CPU: 3 PID: 42 Comm: 
kworker/3:1 Tainted: G        W    L 3.18.18-transip-1.6 #1
Jul 22 09:44:53 dommy0 kernel: [  396.106348] Hardware name: Dell Inc. 
PowerEdge R410/01V648, BIOS 1.12.0 07/30/2013
Jul 22 09:44:53 dommy0 kernel: [  396.106353] Workqueue: events 
inet_frag_worker
Jul 22 09:44:53 dommy0 kernel: [  396.106355] task: ffff880224574490 ti: 
ffff8802240a0000 task.ti: ffff8802240a0000
Jul 22 09:44:53 dommy0 kernel: [  396.106356] RIP: 
0010:[<ffffffff8149288d>]  [<ffffffff8149288d>] 
inet_evict_bucket+0x10d/0x140
Jul 22 09:44:53 dommy0 kernel: [  396.106359] RSP: 
0018:ffff8802240a3d58  EFLAGS: 00000206
Jul 22 09:44:53 dommy0 kernel: [  396.106361] RAX: 0000000000000000 RBX: 
0000000000000286 RCX: 0000000000000000
Jul 22 09:44:53 dommy0 kernel: [  396.106362] RDX: ffff88022215ce40 RSI: 
0000000000300000 RDI: ffff88022215cdf0
Jul 22 09:44:53 dommy0 kernel: [  396.106364] RBP: 0000000000000003 R08: 
ffff880222343c00 R09: 0000000000000101
Jul 22 09:44:53 dommy0 kernel: [  396.106365] R10: 0000000000000000 R11: 
0000000000000027 R12: 0000000000000000
Jul 22 09:44:53 dommy0 kernel: [  396.106366] R13: 0000000000000000 R14: 
ffff880222343c00 R15: 0000000000000101
Jul 22 09:44:53 dommy0 kernel: [  396.106368] FS: 0000000000000000(0000) 
GS:ffff88022f260000(0000) knlGS:0000000000000000
Jul 22 09:44:53 dommy0 kernel: [  396.106370] CS:  0010 DS: 0000 ES: 
0000 CR0: 000000008005003b
Jul 22 09:44:53 dommy0 kernel: [  396.106371] CR2: 00007fb1e9884095 CR3: 
000000021c084000 CR4: 00000000000007e0
Jul 22 09:44:53 dommy0 kernel: [  396.106372] Stack:
Jul 22 09:44:53 dommy0 kernel: [  396.106373]  ffffffff81a8aa30 
ffffffff81a8aa38 0000000000000286 ffff88022215ce88
Jul 22 09:44:53 dommy0 kernel: [  396.106376]  ffffffff8149287f 
0000000000000394 ffffffff81a87100 0000000000000001
Jul 22 09:44:53 dommy0 kernel: [  396.106378]  000000000000007c 
0000000000000000 00000000000000c0 ffffffff814929b0
Jul 22 09:44:53 dommy0 kernel: [  396.106380] Call Trace:
Jul 22 09:44:53 dommy0 kernel: [  396.106383] [<ffffffff8149287f>] ? 
inet_evict_bucket+0xff/0x140
Jul 22 09:44:53 dommy0 kernel: [  396.106386] [<ffffffff814929b0>] ? 
inet_frag_worker+0x60/0x210
Jul 22 09:44:53 dommy0 kernel: [  396.106390] [<ffffffff8107e3a2>] ? 
process_one_work+0x142/0x3b0
Jul 22 09:44:53 dommy0 kernel: [  396.106393] [<ffffffff815078ed>] ? 
schedule+0x1d/0x70
Jul 22 09:44:53 dommy0 kernel: [  396.106396] [<ffffffff8107eb94>] ? 
worker_thread+0x114/0x440
Jul 22 09:44:53 dommy0 kernel: [  396.106398] [<ffffffff815073ad>] ? 
__schedule+0x2cd/0x7b0
Jul 22 09:44:53 dommy0 kernel: [  396.106401] [<ffffffff8107ea80>] ? 
create_worker+0x1a0/0x1a0
Jul 22 09:44:53 dommy0 kernel: [  396.106403] [<ffffffff81083dfc>] ? 
kthread+0xbc/0xe0
Jul 22 09:44:53 dommy0 kernel: [  396.106406] [<ffffffff81083d40>] ? 
kthread_create_on_node+0x1c0/0x1c0
Jul 22 09:44:53 dommy0 kernel: [  396.106409] [<ffffffff8150b218>] ? 
ret_from_fork+0x58/0x90
Jul 22 09:44:53 dommy0 kernel: [  396.106411] [<ffffffff81083d40>] ? 
kthread_create_on_node+0x1c0/0x1c0
Jul 22 09:44:53 dommy0 kernel: [  396.106412] Code: a0 00 00 00 41 ff 94 
24 70 40 00 00 48 85 db 75 e5 48 83 c4 28 89 e8 5b 5d 41 5c 41 5d 41 5e 
41 5f c3 0f 1f 40 00 f0 41 ff 47 68 <48> 8b 44 24 08 66 83 00 01 48 89 
df e8 92 df c2 ff f0 41 ff 4f

Jul 22 09:45:21 dommy0 kernel: [  424.094444] Modules linked in: 
parport_pc ppdev lp parport bnep rfcomm bluetooth rfkill uinput nfsd 
auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop 
coretemp kvm ttm drm_kms_helper iTCO_wdt drm psmouse ipmi_si 
iTCO_vendor_support tpm_tis tpm ipmi_msghandler i2c_algo_bit i2c_core 
i7core_edac dcdbas serio_raw pcspkr wmi lpc_ich edac_core mfd_core evdev 
button acpi_power_meter processor thermal_sys ext4 crc16 mbcache jbd2 
sd_mod sg sr_mod cdrom hid_generic usbhid ata_generic hid crc32c_intel 
ata_piix mptsas scsi_transport_sas mptscsih libata mptbase ehci_pci 
scsi_mod uhci_hcd ehci_hcd usbcore usb_common ixgbe dca ptp bnx2 
pps_core mdio
Jul 22 09:45:21 dommy0 kernel: [  424.094487] CPU: 3 PID: 42 Comm: 
kworker/3:1 Tainted: G        W    L 3.18.18-transip-1.6 #1
Jul 22 09:45:21 dommy0 kernel: [  424.094488] Hardware name: Dell Inc. 
PowerEdge R410/01V648, BIOS 1.12.0 07/30/2013
Jul 22 09:45:21 dommy0 kernel: [  424.094492] Workqueue: events 
inet_frag_worker
Jul 22 09:45:21 dommy0 kernel: [  424.094494] task: ffff880224574490 ti: 
ffff8802240a0000 task.ti: ffff8802240a0000
Jul 22 09:45:21 dommy0 kernel: [  424.094495] RIP: 
0010:[<ffffffff810c08ac>]  [<ffffffff810c08ac>] del_timer+0x1c/0x70
Jul 22 09:45:21 dommy0 kernel: [  424.094500] RSP: 
0018:ffff8802240a3d28  EFLAGS: 00000246
Jul 22 09:45:21 dommy0 kernel: [  424.094502] RAX: ffffffff81895380 RBX: 
0000000000000000 RCX: 0000000000000000
Jul 22 09:45:21 dommy0 kernel: [  424.094503] RDX: ffff88022215ce40 RSI: 
0000000000300000 RDI: ffff88022215cdf0
Jul 22 09:45:21 dommy0 kernel: [  424.094505] RBP: 0000000000000000 R08: 
ffff880222343c00 R09: 0000000000000101
Jul 22 09:45:21 dommy0 kernel: [  424.094506] R10: 0000000000000000 R11: 
0000000000000027 R12: 0000000000000000
Jul 22 09:45:21 dommy0 kernel: [  424.094507] R13: ffff8802245a8000 R14: 
ffff880222343c00 R15: 0000000000000101
Jul 22 09:45:21 dommy0 kernel: [  424.094509] FS: 0000000000000000(0000) 
GS:ffff88022f260000(0000) knlGS:0000000000000000
Jul 22 09:45:21 dommy0 kernel: [  424.094511] CS:  0010 DS: 0000 ES: 
0000 CR0: 000000008005003b
Jul 22 09:45:21 dommy0 kernel: [  424.094512] CR2: 00007fb1e9884095 CR3: 
000000021c084000 CR4: 00000000000007e0
Jul 22 09:45:21 dommy0 kernel: [  424.094513] Stack:
Jul 22 09:45:21 dommy0 kernel: [  424.094514]  0000000000000296 
ffff88022215cdf0 ffff88022215cdf0 0000000000000003
Jul 22 09:45:21 dommy0 kernel: [  424.094517]  ffffffff81a87100 
ffffffff814927f7 ffffffff81a8aa30 ffffffff81a8aa38
Jul 22 09:45:21 dommy0 kernel: [  424.094519]  0000000000000286 
ffff88022215ce88 ffffffff8149287f 0000000000000394
Jul 22 09:45:21 dommy0 kernel: [  424.094521] Call Trace:
Jul 22 09:45:21 dommy0 kernel: [  424.094524] [<ffffffff814927f7>] ? 
inet_evict_bucket+0x77/0x140
Jul 22 09:45:21 dommy0 kernel: [  424.094527] [<ffffffff8149287f>] ? 
inet_evict_bucket+0xff/0x140
Jul 22 09:45:21 dommy0 kernel: [  424.094529] [<ffffffff814929b0>] ? 
inet_frag_worker+0x60/0x210
Jul 22 09:45:21 dommy0 kernel: [  424.094533] [<ffffffff8107e3a2>] ? 
process_one_work+0x142/0x3b0
Jul 22 09:45:21 dommy0 kernel: [  424.094536] [<ffffffff815078ed>] ? 
schedule+0x1d/0x70
Jul 22 09:45:21 dommy0 kernel: [  424.094539] [<ffffffff8107eb94>] ? 
worker_thread+0x114/0x440
Jul 22 09:45:21 dommy0 kernel: [  424.094541] [<ffffffff815073ad>] ? 
__schedule+0x2cd/0x7b0
Jul 22 09:45:21 dommy0 kernel: [  424.094544] [<ffffffff8107ea80>] ? 
create_worker+0x1a0/0x1a0
Jul 22 09:45:21 dommy0 kernel: [  424.094546] [<ffffffff81083dfc>] ? 
kthread+0xbc/0xe0
Jul 22 09:45:21 dommy0 kernel: [  424.094549] [<ffffffff81083d40>] ? 
kthread_create_on_node+0x1c0/0x1c0
Jul 22 09:45:21 dommy0 kernel: [  424.094552] [<ffffffff8150b218>] ? 
ret_from_fork+0x58/0x90
Jul 22 09:45:21 dommy0 kernel: [  424.094554] [<ffffffff81083d40>] ? 
kthread_create_on_node+0x1c0/0x1c0
Jul 22 09:45:21 dommy0 kernel: [  424.094555] Code: 66 66 66 66 66 66 2e 
0f 1f 84 00 00 00 00 00 48 83 ec 28 48 89 5c 24 10 48 89 6c 24 18 31 ed 
4c 89 64 24 20 48 83 3f 00 48 89 fb <48> c7 47 38 00 00 00 00 74 30 48 
8d 7f 18 48 8d 74 24 08 e8 0c

-- 

TransIP BV

Schipholweg 11E
2316XB Leiden
E: fschreuder@transip.nl
I: https://www.transip.nl


* Re: reproducable panic eviction work queue
  2015-07-22  8:09                   ` Frank Schreuder
@ 2015-07-22  8:17                     ` Frank Schreuder
  2015-07-22  9:11                       ` Nikolay Aleksandrov
  0 siblings, 1 reply; 20+ messages in thread
From: Frank Schreuder @ 2015-07-22  8:17 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Nikolay Aleksandrov, Johan Schuijt, Eric Dumazet, nikolay, davem,
	chutzpah, Robin Geuze, netdev

I got some additional information from syslog:

Jul 22 09:49:33 dommy0 kernel: [  675.987890] NMI watchdog: BUG: soft 
lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
Jul 22 09:49:42 dommy0 kernel: [  685.114033] INFO: rcu_sched 
self-detected stall on CPU { 3}  (t=39918 jiffies g=988 c=987 q=23168)

Thanks,
Frank


On 7/22/2015 at 10:09 AM, Frank Schreuder wrote:
>
>
> On 7/21/2015 at 8:34 PM, Florian Westphal wrote:
>> Frank Schreuder <fschreuder@transip.nl> wrote:
>>
>> [ inet frag evictor crash ]
>>
>> We believe we found the bug.  This patch should fix it.
>>
>> We cannot share the same list for buckets and evictor; the flag member is
>> subject to race conditions, so the flags & INET_FRAG_EVICTED test is not
>> reliable.
>>
>> It would be great if you could confirm that this fixes the problem
>> for you; we'll then make a formal patch submission.
>>
>> Please apply this on a kernel without the previous test patches; whether you
>> use an affected -stable or net-next kernel shouldn't matter since those are
>> similar enough.
>>
>> Many thanks!
>>
>> diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
>> --- a/include/net/inet_frag.h
>> +++ b/include/net/inet_frag.h
>> @@ -45,6 +45,7 @@ enum {
>>    * @flags: fragment queue flags
>>    * @max_size: maximum received fragment size
>>    * @net: namespace that this frag belongs to
>> + * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
>>    */
>>   struct inet_frag_queue {
>>       spinlock_t        lock;
>> @@ -59,6 +60,7 @@ struct inet_frag_queue {
>>       __u8            flags;
>>       u16            max_size;
>>       struct netns_frags    *net;
>> +    struct hlist_node    list_evictor;
>>   };
>>     #define INETFRAGS_HASHSZ    1024
>> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
>> index 5e346a0..1722348 100644
>> --- a/net/ipv4/inet_fragment.c
>> +++ b/net/ipv4/inet_fragment.c
>> @@ -151,14 +151,13 @@ evict_again:
>>           }
>>             fq->flags |= INET_FRAG_EVICTED;
>> -        hlist_del(&fq->list);
>> -        hlist_add_head(&fq->list, &expired);
>> +        hlist_add_head(&fq->list_evictor, &expired);
>>           ++evicted;
>>       }
>>         spin_unlock(&hb->chain_lock);
>>   -    hlist_for_each_entry_safe(fq, n, &expired, list)
>> +    hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
>>           f->frag_expire((unsigned long) fq);
>>         return evicted;
>> @@ -284,8 +283,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
>>       struct inet_frag_bucket *hb;
>>         hb = get_frag_bucket_locked(fq, f);
>> -    if (!(fq->flags & INET_FRAG_EVICTED))
>> -        hlist_del(&fq->list);
>> +    hlist_del(&fq->list);
>>       spin_unlock(&hb->chain_lock);
>>   }
> Hi Florian,
>
> Thanks for the patch!
>
> After implementing the patch in our setup we are no longer able to
> reproduce the kernel panic.
> Unfortunately the server load increases after 5-10 minutes and the
> logs are getting spammed with stack traces.
> I included a snippet below.
>
> Do you have any insights on why this happens, and how we can resolve 
> this?
>
> Thanks,
> Frank
>
>
> Jul 22 09:44:17 dommy0 kernel: [  360.121516] Modules linked in: 
> parport_pc ppdev lp parport bnep rfcomm bluetooth rfkill uinput nfsd 
> auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop 
> coretemp kvm ttm drm_kms_helper iTCO_wdt drm psmouse ipmi_si 
> iTCO_vendor_support tpm_tis tpm ipmi_msghandler i2c_algo_bit i2c_core 
> i7core_edac dcdbas serio_raw pcspkr wmi lpc_ich edac_core mfd_core 
> evdev button acpi_power_meter processor thermal_sys ext4 crc16 mbcache 
> jbd2 sd_mod sg sr_mod cdrom hid_generic usbhid ata_generic hid 
> crc32c_intel ata_piix mptsas scsi_transport_sas mptscsih libata 
> mptbase ehci_pci scsi_mod uhci_hcd ehci_hcd usbcore usb_common ixgbe 
> dca ptp bnx2 pps_core mdio
> Jul 22 09:44:17 dommy0 kernel: [  360.121560] CPU: 3 PID: 42 Comm: 
> kworker/3:1 Tainted: G        W    L 3.18.18-transip-1.6 #1
> Jul 22 09:44:17 dommy0 kernel: [  360.121562] Hardware name: Dell Inc. 
> PowerEdge R410/01V648, BIOS 1.12.0 07/30/2013
> Jul 22 09:44:17 dommy0 kernel: [  360.121567] Workqueue: events 
> inet_frag_worker
> Jul 22 09:44:17 dommy0 kernel: [  360.121568] task: ffff880224574490 
> ti: ffff8802240a0000 task.ti: ffff8802240a0000
> Jul 22 09:44:17 dommy0 kernel: [  360.121570] RIP: 
> 0010:[<ffffffff810c0872>]  [<ffffffff810c0872>] del_timer_sync+0x42/0x60
> Jul 22 09:44:17 dommy0 kernel: [  360.121575] RSP: 
> 0018:ffff8802240a3d48  EFLAGS: 00000246
> Jul 22 09:44:17 dommy0 kernel: [  360.121576] RAX: 0000000000000200 
> RBX: 0000000000000000 RCX: 0000000000000000
> Jul 22 09:44:17 dommy0 kernel: [  360.121578] RDX: ffff88022215ce40 
> RSI: 0000000000300000 RDI: ffff88022215cdf0
> Jul 22 09:44:17 dommy0 kernel: [  360.121579] RBP: 0000000000000003 
> R08: ffff880222343c00 R09: 0000000000000101
> Jul 22 09:44:17 dommy0 kernel: [  360.121581] R10: 0000000000000000 
> R11: 0000000000000027 R12: ffff880222343c00
> Jul 22 09:44:17 dommy0 kernel: [  360.121582] R13: 0000000000000101 
> R14: 0000000000000000 R15: 0000000000000027
> Jul 22 09:44:17 dommy0 kernel: [  360.121584] FS: 
> 0000000000000000(0000) GS:ffff88022f260000(0000) knlGS:0000000000000000
> Jul 22 09:44:17 dommy0 kernel: [  360.121585] CS:  0010 DS: 0000 ES: 
> 0000 CR0: 000000008005003b
> Jul 22 09:44:17 dommy0 kernel: [  360.121587] CR2: 00007fb1e9884095 
> CR3: 000000021c084000 CR4: 00000000000007e0
> Jul 22 09:44:17 dommy0 kernel: [  360.121588] Stack:
> Jul 22 09:44:17 dommy0 kernel: [  360.121589]  ffff88022215cdf0 
> ffffffff8149289e ffffffff81a8aa30 ffffffff81a8aa38
> Jul 22 09:44:17 dommy0 kernel: [  360.121592]  0000000000000286 
> ffff88022215ce88 ffffffff8149287f 0000000000000394
> Jul 22 09:44:17 dommy0 kernel: [  360.121594]  ffffffff81a87100 
> 0000000000000001 000000000000007c 0000000000000000
> Jul 22 09:44:17 dommy0 kernel: [  360.121596] Call Trace:
> Jul 22 09:44:17 dommy0 kernel: [  360.121600] [<ffffffff8149289e>] ? 
> inet_evict_bucket+0x11e/0x140
> Jul 22 09:44:17 dommy0 kernel: [  360.121602] [<ffffffff8149287f>] ? 
> inet_evict_bucket+0xff/0x140
> Jul 22 09:44:17 dommy0 kernel: [  360.121605] [<ffffffff814929b0>] ? 
> inet_frag_worker+0x60/0x210
> Jul 22 09:44:17 dommy0 kernel: [  360.121609] [<ffffffff8107e3a2>] ? 
> process_one_work+0x142/0x3b0
> Jul 22 09:44:17 dommy0 kernel: [  360.121612] [<ffffffff815078ed>] ? 
> schedule+0x1d/0x70
> Jul 22 09:44:17 dommy0 kernel: [  360.121614] [<ffffffff8107eb94>] ? 
> worker_thread+0x114/0x440
> Jul 22 09:44:17 dommy0 kernel: [  360.121617] [<ffffffff815073ad>] ? 
> __schedule+0x2cd/0x7b0
> Jul 22 09:44:17 dommy0 kernel: [  360.121619] [<ffffffff8107ea80>] ? 
> create_worker+0x1a0/0x1a0
> Jul 22 09:44:17 dommy0 kernel: [  360.121622] [<ffffffff81083dfc>] ? 
> kthread+0xbc/0xe0
> Jul 22 09:44:17 dommy0 kernel: [  360.121624] [<ffffffff81083d40>] ? 
> kthread_create_on_node+0x1c0/0x1c0
> Jul 22 09:44:17 dommy0 kernel: [  360.121627] [<ffffffff8150b218>] ? 
> ret_from_fork+0x58/0x90
> Jul 22 09:44:17 dommy0 kernel: [  360.121629] [<ffffffff81083d40>] ? 
> kthread_create_on_node+0x1c0/0x1c0
> Jul 22 09:44:17 dommy0 kernel: [  360.121631] Code: 75 29 be 3c 04 00 
> 00 48 c7 c7 0c 73 71 81 e8 26 72 fa ff 48 89 df e8 6e ff ff ff 85 c0 
> 79 18 66 2e 0f 1f 84 00 00 00 00 00 f3 90 <48> 89 df e8 56 ff ff ff 85 
> c0 78 f2 5b 90 c3 66 66 66 66 66 66
>
> Jul 22 09:44:27 dommy0 kernel: [  370.097476] Task dump for CPU 3:
> Jul 22 09:44:27 dommy0 kernel: [  370.097478] kworker/3:1     R 
> running task        0    42      2 0x00000008
> Jul 22 09:44:27 dommy0 kernel: [  370.097482] Workqueue: events 
> inet_frag_worker
> Jul 22 09:44:27 dommy0 kernel: [  370.097483]  0000000000000004 
> ffffffff81849240 ffffffff810b9464 00000000000003dc
> Jul 22 09:44:27 dommy0 kernel: [  370.097485]  ffff88022f26d4c0 
> ffffffff81849180 ffffffff81849240 ffffffff818b4e40
> Jul 22 09:44:27 dommy0 kernel: [  370.097488]  ffffffff810bc797 
> 0000000000000000 ffffffff810c6dc9 0000000000000092
> Jul 22 09:44:27 dommy0 kernel: [  370.097490] Call Trace:
> Jul 22 09:44:27 dommy0 kernel: [  370.097491]  <IRQ> 
> [<ffffffff810b9464>] ? rcu_dump_cpu_stacks+0x84/0xc0
> Jul 22 09:44:27 dommy0 kernel: [  370.097499] [<ffffffff810bc797>] ? 
> rcu_check_callbacks+0x407/0x650
> Jul 22 09:44:27 dommy0 kernel: [  370.097501] [<ffffffff810c6dc9>] ? 
> timekeeping_update.constprop.8+0x89/0x1b0
> Jul 22 09:44:27 dommy0 kernel: [  370.097504] [<ffffffff810c7ec5>] ? 
> update_wall_time+0x225/0x5c0
> Jul 22 09:44:27 dommy0 kernel: [  370.097507] [<ffffffff810cfcb0>] ? 
> tick_sched_do_timer+0x30/0x30
> Jul 22 09:44:27 dommy0 kernel: [  370.097510] [<ffffffff810c14df>] ? 
> update_process_times+0x3f/0x80
> Jul 22 09:44:27 dommy0 kernel: [  370.097513] [<ffffffff810cfb27>] ? 
> tick_sched_handle.isra.12+0x27/0x70
> Jul 22 09:44:27 dommy0 kernel: [  370.097515] [<ffffffff810cfcf5>] ? 
> tick_sched_timer+0x45/0x80
> Jul 22 09:44:27 dommy0 kernel: [  370.097518] [<ffffffff810c1d76>] ? 
> __run_hrtimer+0x66/0x1b0
> Jul 22 09:44:27 dommy0 kernel: [  370.097522] [<ffffffff8101c5c5>] ? 
> read_tsc+0x5/0x10
> Jul 22 09:44:27 dommy0 kernel: [  370.097524] [<ffffffff810c2519>] ? 
> hrtimer_interrupt+0xf9/0x230
> Jul 22 09:44:27 dommy0 kernel: [  370.097528] [<ffffffff81046d86>] ? 
> smp_apic_timer_interrupt+0x36/0x50
> Jul 22 09:44:27 dommy0 kernel: [  370.097531] [<ffffffff8150c0bd>] ? 
> apic_timer_interrupt+0x6d/0x80
> Jul 22 09:44:27 dommy0 kernel: [  370.097532]  <EOI> 
> [<ffffffff8150ad89>] ? _raw_spin_lock+0x9/0x30
> Jul 22 09:44:27 dommy0 kernel: [  370.097537] [<ffffffff814927bb>] ? 
> inet_evict_bucket+0x3b/0x140
> Jul 22 09:44:27 dommy0 kernel: [  370.097539] [<ffffffff8149287f>] ? 
> inet_evict_bucket+0xff/0x140
> Jul 22 09:44:27 dommy0 kernel: [  370.097542] [<ffffffff814929b0>] ? 
> inet_frag_worker+0x60/0x210
> Jul 22 09:44:27 dommy0 kernel: [  370.097545] [<ffffffff8107e3a2>] ? 
> process_one_work+0x142/0x3b0
> Jul 22 09:44:27 dommy0 kernel: [  370.097547] [<ffffffff815078ed>] ? 
> schedule+0x1d/0x70
> Jul 22 09:44:27 dommy0 kernel: [  370.097550] [<ffffffff8107eb94>] ? 
> worker_thread+0x114/0x440
> Jul 22 09:44:27 dommy0 kernel: [  370.097552] [<ffffffff815073ad>] ? 
> __schedule+0x2cd/0x7b0
> Jul 22 09:44:27 dommy0 kernel: [  370.097554] [<ffffffff8107ea80>] ? 
> create_worker+0x1a0/0x1a0
> Jul 22 09:44:27 dommy0 kernel: [  370.097557] [<ffffffff81083dfc>] ? 
> kthread+0xbc/0xe0
> Jul 22 09:44:27 dommy0 kernel: [  370.097559] [<ffffffff81083d40>] ? 
> kthread_create_on_node+0x1c0/0x1c0
> Jul 22 09:44:27 dommy0 kernel: [  370.097562] [<ffffffff8150b218>] ? 
> ret_from_fork+0x58/0x90
> Jul 22 09:44:27 dommy0 kernel: [  370.097564] [<ffffffff81083d40>] ? 
> kthread_create_on_node+0x1c0/0x1c0
>
> Jul 22 09:44:53 dommy0 kernel: [  396.106303] Modules linked in: 
> parport_pc ppdev lp parport bnep rfcomm bluetooth rfkill uinput nfsd 
> auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop 
> coretemp kvm ttm drm_kms_helper iTCO_wdt drm psmouse ipmi_si 
> iTCO_vendor_support tpm_tis tpm ipmi_msghandler i2c_algo_bit i2c_core 
> i7core_edac dcdbas serio_raw pcspkr wmi lpc_ich edac_core mfd_core 
> evdev button acpi_power_meter processor thermal_sys ext4 crc16 mbcache 
> jbd2 sd_mod sg sr_mod cdrom hid_generic usbhid ata_generic hid 
> crc32c_intel ata_piix mptsas scsi_transport_sas mptscsih libata 
> mptbase ehci_pci scsi_mod uhci_hcd ehci_hcd usbcore usb_common ixgbe 
> dca ptp bnx2 pps_core mdio
> Jul 22 09:44:53 dommy0 kernel: [  396.106347] CPU: 3 PID: 42 Comm: 
> kworker/3:1 Tainted: G        W    L 3.18.18-transip-1.6 #1
> Jul 22 09:44:53 dommy0 kernel: [  396.106348] Hardware name: Dell Inc. 
> PowerEdge R410/01V648, BIOS 1.12.0 07/30/2013
> Jul 22 09:44:53 dommy0 kernel: [  396.106353] Workqueue: events 
> inet_frag_worker
> Jul 22 09:44:53 dommy0 kernel: [  396.106355] task: ffff880224574490 
> ti: ffff8802240a0000 task.ti: ffff8802240a0000
> Jul 22 09:44:53 dommy0 kernel: [  396.106356] RIP: 
> 0010:[<ffffffff8149288d>]  [<ffffffff8149288d>] 
> inet_evict_bucket+0x10d/0x140
> Jul 22 09:44:53 dommy0 kernel: [  396.106359] RSP: 
> 0018:ffff8802240a3d58  EFLAGS: 00000206
> Jul 22 09:44:53 dommy0 kernel: [  396.106361] RAX: 0000000000000000 
> RBX: 0000000000000286 RCX: 0000000000000000
> Jul 22 09:44:53 dommy0 kernel: [  396.106362] RDX: ffff88022215ce40 
> RSI: 0000000000300000 RDI: ffff88022215cdf0
> Jul 22 09:44:53 dommy0 kernel: [  396.106364] RBP: 0000000000000003 
> R08: ffff880222343c00 R09: 0000000000000101
> Jul 22 09:44:53 dommy0 kernel: [  396.106365] R10: 0000000000000000 
> R11: 0000000000000027 R12: 0000000000000000
> Jul 22 09:44:53 dommy0 kernel: [  396.106366] R13: 0000000000000000 
> R14: ffff880222343c00 R15: 0000000000000101
> Jul 22 09:44:53 dommy0 kernel: [  396.106368] FS: 
> 0000000000000000(0000) GS:ffff88022f260000(0000) knlGS:0000000000000000
> Jul 22 09:44:53 dommy0 kernel: [  396.106370] CS:  0010 DS: 0000 ES: 
> 0000 CR0: 000000008005003b
> Jul 22 09:44:53 dommy0 kernel: [  396.106371] CR2: 00007fb1e9884095 
> CR3: 000000021c084000 CR4: 00000000000007e0
> Jul 22 09:44:53 dommy0 kernel: [  396.106372] Stack:
> Jul 22 09:44:53 dommy0 kernel: [  396.106373]  ffffffff81a8aa30 
> ffffffff81a8aa38 0000000000000286 ffff88022215ce88
> Jul 22 09:44:53 dommy0 kernel: [  396.106376]  ffffffff8149287f 
> 0000000000000394 ffffffff81a87100 0000000000000001
> Jul 22 09:44:53 dommy0 kernel: [  396.106378]  000000000000007c 
> 0000000000000000 00000000000000c0 ffffffff814929b0
> Jul 22 09:44:53 dommy0 kernel: [  396.106380] Call Trace:
> Jul 22 09:44:53 dommy0 kernel: [  396.106383] [<ffffffff8149287f>] ? 
> inet_evict_bucket+0xff/0x140
> Jul 22 09:44:53 dommy0 kernel: [  396.106386] [<ffffffff814929b0>] ? 
> inet_frag_worker+0x60/0x210
> Jul 22 09:44:53 dommy0 kernel: [  396.106390] [<ffffffff8107e3a2>] ? 
> process_one_work+0x142/0x3b0
> Jul 22 09:44:53 dommy0 kernel: [  396.106393] [<ffffffff815078ed>] ? 
> schedule+0x1d/0x70
> Jul 22 09:44:53 dommy0 kernel: [  396.106396] [<ffffffff8107eb94>] ? 
> worker_thread+0x114/0x440
> Jul 22 09:44:53 dommy0 kernel: [  396.106398] [<ffffffff815073ad>] ? 
> __schedule+0x2cd/0x7b0
> Jul 22 09:44:53 dommy0 kernel: [  396.106401] [<ffffffff8107ea80>] ? 
> create_worker+0x1a0/0x1a0
> Jul 22 09:44:53 dommy0 kernel: [  396.106403] [<ffffffff81083dfc>] ? 
> kthread+0xbc/0xe0
> Jul 22 09:44:53 dommy0 kernel: [  396.106406] [<ffffffff81083d40>] ? 
> kthread_create_on_node+0x1c0/0x1c0
> Jul 22 09:44:53 dommy0 kernel: [  396.106409] [<ffffffff8150b218>] ? 
> ret_from_fork+0x58/0x90
> Jul 22 09:44:53 dommy0 kernel: [  396.106411] [<ffffffff81083d40>] ? 
> kthread_create_on_node+0x1c0/0x1c0
> Jul 22 09:44:53 dommy0 kernel: [  396.106412] Code: a0 00 00 00 41 ff 
> 94 24 70 40 00 00 48 85 db 75 e5 48 83 c4 28 89 e8 5b 5d 41 5c 41 5d 
> 41 5e 41 5f c3 0f 1f 40 00 f0 41 ff 47 68 <48> 8b 44 24 08 66 83 00 01 
> 48 89 df e8 92 df c2 ff f0 41 ff 4f
>
> Jul 22 09:45:21 dommy0 kernel: [  424.094444] Modules linked in: 
> parport_pc ppdev lp parport bnep rfcomm bluetooth rfkill uinput nfsd 
> auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop 
> coretemp kvm ttm drm_kms_helper iTCO_wdt drm psmouse ipmi_si 
> iTCO_vendor_support tpm_tis tpm ipmi_msghandler i2c_algo_bit i2c_core 
> i7core_edac dcdbas serio_raw pcspkr wmi lpc_ich edac_core mfd_core 
> evdev button acpi_power_meter processor thermal_sys ext4 crc16 mbcache 
> jbd2 sd_mod sg sr_mod cdrom hid_generic usbhid ata_generic hid 
> crc32c_intel ata_piix mptsas scsi_transport_sas mptscsih libata 
> mptbase ehci_pci scsi_mod uhci_hcd ehci_hcd usbcore usb_common ixgbe 
> dca ptp bnx2 pps_core mdio
> Jul 22 09:45:21 dommy0 kernel: [  424.094487] CPU: 3 PID: 42 Comm: 
> kworker/3:1 Tainted: G        W    L 3.18.18-transip-1.6 #1
> Jul 22 09:45:21 dommy0 kernel: [  424.094488] Hardware name: Dell Inc. 
> PowerEdge R410/01V648, BIOS 1.12.0 07/30/2013
> Jul 22 09:45:21 dommy0 kernel: [  424.094492] Workqueue: events 
> inet_frag_worker
> Jul 22 09:45:21 dommy0 kernel: [  424.094494] task: ffff880224574490 
> ti: ffff8802240a0000 task.ti: ffff8802240a0000
> Jul 22 09:45:21 dommy0 kernel: [  424.094495] RIP: 
> 0010:[<ffffffff810c08ac>]  [<ffffffff810c08ac>] del_timer+0x1c/0x70
> Jul 22 09:45:21 dommy0 kernel: [  424.094500] RSP: 
> 0018:ffff8802240a3d28  EFLAGS: 00000246
> Jul 22 09:45:21 dommy0 kernel: [  424.094502] RAX: ffffffff81895380 
> RBX: 0000000000000000 RCX: 0000000000000000
> Jul 22 09:45:21 dommy0 kernel: [  424.094503] RDX: ffff88022215ce40 
> RSI: 0000000000300000 RDI: ffff88022215cdf0
> Jul 22 09:45:21 dommy0 kernel: [  424.094505] RBP: 0000000000000000 
> R08: ffff880222343c00 R09: 0000000000000101
> Jul 22 09:45:21 dommy0 kernel: [  424.094506] R10: 0000000000000000 
> R11: 0000000000000027 R12: 0000000000000000
> Jul 22 09:45:21 dommy0 kernel: [  424.094507] R13: ffff8802245a8000 
> R14: ffff880222343c00 R15: 0000000000000101
> Jul 22 09:45:21 dommy0 kernel: [  424.094509] FS: 
> 0000000000000000(0000) GS:ffff88022f260000(0000) knlGS:0000000000000000
> Jul 22 09:45:21 dommy0 kernel: [  424.094511] CS:  0010 DS: 0000 ES: 
> 0000 CR0: 000000008005003b
> Jul 22 09:45:21 dommy0 kernel: [  424.094512] CR2: 00007fb1e9884095 
> CR3: 000000021c084000 CR4: 00000000000007e0
> Jul 22 09:45:21 dommy0 kernel: [  424.094513] Stack:
> Jul 22 09:45:21 dommy0 kernel: [  424.094514]  0000000000000296 
> ffff88022215cdf0 ffff88022215cdf0 0000000000000003
> Jul 22 09:45:21 dommy0 kernel: [  424.094517]  ffffffff81a87100 
> ffffffff814927f7 ffffffff81a8aa30 ffffffff81a8aa38
> Jul 22 09:45:21 dommy0 kernel: [  424.094519]  0000000000000286 
> ffff88022215ce88 ffffffff8149287f 0000000000000394
> Jul 22 09:45:21 dommy0 kernel: [  424.094521] Call Trace:
> Jul 22 09:45:21 dommy0 kernel: [  424.094524] [<ffffffff814927f7>] ? 
> inet_evict_bucket+0x77/0x140
> Jul 22 09:45:21 dommy0 kernel: [  424.094527] [<ffffffff8149287f>] ? 
> inet_evict_bucket+0xff/0x140
> Jul 22 09:45:21 dommy0 kernel: [  424.094529] [<ffffffff814929b0>] ? 
> inet_frag_worker+0x60/0x210
> Jul 22 09:45:21 dommy0 kernel: [  424.094533] [<ffffffff8107e3a2>] ? 
> process_one_work+0x142/0x3b0
> Jul 22 09:45:21 dommy0 kernel: [  424.094536] [<ffffffff815078ed>] ? 
> schedule+0x1d/0x70
> Jul 22 09:45:21 dommy0 kernel: [  424.094539] [<ffffffff8107eb94>] ? 
> worker_thread+0x114/0x440
> Jul 22 09:45:21 dommy0 kernel: [  424.094541] [<ffffffff815073ad>] ? 
> __schedule+0x2cd/0x7b0
> Jul 22 09:45:21 dommy0 kernel: [  424.094544] [<ffffffff8107ea80>] ? 
> create_worker+0x1a0/0x1a0
> Jul 22 09:45:21 dommy0 kernel: [  424.094546] [<ffffffff81083dfc>] ? 
> kthread+0xbc/0xe0
> Jul 22 09:45:21 dommy0 kernel: [  424.094549] [<ffffffff81083d40>] ? 
> kthread_create_on_node+0x1c0/0x1c0
> Jul 22 09:45:21 dommy0 kernel: [  424.094552] [<ffffffff8150b218>] ? 
> ret_from_fork+0x58/0x90
> Jul 22 09:45:21 dommy0 kernel: [  424.094554] [<ffffffff81083d40>] ? 
> kthread_create_on_node+0x1c0/0x1c0
> Jul 22 09:45:21 dommy0 kernel: [  424.094555] Code: 66 66 66 66 66 66 
> 2e 0f 1f 84 00 00 00 00 00 48 83 ec 28 48 89 5c 24 10 48 89 6c 24 18 
> 31 ed 4c 89 64 24 20 48 83 3f 00 48 89 fb <48> c7 47 38 00 00 00 00 74 
> 30 48 8d 7f 18 48 8d 74 24 08 e8 0c
>

-- 

TransIP BV

Schipholweg 11E
2316XB Leiden
E: fschreuder@transip.nl
I: https://www.transip.nl


* Re: reproducable panic eviction work queue
  2015-07-22  8:17                     ` Frank Schreuder
@ 2015-07-22  9:11                       ` Nikolay Aleksandrov
  2015-07-22 10:55                         ` Frank Schreuder
  2015-07-22 13:58                         ` Florian Westphal
  0 siblings, 2 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2015-07-22  9:11 UTC (permalink / raw)
  To: Frank Schreuder, Florian Westphal
  Cc: Johan Schuijt, Eric Dumazet, nikolay, davem, chutzpah,
	Robin Geuze, netdev

On 07/22/2015 10:17 AM, Frank Schreuder wrote:
> I got some additional information from syslog:
> 
> Jul 22 09:49:33 dommy0 kernel: [  675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
> Jul 22 09:49:42 dommy0 kernel: [  685.114033] INFO: rcu_sched self-detected stall on CPU { 3}  (t=39918 jiffies g=988 c=987 q=23168)
> 
> Thanks,
> Frank
> 
> 

Hi,
It looks like it's happening because of the evict_again logic, I think we should also
add Florian's first suggestion about simplifying it to the patch and just skip the
entry if we can't delete its timer otherwise we can restart the eviction and see
entries that already had their timer stopped by us and can keep restarting for
a long time.
Here's an updated patch that removes the evict_again logic.


diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index e1300b3dd597..56a3a5685f76 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -45,6 +45,7 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
+ * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
  */
 struct inet_frag_queue {
 	spinlock_t		lock;
@@ -59,6 +60,7 @@ struct inet_frag_queue {
 	__u8			flags;
 	u16			max_size;
 	struct netns_frags	*net;
+	struct hlist_node	list_evictor;
 };
 
 #define INETFRAGS_HASHSZ	1024
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a082e5f..aaae37949c14 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -138,27 +138,17 @@ evict_again:
 		if (!inet_fragq_should_evict(fq))
 			continue;
 
-		if (!del_timer(&fq->timer)) {
-			/* q expiring right now thus increment its refcount so
-			 * it won't be freed under us and wait until the timer
-			 * has finished executing then destroy it
-			 */
-			atomic_inc(&fq->refcnt);
-			spin_unlock(&hb->chain_lock);
-			del_timer_sync(&fq->timer);
-			inet_frag_put(fq, f);
-			goto evict_again;
-		}
+		if (!del_timer(&fq->timer))
+			continue;
 
 		fq->flags |= INET_FRAG_EVICTED;
-		hlist_del(&fq->list);
-		hlist_add_head(&fq->list, &expired);
+		hlist_add_head(&fq->list_evictor, &expired);
 		++evicted;
 	}
 
 	spin_unlock(&hb->chain_lock);
 
-	hlist_for_each_entry_safe(fq, n, &expired, list)
+	hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
 		f->frag_expire((unsigned long) fq);
 
 	return evicted;
@@ -284,8 +274,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	struct inet_frag_bucket *hb;
 
 	hb = get_frag_bucket_locked(fq, f);
-	if (!(fq->flags & INET_FRAG_EVICTED))
-		hlist_del(&fq->list);
+	hlist_del(&fq->list);
 	spin_unlock(&hb->chain_lock);
 }
 

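For readers following the diff, this is roughly what the eviction loop in inet_evict_bucket() ends up looking like once the patch is applied. It is a sketch reconstructed from the hunks above plus the 3.18-era net/ipv4/inet_fragment.c, so the surrounding declarations and the exact function signature are assumptions rather than the literal tree contents (the now-unused evict_again: label is also omitted):

static unsigned int
inet_evict_bucket(struct inet_frags *f, struct inet_frag_bucket *hb)
{
	struct inet_frag_queue *fq;
	struct hlist_node *n;
	unsigned int evicted = 0;
	HLIST_HEAD(expired);

	spin_lock(&hb->chain_lock);

	hlist_for_each_entry_safe(fq, n, &hb->chain, list) {
		if (!inet_fragq_should_evict(fq))
			continue;

		/* If the timer is already firing, skip this queue entirely;
		 * frag_expire() owns it now.  This replaces the old
		 * del_timer_sync() / goto evict_again retry.
		 */
		if (!del_timer(&fq->timer))
			continue;

		fq->flags |= INET_FRAG_EVICTED;
		/* Chain on a dedicated evictor list so fq->list (the bucket
		 * linkage) is never reused by the evictor.
		 */
		hlist_add_head(&fq->list_evictor, &expired);
		++evicted;
	}

	spin_unlock(&hb->chain_lock);

	hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
		f->frag_expire((unsigned long)fq);

	return evicted;
}

The point of the shape above is that the bucket chain is only ever touched under hb->chain_lock, while expired queues are handed to frag_expire() through their own hlist_node, so fq_unlink() can later do an unconditional hlist_del(&fq->list) without corrupting either list.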

* Re: reproducable panic eviction work queue
  2015-07-22  9:11                       ` Nikolay Aleksandrov
@ 2015-07-22 10:55                         ` Frank Schreuder
  2015-07-22 13:58                         ` Florian Westphal
  1 sibling, 0 replies; 20+ messages in thread
From: Frank Schreuder @ 2015-07-22 10:55 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Florian Westphal
  Cc: Johan Schuijt, Eric Dumazet, nikolay, davem, chutzpah,
	Robin Geuze, netdev

Hi Nikolay,

Thanks for this patch. I'm no longer able to reproduce this panic in our
test environment!
The server has been handling >120k fragmented UDP packets per second for
over 40 minutes.
So far everything is running stable without stack traces in the logs. All
other panics happened within 5-10 minutes.

I will let this test environment run for another day or 2. I will inform 
you as soon as something happens!

Thanks,
Frank



On 7/22/2015 at 11:11 AM, Nikolay Aleksandrov wrote:
> On 07/22/2015 10:17 AM, Frank Schreuder wrote:
>> I got some additional information from syslog:
>>
>> Jul 22 09:49:33 dommy0 kernel: [  675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
>> Jul 22 09:49:42 dommy0 kernel: [  685.114033] INFO: rcu_sched self-detected stall on CPU { 3}  (t=39918 jiffies g=988 c=987 q=23168)
>>
>> Thanks,
>> Frank
>>
>>
> Hi,
> It looks like it's happening because of the evict_again logic, I think we should also
> add Florian's first suggestion about simplifying it to the patch and just skip the
> entry if we can't delete its timer otherwise we can restart the eviction and see
> entries that already had their timer stopped by us and can keep restarting for
> a long time.
> Here's an updated patch that removes the evict_again logic.
>
>
> diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
> index e1300b3dd597..56a3a5685f76 100644
> --- a/include/net/inet_frag.h
> +++ b/include/net/inet_frag.h
> @@ -45,6 +45,7 @@ enum {
>    * @flags: fragment queue flags
>    * @max_size: maximum received fragment size
>    * @net: namespace that this frag belongs to
> + * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
>    */
>   struct inet_frag_queue {
>   	spinlock_t		lock;
> @@ -59,6 +60,7 @@ struct inet_frag_queue {
>   	__u8			flags;
>   	u16			max_size;
>   	struct netns_frags	*net;
> +	struct hlist_node	list_evictor;
>   };
>   
>   #define INETFRAGS_HASHSZ	1024
> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> index 5e346a082e5f..aaae37949c14 100644
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -138,27 +138,17 @@ evict_again:
>   		if (!inet_fragq_should_evict(fq))
>   			continue;
>   
> -		if (!del_timer(&fq->timer)) {
> -			/* q expiring right now thus increment its refcount so
> -			 * it won't be freed under us and wait until the timer
> -			 * has finished executing then destroy it
> -			 */
> -			atomic_inc(&fq->refcnt);
> -			spin_unlock(&hb->chain_lock);
> -			del_timer_sync(&fq->timer);
> -			inet_frag_put(fq, f);
> -			goto evict_again;
> -		}
> +		if (!del_timer(&fq->timer))
> +			continue;
>   
>   		fq->flags |= INET_FRAG_EVICTED;
> -		hlist_del(&fq->list);
> -		hlist_add_head(&fq->list, &expired);
> +		hlist_add_head(&fq->list_evictor, &expired);
>   		++evicted;
>   	}
>   
>   	spin_unlock(&hb->chain_lock);
>   
> -	hlist_for_each_entry_safe(fq, n, &expired, list)
> +	hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
>   		f->frag_expire((unsigned long) fq);
>   
>   	return evicted;
> @@ -284,8 +274,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
>   	struct inet_frag_bucket *hb;
>   
>   	hb = get_frag_bucket_locked(fq, f);
> -	if (!(fq->flags & INET_FRAG_EVICTED))
> -		hlist_del(&fq->list);
> +	hlist_del(&fq->list);
>   	spin_unlock(&hb->chain_lock);
>   }
>   
>
>

-- 

TransIP BV

Schipholweg 11E
2316XB Leiden
E: fschreuder@transip.nl
I: https://www.transip.nl


* Re: reproducable panic eviction work queue
  2015-07-22  9:11                       ` Nikolay Aleksandrov
  2015-07-22 10:55                         ` Frank Schreuder
@ 2015-07-22 13:58                         ` Florian Westphal
  2015-07-22 14:03                           ` Nikolay Aleksandrov
  1 sibling, 1 reply; 20+ messages in thread
From: Florian Westphal @ 2015-07-22 13:58 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Frank Schreuder, Florian Westphal, Johan Schuijt, Eric Dumazet,
	nikolay, davem, chutzpah, Robin Geuze, netdev

Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
> On 07/22/2015 10:17 AM, Frank Schreuder wrote:
> > I got some additional information from syslog:
> > 
> > Jul 22 09:49:33 dommy0 kernel: [  675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
> > Jul 22 09:49:42 dommy0 kernel: [  685.114033] INFO: rcu_sched self-detected stall on CPU { 3}  (t=39918 jiffies g=988 c=987 q=23168)
> > 
> > Thanks,
> > Frank
> > 
> > 
> 
> Hi,
> It looks like it's happening because of the evict_again logic, I think we should also
> add Florian's first suggestion about simplifying it to the patch and just skip the
> entry if we can't delete its timer otherwise we can restart the eviction and see
> entries that already had their timer stopped by us and can keep restarting for
> a long time.
> Here's an updated patch that removes the evict_again logic.

Thanks Nik.  I'm afraid this adds a bug when a netns is exiting.

Currently, we wait until the timer has finished, but after the change
we might destroy the percpu counter while a timer is still executing on
another cpu.

I pushed a patch series to
https://git.breakpoint.cc/cgit/fw/net.git/log/?h=inetfrag_fixes_02

It includes this patch with a small change -- deferral of the percpu
counter subtraction until after the queue has been freed.

Frank -- it would be great if you could test with the four patches in
that series applied.

I'll then add your tested-by Tag to all of them before submitting this.

Thanks again for all your help in getting this fixed!
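One way to picture the deferral mentioned above: decrement the per-netns memory accounting only after the queue itself has been freed, so the counter update is the very last thing the destroy path does on behalf of that queue. Below is a minimal sketch of that ordering, loosely based on the 3.18-era inet_frag_destroy() and not taken from the patch series itself; in particular, sub_frag_mem_limit() taking a struct netns_frags pointer is an assumption here (3.18 passes the queue), and the per-skb release is simplified:

void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f)
{
	struct netns_frags *nf = q->net;
	unsigned int sum, sum_truesize = 0;
	struct sk_buff *fp;

	WARN_ON(!(q->flags & INET_FRAG_COMPLETE));
	WARN_ON(del_timer(&q->timer) != 0);

	/* Release all fragment data and remember how much was accounted
	 * (simplified; the in-tree code goes through a small helper per skb).
	 */
	fp = q->fragments;
	while (fp) {
		struct sk_buff *xp = fp->next;

		sum_truesize += fp->truesize;
		kfree_skb(fp);
		fp = xp;
	}
	sum = sum_truesize + f->qsize;

	if (f->destructor)
		f->destructor(q);
	kfree(q);

	/* Deferred: drop the accounted memory only now, after the queue is
	 * gone, so this subtraction is the last reference made on behalf of
	 * this queue.
	 */
	sub_frag_mem_limit(nf, sum);
}

Whether the actual series implements it exactly this way is best checked against the inetfrag_fixes_02 branch linked above.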


* Re: reproducable panic eviction work queue
  2015-07-22 13:58                         ` Florian Westphal
@ 2015-07-22 14:03                           ` Nikolay Aleksandrov
  2015-07-22 14:14                             ` Nikolay Aleksandrov
  0 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2015-07-22 14:03 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Frank Schreuder, Johan Schuijt, Eric Dumazet, nikolay, davem,
	chutzpah, Robin Geuze, netdev

On 07/22/2015 03:58 PM, Florian Westphal wrote:
> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>> On 07/22/2015 10:17 AM, Frank Schreuder wrote:
>>> I got some additional information from syslog:
>>>
>>> Jul 22 09:49:33 dommy0 kernel: [  675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
>>> Jul 22 09:49:42 dommy0 kernel: [  685.114033] INFO: rcu_sched self-detected stall on CPU { 3}  (t=39918 jiffies g=988 c=987 q=23168)
>>>
>>> Thanks,
>>> Frank
>>>
>>>
>>
>> Hi,
>> It looks like it's happening because of the evict_again logic, I think we should also
>> add Florian's first suggestion about simplifying it to the patch and just skip the
>> entry if we can't delete its timer otherwise we can restart the eviction and see
>> entries that already had their timer stopped by us and can keep restarting for
>> a long time.
>> Here's an updated patch that removes the evict_again logic.
> 
> Thanks Nik.  I'm afraid this adds bug when netns is exiting.
> 
> Currently, we wait until timer has finished, but after the change
> we might destroy percpu counter while a timer is still executing on
> another cpu.
> 
> I pushed a patch series to
> https://git.breakpoint.cc/cgit/fw/net.git/log/?h=inetfrag_fixes_02
> 
> It includes this patch with a small change -- deferral of the percpu
> counter subtraction until after queue has been free'd.
> 
> Frank -- it would be great if you could test with the four patches in
> that series applied.
> 
> I'll then add your tested-by Tag to all of them before submitting this.
> 
> Thanks again for all your help in getting this fixed!
> 

Sure, I didn't think it through, just supplied it for the test. :-)
Thanks for fixing it up!


* Re: reproducable panic eviction work queue
  2015-07-22 14:03                           ` Nikolay Aleksandrov
@ 2015-07-22 14:14                             ` Nikolay Aleksandrov
  2015-07-22 15:31                               ` Frank Schreuder
  0 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2015-07-22 14:14 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Frank Schreuder, Johan Schuijt, Eric Dumazet, nikolay, davem,
	chutzpah, Robin Geuze, netdev

On 07/22/2015 04:03 PM, Nikolay Aleksandrov wrote:
> On 07/22/2015 03:58 PM, Florian Westphal wrote:
>> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>>> On 07/22/2015 10:17 AM, Frank Schreuder wrote:
>>>> I got some additional information from syslog:
>>>>
>>>> Jul 22 09:49:33 dommy0 kernel: [  675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
>>>> Jul 22 09:49:42 dommy0 kernel: [  685.114033] INFO: rcu_sched self-detected stall on CPU { 3}  (t=39918 jiffies g=988 c=987 q=23168)
>>>>
>>>> Thanks,
>>>> Frank
>>>>
>>>>
>>>
>>> Hi,
>>> It looks like it's happening because of the evict_again logic, I think we should also
>>> add Florian's first suggestion about simplifying it to the patch and just skip the
>>> entry if we can't delete its timer otherwise we can restart the eviction and see
>>> entries that already had their timer stopped by us and can keep restarting for
>>> a long time.
>>> Here's an updated patch that removes the evict_again logic.
>>
>> Thanks Nik.  I'm afraid this adds bug when netns is exiting.
>>
>> Currently, we wait until timer has finished, but after the change
>> we might destroy percpu counter while a timer is still executing on
>> another cpu.
>>
>> I pushed a patch series to
>> https://git.breakpoint.cc/cgit/fw/net.git/log/?h=inetfrag_fixes_02
>>
>> It includes this patch with a small change -- deferral of the percpu
>> counter subtraction until after queue has been free'd.
>>
>> Frank -- it would be great if you could test with the four patches in
>> that series applied.
>>
>> I'll then add your tested-by Tag to all of them before submitting this.
>>
>> Thanks again for all your help in getting this fixed!
>>
> 
> Sure, I didn't think it through, just supplied it for the test. :-)
> Thanks for fixing it up!
> 

Patches look great; even the INET_FRAG_EVICTED flag will not be accidentally cleared
this way. I'll give them a try.


* Re: reproducable panic eviction work queue
  2015-07-22 14:14                             ` Nikolay Aleksandrov
@ 2015-07-22 15:31                               ` Frank Schreuder
  0 siblings, 0 replies; 20+ messages in thread
From: Frank Schreuder @ 2015-07-22 15:31 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Florian Westphal
  Cc: Johan Schuijt, Eric Dumazet, nikolay, davem, chutzpah,
	Robin Geuze, netdev


On 7/22/2015 at 4:14 PM, Nikolay Aleksandrov wrote:
> On 07/22/2015 04:03 PM, Nikolay Aleksandrov wrote:
>> On 07/22/2015 03:58 PM, Florian Westphal wrote:
>>> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>>>> On 07/22/2015 10:17 AM, Frank Schreuder wrote:
>>>>> I got some additional information from syslog:
>>>>>
>>>>> Jul 22 09:49:33 dommy0 kernel: [  675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
>>>>> Jul 22 09:49:42 dommy0 kernel: [  685.114033] INFO: rcu_sched self-detected stall on CPU { 3}  (t=39918 jiffies g=988 c=987 q=23168)
>>>>>
>>>>> Thanks,
>>>>> Frank
>>>>>
>>>>>
>>>> Hi,
>>>> It looks like it's happening because of the evict_again logic, I think we should also
>>>> add Florian's first suggestion about simplifying it to the patch and just skip the
>>>> entry if we can't delete its timer otherwise we can restart the eviction and see
>>>> entries that already had their timer stopped by us and can keep restarting for
>>>> a long time.
>>>> Here's an updated patch that removes the evict_again logic.
>>> Thanks Nik.  I'm afraid this adds bug when netns is exiting.
>>>
>>> Currently, we wait until timer has finished, but after the change
>>> we might destroy percpu counter while a timer is still executing on
>>> another cpu.
>>>
>>> I pushed a patch series to
>>> https://git.breakpoint.cc/cgit/fw/net.git/log/?h=inetfrag_fixes_02
>>>
>>> It includes this patch with a small change -- deferral of the percpu
>>> counter subtraction until after queue has been free'd.
>>>
>>> Frank -- it would be great if you could test with the four patches in
>>> that series applied.
>>>
>>> I'll then add your tested-by Tag to all of them before submitting this.
>>>
>>> Thanks again for all your help in getting this fixed!
>>>
>> Sure, I didn't think it through, just supplied it for the test. :-)
>> Thanks for fixing it up!
>>
> Patches look great, even the INET_FRAG_EVICTED flag will not be accidentally cleared
> this way. I'll give them a try.
>
>

Hi,

I'm currently building a new kernel based on 3.18.19 + patches.
One of the patches, however, fails to apply as we don't have a
"net/ieee802154/6lowpan/" directory.
Modifying the patch to use "net/ieee802154/reassembly.c" does work
without problems.
Is this due to the different kernel version or something else?

I'll come back to you as soon as I have my first test results.

Thanks,
Frank


end of thread

Thread overview: 20+ messages
     [not found] <F8D94413-90A2-4F80-AAA2-7A6AB57DF314@transip.nl>
2015-07-18  8:56 ` reproducable panic eviction work queue Eric Dumazet
2015-07-18  9:01   ` Johan Schuijt
2015-07-18 10:02     ` Nikolay Aleksandrov
2015-07-18 13:31       ` Nikolay Aleksandrov
2015-07-18 15:28       ` Johan Schuijt
2015-07-18 15:30         ` Johan Schuijt
2015-07-18 15:32         ` Nikolay Aleksandrov
2015-07-20 12:47           ` Frank Schreuder
2015-07-20 14:02             ` Nikolay Aleksandrov
2015-07-20 14:30             ` Florian Westphal
2015-07-21 11:50               ` Frank Schreuder
2015-07-21 18:34                 ` Florian Westphal
2015-07-22  8:09                   ` Frank Schreuder
2015-07-22  8:17                     ` Frank Schreuder
2015-07-22  9:11                       ` Nikolay Aleksandrov
2015-07-22 10:55                         ` Frank Schreuder
2015-07-22 13:58                         ` Florian Westphal
2015-07-22 14:03                           ` Nikolay Aleksandrov
2015-07-22 14:14                             ` Nikolay Aleksandrov
2015-07-22 15:31                               ` Frank Schreuder
