* BUG: Fatal in exception in interrupt, at nf_conncount_count [regression in 4.19(.1)]
@ 2018-11-12 14:04 Bruno Prémont
2018-11-12 14:10 ` Florian Westphal
0 siblings, 1 reply; 3+ messages in thread
From: Bruno Prémont @ 2018-11-12 14:04 UTC (permalink / raw)
To: Yi-Hung Wei, Florian Westphal, Pablo Neira Ayuso
Cc: David S. Miller, netfilter-devel, coreteam, netdev
Hi,
Since last night I have been seeing regular kernel panics with linux-4.19.1,
5 to 30 minutes of uptime apart. The system is not heavily loaded.
The trace is as follows (transcribed):
Call Trace:
<IRQ>
nf_conncount_count+0x48c/0x4f0
? nf_ct_ext_add+0x80/0x170
connlimit_mt+0xa1/0x1a0
? ipt_do_table+0x245/0x420
ipt_do_table+0x245/0x420
nf_hook_slow+0x3e/0xb0
ip_local_deliver+0x9a/0xd0
? ip_sublist_rcv_finish+0x60/0x60
ip_rcv+0x8f/0xb0
? ip_rcv_finish_core.isra.17+0x300/0x300
__netif_receive_skb_internal+0x4d/0x70
netif_receive_skb_internal+0x3e/0xd0
napi_gro_receive+0x6a/0x80
receive_buf+0x294/0xe40
? detach_buf+0x63/0x100
virtnet_poll+0xba/0x2f0
net_rx_action+0x137/0x330
__do_softirq+0xd6/0x238
irq_exit+0xc6/0xd0
do_IRQ+0x78/0xd0
common_interrupt+0xf/0xf
</IRQ>
RIP: 0010:native_safe_halt+0x2/0x10
Code: f3 c3 65 48 8b 04 25 40 4c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74
8b eb c1 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 fb f4 <c3>
0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
RSP: 0018:ffffc90000073ec8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdc
RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffff88007db19200
RDX: ffffffff81c30638 RSI: ffff88007db19200 RDI: 0000000000000087
RBP: ffffffff81c670e8 R08: 000001b3fa8aad88 R09: ffff88007c417c00
R10: 000000010000ecef R11: 000000000000a000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
default_idle+0xc/0x20
do_idle+0x1f0/0x220
? do_idle+0x172/0x220
cpu_startup_entry+0x6a/0x70
secondary_startup_64+0xa4/0xb0
---[ end trace a4bf7eecae5cc0ae ]---
RIP: 0010:rb_insert_color+0x17/0x190
Code: 4c 89 78 10 e9 72 ff ff ff 49 89 ef e9 27 ff ff ff 66 90 48 8b 17
48 85 d2 0f 84 4d 01 00 00 48 8b 02 a8 01 0f 85 6d 01 00 00 <48>
8b 48 08 49 89 c0 48 39 d1 74 53 48 85 c9 74 09 f6 01 01 0f 84
RSP: 0018:ffff88007db03a58 EFLAGS: 00010246
RAX: 930d659731af356e RBX: ffff88007db03b3c RCX: ffff88005f09c8c0
RDX: ffff8800631c4c00 RSI: ffff88007c4474b0 RDI: ffff88005f09c8a0
RBP: 0000000000000001 R08: ffff8800631c4c00 R09: ffff88005f09c8d0
R10: ffff88007db03bc8 R11: 0000000000000000 R12: ffff88007c4474b0
R13: 0000000000000002 R14: ffff88005f09c8a0 R15: ffff8800631c4c00
FS: 0000000000000000(0000) GS:ffff88007db00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f83d0291018 CR3: 000000007b036000 CR4: 00000000000406a0
Kernel panic - not syncing: Fatal exception in interrupt
That's all I can get from the machine's display.
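For readers going through the trace above: each frame has the form symbol+0xoffset/0xsize, and a leading "?" marks an address that was found on the stack but that the unwinder could not confirm as part of the call chain. A minimal Python sketch for separating confirmed frames from guessed ones follows. The helper names parse_frame and reliable_symbols are hypothetical, chosen for illustration only, and are not part of any kernel tooling.

```python
import re

# Matches frames such as "ip_rcv+0x8f/0xb0" or "? nf_ct_ext_add+0x80/0x170".
# Symbol names may contain dots (e.g. "ip_rcv_finish_core.isra.17").
FRAME_RE = re.compile(
    r"^\s*(?P<guess>\?\s+)?"          # optional "?" marking an unverified frame
    r"(?P<symbol>[\w.]+)"             # function name
    r"\+0x(?P<offset>[0-9a-fA-F]+)"   # offset of the address inside the function
    r"/0x(?P<size>[0-9a-fA-F]+)\s*$"  # total size of the function
)


def parse_frame(line):
    """Return a dict describing one trace frame, or None if the line is not a frame."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "reliable": m.group("guess") is None,
    }


def reliable_symbols(trace_text):
    """List the symbols of all frames that are not prefixed with '?'."""
    frames = (parse_frame(line) for line in trace_text.splitlines())
    return [f["symbol"] for f in frames if f is not None and f["reliable"]]
```

Feeding the "Call Trace:" portion of the report to reliable_symbols would give the confirmed call chain, starting at nf_conncount_count and skipping entries such as "? nf_ct_ext_add". Lines that do not match the frame format (markers like <IRQ>, register dumps) are ignored.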
The following commits have touched nf_conncount/connlimit code:
- 33b78aaa4457ce5d531c6a06f461f8d402774cad netfilter: use PTR_ERR_OR_ZERO()
- 5c789e131cbb997a528451564ea4613e812fc718 netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search
- 34848d5c896ea1ab4e3c441b9c4fed39928ccbaf netfilter: nf_conncount: Split insert and traversal
- 2ba39118c10ae3a7d3411c073485bba9576684cd netfilter: nf_conncount: Move locking into count_tree()
- 976afca1ceba53df6f4a543014e15d1c7a962571 netfilter: nf_conncount: Early exit in nf_conncount_lookup() and cleanup
- cb2b36f5a97df76f547fcc4ab444a02522fb6c96 netfilter: nf_conncount: Switch to plain list
- 2a406e8ac7c3e7e96b94d6c0765d5a4641970446 netfilter: nf_conncount: Early exit for garbage collection
- 5cd3da4ba2397ef07226ca2aa5094ed21ff8198f Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
It looks like those locking-related changes may be the cause.
Bisecting will be hard: I don't have the exact packet stream that
triggers the issue, and on a production system it's not ideal
to run repeated test loops.
(Note: the system is running under QEMU at a hosting provider.)
Regards,
Bruno
* Re: BUG: Fatal in exception in interrupt, at nf_conncount_count [regression in 4.19(.1)]
From: Florian Westphal @ 2018-11-12 14:10 UTC (permalink / raw)
To: Bruno Prémont
Cc: Yi-Hung Wei, Florian Westphal, Pablo Neira Ayuso,
David S. Miller, netfilter-devel, coreteam, netdev
Bruno Prémont <bonbons@sysophe.eu> wrote:
> Hi,
>
> With linux-4.19.1 I'm seeing regular kernel panics since this night
> with uptime of 5 to 30 minutes in between. System is not heavily loaded.
[..]
> It looks like those locking related changes may be the cause.
Yes.
> Bisecting it will be hard as I don't have exact packet stream
No need. Can you give these three patches a try?
https://patchwork.ozlabs.org/project/netfilter-devel/list/?series=73972
* Re: BUG: Fatal in exception in interrupt, at nf_conncount_count [regression in 4.19(.1)]
From: Bruno Prémont @ 2018-11-13 7:52 UTC (permalink / raw)
To: Florian Westphal
Cc: Yi-Hung Wei, Pablo Neira Ayuso, David S. Miller, netfilter-devel,
coreteam, netdev
On Mon, 12 Nov 2018 15:10:45 +0100 Florian Westphal wrote:
> Bruno Prémont <bonbons@sysophe.eu> wrote:
> > Hi,
> >
> > With linux-4.19.1 I'm seeing regular kernel panics since this night
> > with uptime of 5 to 30 minutes in between. System is not heavily loaded.
> [..]
>
> > It looks like those locking related changes may be the cause.
>
> Yes.
>
> > Bisecting it will be hard as I don't have exact packet stream
>
> No need. Can you give these three patches a try?
>
> https://patchwork.ozlabs.org/project/netfilter-devel/list/?series=73972
I applied them yesterday evening, and so far the system has survived
without a panic or any other anomaly.
If the fix is confirmed, don't forget to send the patches to the
stable 4.19.x kernels!
Thanks,
Bruno