* 4.1.0, kernel panic, pppoe_release
From: Denys Fedoryshchenko @ 2015-07-14 10:57 UTC
  To: Netdev, mostrows

Here is the panic message from netconsole. Please let me know if any
additional information is required.

Jul 14 13:49:16 10.0.252.10 [76078.867822] BUG: unable to handle kernel
Jul 14 13:49:16 10.0.252.10 NULL pointer dereference
Jul 14 13:49:16 10.0.252.10 at 00000000000003f0
Jul 14 13:49:16 10.0.252.10 [76078.868280] IP:
Jul 14 13:49:16 10.0.252.10 [<ffffffffa011e12a>] 
pppoe_release+0x56/0x142 [pppoe]
Jul 14 13:49:16 10.0.252.10 [76078.868541] PGD 336e4a067
Jul 14 13:49:16 10.0.252.10 PUD 333f17067
Jul 14 13:49:16 10.0.252.10 PMD 0
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.868918] Oops: 0000 [#1]
Jul 14 13:49:16 10.0.252.10 SMP
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.869226] Modules linked in:
Jul 14 13:49:16 10.0.252.10 netconsole
Jul 14 13:49:16 10.0.252.10 configfs
Jul 14 13:49:16 10.0.252.10 coretemp
Jul 14 13:49:16 10.0.252.10 sch_fq
Jul 14 13:49:16 10.0.252.10 cls_fw
Jul 14 13:49:16 10.0.252.10 act_police
Jul 14 13:49:16 10.0.252.10 cls_u32
Jul 14 13:49:16 10.0.252.10 sch_ingress
Jul 14 13:49:16 10.0.252.10 sch_sfq
Jul 14 13:49:16 10.0.252.10 sch_htb
Jul 14 13:49:16 10.0.252.10 pppoe
Jul 14 13:49:16 10.0.252.10 pppox
Jul 14 13:49:16 10.0.252.10 ppp_generic
Jul 14 13:49:16 10.0.252.10 slhc
Jul 14 13:49:16 10.0.252.10 nf_nat_pptp
Jul 14 13:49:16 10.0.252.10 nf_nat_proto_gre
Jul 14 13:49:16 10.0.252.10 nf_conntrack_pptp
Jul 14 13:49:16 10.0.252.10 nf_conntrack_proto_gre
Jul 14 13:49:16 10.0.252.10 tun
Jul 14 13:49:16 10.0.252.10 xt_REDIRECT
Jul 14 13:49:16 10.0.252.10 nf_nat_redirect
Jul 14 13:49:16 10.0.252.10 xt_set
Jul 14 13:49:16 10.0.252.10 xt_TCPMSS
Jul 14 13:49:16 10.0.252.10 ipt_REJECT
Jul 14 13:49:16 10.0.252.10 nf_reject_ipv4
Jul 14 13:49:16 10.0.252.10 ts_bm
Jul 14 13:49:16 10.0.252.10 xt_string
Jul 14 13:49:16 10.0.252.10 xt_connmark
Jul 14 13:49:16 10.0.252.10 xt_DSCP
Jul 14 13:49:16 10.0.252.10 xt_mark
Jul 14 13:49:16 10.0.252.10 xt_tcpudp
Jul 14 13:49:16 10.0.252.10 iptable_mangle
Jul 14 13:49:16 10.0.252.10 iptable_filter
Jul 14 13:49:16 10.0.252.10 iptable_nat
Jul 14 13:49:16 10.0.252.10 nf_conntrack_ipv4
Jul 14 13:49:16 10.0.252.10 nf_defrag_ipv4
Jul 14 13:49:16 10.0.252.10 nf_nat_ipv4
Jul 14 13:49:16 10.0.252.10 nf_nat
Jul 14 13:49:16 10.0.252.10 nf_conntrack
Jul 14 13:49:16 10.0.252.10 ip_tables
Jul 14 13:49:16 10.0.252.10 x_tables
Jul 14 13:49:16 10.0.252.10 ip_set_hash_ip
Jul 14 13:49:16 10.0.252.10 ip_set
Jul 14 13:49:16 10.0.252.10 nfnetlink
Jul 14 13:49:16 10.0.252.10 8021q
Jul 14 13:49:16 10.0.252.10 garp
Jul 14 13:49:16 10.0.252.10 mrp
Jul 14 13:49:16 10.0.252.10 stp
Jul 14 13:49:16 10.0.252.10 llc
Jul 14 13:49:16 10.0.252.10 [last unloaded: netconsole]
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.873195] CPU: 3 PID: 2940 Comm: 
accel-pppd Not tainted 4.1.0-build-0074 #7
Jul 14 13:49:16 10.0.252.10 [76078.873396] Hardware name: HP ProLiant 
DL320e Gen8 v2, BIOS P80 04/02/2015
Jul 14 13:49:16 10.0.252.10 [76078.873598] task: ffff8800b1886ba0 ti: 
ffff8800b09f4000 task.ti: ffff8800b09f4000
Jul 14 13:49:16 10.0.252.10 [76078.873929] RIP: 
0010:[<ffffffffa011e12a>]
Jul 14 13:49:16 10.0.252.10 [<ffffffffa011e12a>] 
pppoe_release+0x56/0x142 [pppoe]
Jul 14 13:49:16 10.0.252.10 [76078.874317] RSP: 0018:ffff8800b09f7e28  
EFLAGS: 00010202
Jul 14 13:49:16 10.0.252.10 [76078.874512] RAX: 0000000000000000 RBX: 
ffff88032a214400 RCX: 0000000000000000
Jul 14 13:49:16 10.0.252.10 [76078.874709] RDX: 000000000000000d RSI: 
00000000fffffe01 RDI: ffffffff8180d6da
Jul 14 13:49:16 10.0.252.10 [76078.874906] RBP: ffff8800b09f7e68 R08: 
0000000000000000 R09: 0000000000000000
Jul 14 13:49:16 10.0.252.10 [76078.875102] R10: ffff88031ef6a110 R11: 
0000000000000293 R12: ffff88030f8d8fc0
Jul 14 13:49:16 10.0.252.10 [76078.875299] R13: ffff88030f8d8ff0 R14: 
ffff88033115ee40 R15: ffff8803394e4920
Jul 14 13:49:16 10.0.252.10 [76078.875499] FS:  00007f79b602c700(0000) 
GS:ffff880347460000(0000) knlGS:0000000000000000
Jul 14 13:49:16 10.0.252.10 [76078.875837] CS:  0010 DS: 0000 ES: 0000 
CR0: 0000000080050033
Jul 14 13:49:16 10.0.252.10 [76078.876036] CR2: 00000000000003f0 CR3: 
0000000335425000 CR4: 00000000001407e0
Jul 14 13:49:16 10.0.252.10 [76078.876239] Stack:
Jul 14 13:49:16 10.0.252.10 [76078.876434]  ffff88033ac45c80
Jul 14 13:49:16 10.0.252.10 0000000000000000
Jul 14 13:49:16 10.0.252.10 0000000100000000
Jul 14 13:49:16 10.0.252.10 ffff88030f8d8fc0
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.877001]  ffffffffa0120260
Jul 14 13:49:16 10.0.252.10 ffff88030f8d8ff0
Jul 14 13:49:16 10.0.252.10 ffff88033115ee40
Jul 14 13:49:16 10.0.252.10 ffff8803394e4920
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.877564]  ffff8800b09f7e88
Jul 14 13:49:16 10.0.252.10 ffffffff81809e2e
Jul 14 13:49:16 10.0.252.10 ffff88031ef6a100
Jul 14 13:49:16 10.0.252.10 0000000000000008
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.878128] Call Trace:
Jul 14 13:49:16 10.0.252.10 [76078.878327]  [<ffffffff81809e2e>] 
sock_release+0x1a/0x78
Jul 14 13:49:16 10.0.252.10 [76078.878528]  [<ffffffff81809e99>] 
sock_close+0xd/0x11
Jul 14 13:49:16 10.0.252.10 [76078.878728]  [<ffffffff81150395>] 
__fput+0xdf/0x193
Jul 14 13:49:16 10.0.252.10 [76078.878926]  [<ffffffff81150477>] 
____fput+0x9/0xb
Jul 14 13:49:16 10.0.252.10 [76078.879124]  [<ffffffff810cfa95>] 
task_work_run+0x85/0x9c
Jul 14 13:49:16 10.0.252.10 [76078.879326]  [<ffffffff81002979>] 
do_notify_resume+0x40/0x4e
Jul 14 13:49:16 10.0.252.10 [76078.879527]  [<ffffffff818a4a0a>] 
int_signal+0x12/0x17
Jul 14 13:49:16 10.0.252.10 [76078.879726] Code:
Jul 14 13:49:16 10.0.252.10 48
Jul 14 13:49:16 10.0.252.10 8b
Jul 14 13:49:16 10.0.252.10 83
Jul 14 13:49:16 10.0.252.10 e0
Jul 14 13:49:16 10.0.252.10 00
Jul 14 13:49:16 10.0.252.10 00
Jul 14 13:49:16 10.0.252.10 00
Jul 14 13:49:16 10.0.252.10 a8
Jul 14 13:49:16 10.0.252.10 01
Jul 14 13:49:16 10.0.252.10 74
Jul 14 13:49:16 10.0.252.10 12
Jul 14 13:49:16 10.0.252.10 48
Jul 14 13:49:16 10.0.252.10 89
Jul 14 13:49:16 10.0.252.10 df
Jul 14 13:49:16 10.0.252.10 e8
Jul 14 13:49:16 10.0.252.10 87
Jul 14 13:49:16 10.0.252.10 f9
Jul 14 13:49:16 10.0.252.10 6e
Jul 14 13:49:16 10.0.252.10 e1
Jul 14 13:49:16 10.0.252.10 b8
Jul 14 13:49:16 10.0.252.10 f7
Jul 14 13:49:16 10.0.252.10 ff
Jul 14 13:49:16 10.0.252.10 ff
Jul 14 13:49:16 10.0.252.10 ff
Jul 14 13:49:16 10.0.252.10 e9
Jul 14 13:49:16 10.0.252.10 eb
Jul 14 13:49:16 10.0.252.10 00
Jul 14 13:49:16 10.0.252.10 00
Jul 14 13:49:16 10.0.252.10 00
Jul 14 13:49:16 10.0.252.10 8a
Jul 14 13:49:16 10.0.252.10 43
Jul 14 13:49:16 10.0.252.10 12
Jul 14 13:49:16 10.0.252.10 a8
Jul 14 13:49:16 10.0.252.10 0b
Jul 14 13:49:16 10.0.252.10 74
Jul 14 13:49:16 10.0.252.10 1c
Jul 14 13:49:16 10.0.252.10 48
Jul 14 13:49:16 10.0.252.10 8b
Jul 14 13:49:16 10.0.252.10 83
Jul 14 13:49:16 10.0.252.10 b0
Jul 14 13:49:16 10.0.252.10 02
Jul 14 13:49:16 10.0.252.10 00
Jul 14 13:49:16 10.0.252.10 00
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 8b
Jul 14 13:49:16 10.0.252.10 80
Jul 14 13:49:16 10.0.252.10 f0
Jul 14 13:49:16 10.0.252.10 03
Jul 14 13:49:16 10.0.252.10 00
Jul 14 13:49:16 10.0.252.10 00
Jul 14 13:49:16 10.0.252.10 65
Jul 14 13:49:16 10.0.252.10 ff
Jul 14 13:49:16 10.0.252.10 08
Jul 14 13:49:16 10.0.252.10 48
Jul 14 13:49:16 10.0.252.10 c7
Jul 14 13:49:16 10.0.252.10 83
Jul 14 13:49:16 10.0.252.10 b0
Jul 14 13:49:16 10.0.252.10 02
Jul 14 13:49:16 10.0.252.10 00
Jul 14 13:49:16 10.0.252.10 00
Jul 14 13:49:16 10.0.252.10 00
Jul 14 13:49:16 10.0.252.10 00
Jul 14 13:49:16 10.0.252.10 00
Jul 14 13:49:16 10.0.252.10 00
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.883913] RIP
Jul 14 13:49:16 10.0.252.10 [<ffffffffa011e12a>] 
pppoe_release+0x56/0x142 [pppoe]
Jul 14 13:49:16 10.0.252.10 [76078.884171]  RSP <ffff8800b09f7e28>
Jul 14 13:49:16 10.0.252.10 [76078.884368] CR2: 00000000000003f0
Jul 14 10:49:16 10.0.252.10 kernel: [76078.867822] BUG: unable to handle 
kernel NULL pointer dereference at 00000000000003f0
Jul 14 10:49:16 10.0.252.10 kernel: [76078.868280] IP: 
[<ffffffffa011e12a>] pppoe_release+0x56/0x142 [pppoe]
Jul 14 10:49:16 10.0.252.10 kernel: [76078.868541] PGD 336e4a067 PUD 
333f17067 PMD 0
Jul 14 10:49:16 10.0.252.10 kernel: [76078.868918] Oops: 0000 [#1] SMP
Jul 14 10:49:16 10.0.252.10 kernel: [76078.869226] Modules linked in: 
netconsole configfs coretemp sch_fq cls_fw act_police cls_u32 
sch_ingress sch_sfq sch_htb pppoe pppox ppp_generic slhc nf_nat_pptp 
nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre tun
Jul 14 10:49:16 10.0.252.10 kernel: [76078.873195] CPU: 3 PID: 2940 
Comm: accel-pppd Not tainted 4.1.0-build-0074 #7
Jul 14 10:49:16 10.0.252.10 kernel: [76078.873396] Hardware name: HP 
ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015
Jul 14 10:49:16 10.0.252.10 kernel: [76078.873598] task: 
ffff8800b1886ba0 ti: ffff8800b09f4000 task.ti: ffff8800b09f4000
Jul 14 10:49:16 10.0.252.10 kernel: [76078.873929] RIP: 
0010:[<ffffffffa011e12a>]  [<ffffffffa011e12a>] pppoe_release+0x56/0x142 
[pppoe]
Jul 14 10:49:16 10.0.252.10 kernel: [76078.874317] RSP: 
0018:ffff8800b09f7e28  EFLAGS: 00010202
Jul 14 10:49:16 10.0.252.10 kernel: [76078.874512] RAX: 0000000000000000 
RBX: ffff88032a214400 RCX: 0000000000000000
Jul 14 10:49:16 10.0.252.10 kernel: [76078.874709] RDX: 000000000000000d 
RSI: 00000000fffffe01 RDI: ffffffff8180d6da
Jul 14 10:49:16 10.0.252.10 kernel: [76078.874906] RBP: ffff8800b09f7e68 
R08: 0000000000000000 R09: 0000000000000000
Jul 14 10:49:16 10.0.252.10 kernel: [76078.875102] R10: ffff88031ef6a110 
R11: 0000000000000293 R12: ffff88030f8d8fc0
Jul 14 10:49:16 10.0.252.10 kernel: [76078.875299] R13: ffff88030f8d8ff0 
R14: ffff88033115ee40 R15: ffff8803394e4920
Jul 14 10:49:16 10.0.252.10 kernel: [76078.875499] FS:  
00007f79b602c700(0000) GS:ffff880347460000(0000) knlGS:0000000000000000
Jul 14 10:49:16 10.0.252.10 kernel: [76078.875837] CS:  0010 DS: 0000 
ES: 0000 CR0: 0000000080050033
Jul 14 10:49:16 10.0.252.10 kernel: [76078.876036] CR2: 00000000000003f0 
CR3: 0000000335425000 CR4: 00000000001407e0
Jul 14 10:49:16 10.0.252.10 kernel: [76078.876239] Stack:
Jul 14 10:49:16 10.0.252.10 kernel: [76078.876434]  ffff88033ac45c80 
0000000000000000 0000000100000000 ffff88030f8d8fc0
Jul 14 10:49:16 10.0.252.10 kernel: [76078.877001]  ffffffffa0120260 
ffff88030f8d8ff0 ffff88033115ee40 ffff8803394e4920
Jul 14 10:49:16 10.0.252.10 kernel: [76078.877564]  ffff8800b09f7e88 
ffffffff81809e2e ffff88031ef6a100 0000000000000008
Jul 14 10:49:16 10.0.252.10 kernel: [76078.878128] Call Trace:
Jul 14 10:49:16 10.0.252.10 kernel: [76078.878327]  [<ffffffff81809e2e>] 
sock_release+0x1a/0x78
Jul 14 10:49:16 10.0.252.10 kernel: [76078.878528]  [<ffffffff81809e99>] 
sock_close+0xd/0x11
Jul 14 10:49:16 10.0.252.10 kernel: [76078.878728]  [<ffffffff81150395>] 
__fput+0xdf/0x193
Jul 14 10:49:16 10.0.252.10 kernel: [76078.878926]  [<ffffffff81150477>] 
____fput+0x9/0xb
Jul 14 10:49:16 10.0.252.10 kernel: [76078.879124]  [<ffffffff810cfa95>] 
task_work_run+0x85/0x9c
Jul 14 10:49:16 10.0.252.10 kernel: [76078.879326]  [<ffffffff81002979>] 
do_notify_resume+0x40/0x4e
Jul 14 10:49:16 10.0.252.10 kernel: [76078.879527]  [<ffffffff818a4a0a>] 
int_signal+0x12/0x17
Jul 14 10:49:16 10.0.252.10 kernel: [76078.879726] Code: 48 8b 83 e0 00 
00 00 a8 01 74 12 48 89 df e8 87 f9 6e e1 b8 f7 ff ff ff e9 eb 00 00 00 
8a 43 12 a8 0b 74 1c 48 8b 83 b0 02 00 00 <48> 8b 80 f0 03 00 00 65 ff 
08 48 c7 83 b0 02 00 00 00 00 00 00
Jul 14 10:49:16 10.0.252.10 kernel: [76078.883913] RIP  
[<ffffffffa011e12a>] pppoe_release+0x56/0x142 [pppoe]
Jul 14 10:49:16 10.0.252.10 kernel: [76078.884171]  RSP 
<ffff8800b09f7e28>
Jul 14 10:49:16 10.0.252.10 kernel: [76078.884368] CR2: 00000000000003f0
Jul 14 13:49:16 10.0.252.10 [76078.884972] ---[ end trace 
7fa41f8b4758f1fa ]---
Jul 14 10:49:16 10.0.252.10 accel-pppd: pppoe: discard PADR packet 
(incorrect AC-Cookie)
Jul 14 10:49:17 10.0.252.10 kernel: [76078.884972] ---[ end trace 
7fa41f8b4758f1fa ]---
Jul 14 13:49:17 10.0.252.10 [76078.936849] Kernel panic - not syncing: 
Fatal exception
Jul 14 13:49:17 10.0.252.10 [76078.937054] Kernel Offset: disabled


* Re: 4.1.0, kernel panic, pppoe_release
From: Denys Fedoryshchenko @ 2015-07-17  9:24 UTC
  To: Netdev, ebiederm, davem, simon, dcbw, develop

I suspect this kernel panic is caused by recent changes to pppoe.
The problem appears with accel-pppd (server) on loaded servers (2k
users and more).
It is most probably related to the change "pppoe: Use workqueue to die
properly when a PADT is received".
I will try to revert this and related patches.



* Re: 4.1.0, kernel panic, pppoe_release
From: Dan Williams @ 2015-07-17 15:36 UTC
  To: Denys Fedoryshchenko; +Cc: Netdev, ebiederm, davem, simon, develop

On Fri, 2015-07-17 at 12:24 +0300, Denys Fedoryshchenko wrote:
> I suspect this kernel panic is caused by recent changes to pppoe.
> The problem appears with accel-pppd (server) on loaded servers (2k
> users and more).
> It is most probably related to the change "pppoe: Use workqueue to die
> properly when a PADT is received".
> I will try to revert this and related patches.

While I didn't write the patch, I'm the one that started the process
that got it submitted...  Could you review the patch quickly too to see
if you can spot anything amiss with it, so that it could get fixed up?
The original patch does fix a real problem so ideally we don't have to
revert the whole thing upstream.

Dan

> On 2015-07-14 13:57, Denys Fedoryshchenko wrote:
> > Here is the panic message from netconsole. Please let me know if any
> > additional information is required.
> > 
> > Jul 14 13:49:16 10.0.252.10 [76078.867822] BUG: unable to handle kernel
> > Jul 14 13:49:16 10.0.252.10 NULL pointer dereference
> > Jul 14 13:49:16 10.0.252.10 at 00000000000003f0
> > Jul 14 13:49:16 10.0.252.10 [76078.868280] IP:
> > Jul 14 13:49:16 10.0.252.10 [<ffffffffa011e12a>]
> > pppoe_release+0x56/0x142 [pppoe]
> > Jul 14 13:49:16 10.0.252.10 [76078.868541] PGD 336e4a067
> > Jul 14 13:49:16 10.0.252.10 PUD 333f17067
> > Jul 14 13:49:16 10.0.252.10 PMD 0
> > Jul 14 13:49:16 10.0.252.10
> > Jul 14 13:49:16 10.0.252.10 [76078.868918] Oops: 0000 [#1]
> > Jul 14 13:49:16 10.0.252.10 SMP
> > Jul 14 13:49:16 10.0.252.10
> > Jul 14 13:49:16 10.0.252.10 [76078.869226] Modules linked in:
> > Jul 14 13:49:16 10.0.252.10 netconsole
> > Jul 14 13:49:16 10.0.252.10 configfs
> > Jul 14 13:49:16 10.0.252.10 coretemp
> > Jul 14 13:49:16 10.0.252.10 sch_fq
> > Jul 14 13:49:16 10.0.252.10 cls_fw
> > Jul 14 13:49:16 10.0.252.10 act_police
> > Jul 14 13:49:16 10.0.252.10 cls_u32
> > Jul 14 13:49:16 10.0.252.10 sch_ingress
> > Jul 14 13:49:16 10.0.252.10 sch_sfq
> > Jul 14 13:49:16 10.0.252.10 sch_htb
> > Jul 14 13:49:16 10.0.252.10 pppoe
> > Jul 14 13:49:16 10.0.252.10 pppox
> > Jul 14 13:49:16 10.0.252.10 ppp_generic
> > Jul 14 13:49:16 10.0.252.10 slhc
> > Jul 14 13:49:16 10.0.252.10 nf_nat_pptp
> > Jul 14 13:49:16 10.0.252.10 nf_nat_proto_gre
> > Jul 14 13:49:16 10.0.252.10 nf_conntrack_pptp
> > Jul 14 13:49:16 10.0.252.10 nf_conntrack_proto_gre
> > Jul 14 13:49:16 10.0.252.10 tun
> > Jul 14 13:49:16 10.0.252.10 xt_REDIRECT
> > Jul 14 13:49:16 10.0.252.10 nf_nat_redirect
> > Jul 14 13:49:16 10.0.252.10 xt_set
> > Jul 14 13:49:16 10.0.252.10 xt_TCPMSS
> > Jul 14 13:49:16 10.0.252.10 ipt_REJECT
> > Jul 14 13:49:16 10.0.252.10 nf_reject_ipv4
> > Jul 14 13:49:16 10.0.252.10 ts_bm
> > Jul 14 13:49:16 10.0.252.10 xt_string
> > Jul 14 13:49:16 10.0.252.10 xt_connmark
> > Jul 14 13:49:16 10.0.252.10 xt_DSCP
> > Jul 14 13:49:16 10.0.252.10 xt_mark
> > Jul 14 13:49:16 10.0.252.10 xt_tcpudp
> > Jul 14 13:49:16 10.0.252.10 iptable_mangle
> > Jul 14 13:49:16 10.0.252.10 iptable_filter
> > Jul 14 13:49:16 10.0.252.10 iptable_nat
> > Jul 14 13:49:16 10.0.252.10 nf_conntrack_ipv4
> > Jul 14 13:49:16 10.0.252.10 nf_defrag_ipv4
> > Jul 14 13:49:16 10.0.252.10 nf_nat_ipv4
> > Jul 14 13:49:16 10.0.252.10 nf_nat
> > Jul 14 13:49:16 10.0.252.10 nf_conntrack
> > Jul 14 13:49:16 10.0.252.10 ip_tables
> > Jul 14 13:49:16 10.0.252.10 x_tables
> > Jul 14 13:49:16 10.0.252.10 ip_set_hash_ip
> > Jul 14 13:49:16 10.0.252.10 ip_set
> > Jul 14 13:49:16 10.0.252.10 nfnetlink
> > Jul 14 13:49:16 10.0.252.10 8021q
> > Jul 14 13:49:16 10.0.252.10 garp
> > Jul 14 13:49:16 10.0.252.10 mrp
> > Jul 14 13:49:16 10.0.252.10 stp
> > Jul 14 13:49:16 10.0.252.10 llc
> > Jul 14 13:49:16 10.0.252.10 [last unloaded: netconsole]
> > Jul 14 13:49:16 10.0.252.10
> > Jul 14 13:49:16 10.0.252.10 [76078.873195] CPU: 3 PID: 2940 Comm:
> > accel-pppd Not tainted 4.1.0-build-0074 #7
> > Jul 14 13:49:16 10.0.252.10 [76078.873396] Hardware name: HP ProLiant
> > DL320e Gen8 v2, BIOS P80 04/02/2015
> > Jul 14 13:49:16 10.0.252.10 [76078.873598] task: ffff8800b1886ba0 ti:
> > ffff8800b09f4000 task.ti: ffff8800b09f4000
> > Jul 14 13:49:16 10.0.252.10 [76078.873929] RIP: 
> > 0010:[<ffffffffa011e12a>]
> > Jul 14 13:49:16 10.0.252.10 [<ffffffffa011e12a>]
> > pppoe_release+0x56/0x142 [pppoe]
> > Jul 14 13:49:16 10.0.252.10 [76078.874317] RSP: 0018:ffff8800b09f7e28
> > EFLAGS: 00010202
> > Jul 14 13:49:16 10.0.252.10 [76078.874512] RAX: 0000000000000000 RBX:
> > ffff88032a214400 RCX: 0000000000000000
> > Jul 14 13:49:16 10.0.252.10 [76078.874709] RDX: 000000000000000d RSI:
> > 00000000fffffe01 RDI: ffffffff8180d6da
> > Jul 14 13:49:16 10.0.252.10 [76078.874906] RBP: ffff8800b09f7e68 R08:
> > 0000000000000000 R09: 0000000000000000
> > Jul 14 13:49:16 10.0.252.10 [76078.875102] R10: ffff88031ef6a110 R11:
> > 0000000000000293 R12: ffff88030f8d8fc0
> > Jul 14 13:49:16 10.0.252.10 [76078.875299] R13: ffff88030f8d8ff0 R14:
> > ffff88033115ee40 R15: ffff8803394e4920
> > Jul 14 13:49:16 10.0.252.10 [76078.875499] FS:  00007f79b602c700(0000)
> > GS:ffff880347460000(0000) knlGS:0000000000000000
> > Jul 14 13:49:16 10.0.252.10 [76078.875837] CS:  0010 DS: 0000 ES: 0000
> > CR0: 0000000080050033
> > Jul 14 13:49:16 10.0.252.10 [76078.876036] CR2: 00000000000003f0 CR3:
> > 0000000335425000 CR4: 00000000001407e0
> > Jul 14 13:49:16 10.0.252.10 [76078.876239] Stack:
> > Jul 14 13:49:16 10.0.252.10 [76078.876434]  ffff88033ac45c80
> > Jul 14 13:49:16 10.0.252.10 0000000000000000
> > Jul 14 13:49:16 10.0.252.10 0000000100000000
> > Jul 14 13:49:16 10.0.252.10 ffff88030f8d8fc0
> > Jul 14 13:49:16 10.0.252.10
> > Jul 14 13:49:16 10.0.252.10 [76078.877001]  ffffffffa0120260
> > Jul 14 13:49:16 10.0.252.10 ffff88030f8d8ff0
> > Jul 14 13:49:16 10.0.252.10 ffff88033115ee40
> > Jul 14 13:49:16 10.0.252.10 ffff8803394e4920
> > Jul 14 13:49:16 10.0.252.10
> > Jul 14 13:49:16 10.0.252.10 [76078.877564]  ffff8800b09f7e88
> > Jul 14 13:49:16 10.0.252.10 ffffffff81809e2e
> > Jul 14 13:49:16 10.0.252.10 ffff88031ef6a100
> > Jul 14 13:49:16 10.0.252.10 0000000000000008
> > Jul 14 13:49:16 10.0.252.10
> > Jul 14 13:49:16 10.0.252.10 [76078.878128] Call Trace:
> > Jul 14 13:49:16 10.0.252.10 [76078.878327]  [<ffffffff81809e2e>]
> > sock_release+0x1a/0x78
> > Jul 14 13:49:16 10.0.252.10 [76078.878528]  [<ffffffff81809e99>]
> > sock_close+0xd/0x11
> > Jul 14 13:49:16 10.0.252.10 [76078.878728]  [<ffffffff81150395>]
> > __fput+0xdf/0x193
> > Jul 14 13:49:16 10.0.252.10 [76078.878926]  [<ffffffff81150477>]
> > ____fput+0x9/0xb
> > Jul 14 13:49:16 10.0.252.10 [76078.879124]  [<ffffffff810cfa95>]
> > task_work_run+0x85/0x9c
> > Jul 14 13:49:16 10.0.252.10 [76078.879326]  [<ffffffff81002979>]
> > do_notify_resume+0x40/0x4e
> > Jul 14 13:49:16 10.0.252.10 [76078.879527]  [<ffffffff818a4a0a>]
> > int_signal+0x12/0x17
> > Jul 14 13:49:16 10.0.252.10 [76078.879726] Code:
> > Jul 14 13:49:16 10.0.252.10 48
> > Jul 14 13:49:16 10.0.252.10 8b
> > Jul 14 13:49:16 10.0.252.10 83
> > Jul 14 13:49:16 10.0.252.10 e0
> > Jul 14 13:49:16 10.0.252.10 00
> > Jul 14 13:49:16 10.0.252.10 00
> > Jul 14 13:49:16 10.0.252.10 00
> > Jul 14 13:49:16 10.0.252.10 a8
> > Jul 14 13:49:16 10.0.252.10 01
> > Jul 14 13:49:16 10.0.252.10 74
> > Jul 14 13:49:16 10.0.252.10 12
> > Jul 14 13:49:16 10.0.252.10 48
> > Jul 14 13:49:16 10.0.252.10 89
> > Jul 14 13:49:16 10.0.252.10 df
> > Jul 14 13:49:16 10.0.252.10 e8
> > Jul 14 13:49:16 10.0.252.10 87
> > Jul 14 13:49:16 10.0.252.10 f9
> > Jul 14 13:49:16 10.0.252.10 6e
> > Jul 14 13:49:16 10.0.252.10 e1
> > Jul 14 13:49:16 10.0.252.10 b8
> > Jul 14 13:49:16 10.0.252.10 f7
> > Jul 14 13:49:16 10.0.252.10 ff
> > Jul 14 13:49:16 10.0.252.10 ff
> > Jul 14 13:49:16 10.0.252.10 ff
> > Jul 14 13:49:16 10.0.252.10 e9
> > Jul 14 13:49:16 10.0.252.10 eb
> > Jul 14 13:49:16 10.0.252.10 00
> > Jul 14 13:49:16 10.0.252.10 00
> > Jul 14 13:49:16 10.0.252.10 00
> > Jul 14 13:49:16 10.0.252.10 8a
> > Jul 14 13:49:16 10.0.252.10 43
> > Jul 14 13:49:16 10.0.252.10 12
> > Jul 14 13:49:16 10.0.252.10 a8
> > Jul 14 13:49:16 10.0.252.10 0b
> > Jul 14 13:49:16 10.0.252.10 74
> > Jul 14 13:49:16 10.0.252.10 1c
> > Jul 14 13:49:16 10.0.252.10 48
> > Jul 14 13:49:16 10.0.252.10 8b
> > Jul 14 13:49:16 10.0.252.10 83
> > Jul 14 13:49:16 10.0.252.10 b0
> > Jul 14 13:49:16 10.0.252.10 02
> > Jul 14 13:49:16 10.0.252.10 00
> > Jul 14 13:49:16 10.0.252.10 00
> > Jul 14 13:49:16 10.0.252.10
> > Jul 14 13:49:16 10.0.252.10 8b
> > Jul 14 13:49:16 10.0.252.10 80
> > Jul 14 13:49:16 10.0.252.10 f0
> > Jul 14 13:49:16 10.0.252.10 03
> > Jul 14 13:49:16 10.0.252.10 00
> > Jul 14 13:49:16 10.0.252.10 00
> > Jul 14 13:49:16 10.0.252.10 65
> > Jul 14 13:49:16 10.0.252.10 ff
> > Jul 14 13:49:16 10.0.252.10 08
> > Jul 14 13:49:16 10.0.252.10 48
> > Jul 14 13:49:16 10.0.252.10 c7
> > Jul 14 13:49:16 10.0.252.10 83
> > Jul 14 13:49:16 10.0.252.10 b0
> > Jul 14 13:49:16 10.0.252.10 02
> > Jul 14 13:49:16 10.0.252.10 00
> > Jul 14 13:49:16 10.0.252.10 00
> > Jul 14 13:49:16 10.0.252.10 00
> > Jul 14 13:49:16 10.0.252.10 00
> > Jul 14 13:49:16 10.0.252.10 00
> > Jul 14 13:49:16 10.0.252.10 00
> > Jul 14 13:49:16 10.0.252.10
> > Jul 14 13:49:16 10.0.252.10 [76078.883913] RIP
> > Jul 14 13:49:16 10.0.252.10 [<ffffffffa011e12a>]
> > pppoe_release+0x56/0x142 [pppoe]
> > Jul 14 13:49:16 10.0.252.10 [76078.884171]  RSP <ffff8800b09f7e28>
> > Jul 14 13:49:16 10.0.252.10 [76078.884368] CR2: 00000000000003f0
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.867822] BUG: unable to
> > handle kernel NULL pointer dereference at 00000000000003f0
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.868280] IP:
> > [<ffffffffa011e12a>] pppoe_release+0x56/0x142 [pppoe]
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.868541] PGD 336e4a067 PUD
> > 333f17067 PMD 0
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.868918] Oops: 0000 [#1] SMP
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.869226] Modules linked in:
> > netconsole configfs coretemp sch_fq cls_fw act_police cls_u32
> > sch_ingress sch_sfq sch_htb pppoe pppox ppp_generic slhc nf_nat_pptp
> > nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre tun
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.873195] CPU: 3 PID: 2940
> > Comm: accel-pppd Not tainted 4.1.0-build-0074 #7
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.873396] Hardware name: HP
> > ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.873598] task:
> > ffff8800b1886ba0 ti: ffff8800b09f4000 task.ti: ffff8800b09f4000
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.873929] RIP:
> > 0010:[<ffffffffa011e12a>]  [<ffffffffa011e12a>]
> > pppoe_release+0x56/0x142 [pppoe]
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.874317] RSP:
> > 0018:ffff8800b09f7e28  EFLAGS: 00010202
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.874512] RAX:
> > 0000000000000000 RBX: ffff88032a214400 RCX: 0000000000000000
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.874709] RDX:
> > 000000000000000d RSI: 00000000fffffe01 RDI: ffffffff8180d6da
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.874906] RBP:
> > ffff8800b09f7e68 R08: 0000000000000000 R09: 0000000000000000
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.875102] R10:
> > ffff88031ef6a110 R11: 0000000000000293 R12: ffff88030f8d8fc0
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.875299] R13:
> > ffff88030f8d8ff0 R14: ffff88033115ee40 R15: ffff8803394e4920
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.875499] FS:
> > 00007f79b602c700(0000) GS:ffff880347460000(0000)
> > knlGS:0000000000000000
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.875837] CS:  0010 DS: 0000
> > ES: 0000 CR0: 0000000080050033
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.876036] CR2:
> > 00000000000003f0 CR3: 0000000335425000 CR4: 00000000001407e0
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.876239] Stack:
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.876434]  ffff88033ac45c80
> > 0000000000000000 0000000100000000 ffff88030f8d8fc0
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.877001]  ffffffffa0120260
> > ffff88030f8d8ff0 ffff88033115ee40 ffff8803394e4920
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.877564]  ffff8800b09f7e88
> > ffffffff81809e2e ffff88031ef6a100 0000000000000008
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.878128] Call Trace:
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.878327]
> > [<ffffffff81809e2e>] sock_release+0x1a/0x78
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.878528]
> > [<ffffffff81809e99>] sock_close+0xd/0x11
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.878728]
> > [<ffffffff81150395>] __fput+0xdf/0x193
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.878926]
> > [<ffffffff81150477>] ____fput+0x9/0xb
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.879124]
> > [<ffffffff810cfa95>] task_work_run+0x85/0x9c
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.879326]
> > [<ffffffff81002979>] do_notify_resume+0x40/0x4e
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.879527]
> > [<ffffffff818a4a0a>] int_signal+0x12/0x17
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.879726] Code: 48 8b 83 e0
> > 00 00 00 a8 01 74 12 48 89 df e8 87 f9 6e e1 b8 f7 ff ff ff e9 eb 00
> > 00 00 8a 43 12 a8 0b 74 1c 48 8b 83 b0 02 00 00 <48> 8b 80 f0 03 00 00
> > 65 ff 08 48 c7 83 b0 02 00 00 00 00 00 00
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.883913] RIP
> > [<ffffffffa011e12a>] pppoe_release+0x56/0x142 [pppoe]
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.884171]  RSP 
> > <ffff8800b09f7e28>
> > Jul 14 10:49:16 10.0.252.10 kernel: [76078.884368] CR2: 
> > 00000000000003f0
> > Jul 14 13:49:16 10.0.252.10 [76078.884972] ---[ end trace 
> > 7fa41f8b4758f1fa ]---
> > Jul 14 10:49:16 10.0.252.10 accel-pppd: pppoe: discard PADR packet
> > (incorrect AC-Cookie)
> > Jul 14 10:49:17 10.0.252.10 kernel: [76078.884972] ---[ end trace
> > 7fa41f8b4758f1fa ]---
> > Jul 14 13:49:17 10.0.252.10 [76078.936849] Kernel panic - not syncing:
> > Fatal exception
> > Jul 14 13:49:17 10.0.252.10 [76078.937054] Kernel Offset: disabled

* Re: 4.1.0, kernel panic, pppoe_release
  2015-07-17 15:36   ` Dan Williams
@ 2015-07-17 18:16     ` Denys Fedoryshchenko
  2015-09-10 15:56       ` Guillaume Nault
  0 siblings, 1 reply; 9+ messages in thread
From: Denys Fedoryshchenko @ 2015-07-17 18:16 UTC (permalink / raw)
  To: Dan Williams; +Cc: Netdev, ebiederm, davem, simon, develop

My knowledge of the kernel is probably not sufficient, but I will try a few
approaches.
One of them is to add the following to pppoe_unbind_sock_work:

         pppox_unbind_sock(sk);
         +/* Signal the death of the socket. */
         +sk->sk_state = PPPOX_DEAD;

I will wait first, to make sure that patch was causing the kernel panic (it
needs a 24h testing cycle); then I will try this fix.

On 2015-07-17 18:36, Dan Williams wrote:
> On Fri, 2015-07-17 at 12:24 +0300, Denys Fedoryshchenko wrote:
>> As i suspect, this kernel panic caused by recent changes to pppoe.
>> This problem appearing in accel-pppd (server), on loaded servers (2k
>> users and more).
>> Most probably related to changed "pppoe: Use workqueue to die properly
>> when a PADT is received"
>> I will try to reverse this and related patches.
> 
> While I didn't write the patch, I'm the one that started the process
> that got it submitted...  Could you review the patch quickly too to see
> if you can spot anything amiss with it, so that it could get fixed up?
> The original patch does fix a real problem so ideally we don't have to
> revert the whole thing upstream.
> 
> Dan
> 

* Re: 4.1.0, kernel panic, pppoe_release
  2015-07-17 18:16     ` Denys Fedoryshchenko
@ 2015-09-10 15:56       ` Guillaume Nault
  2015-09-22  1:47         ` Denys Fedoryshchenko
  0 siblings, 1 reply; 9+ messages in thread
From: Guillaume Nault @ 2015-09-10 15:56 UTC (permalink / raw)
  To: Denys Fedoryshchenko
  Cc: Dan Williams, Netdev, ebiederm, davem, simon, develop

On Fri, Jul 17, 2015 at 09:16:14PM +0300, Denys Fedoryshchenko wrote:
> Probably my knowledge of kernel is not sufficient, but i will try few
> approaches.
> One of them to add to pppoe_unbind_sock_work:
> 
>         pppox_unbind_sock(sk);
>         +/* Signal the death of the socket. */
>         +sk->sk_state = PPPOX_DEAD;
>
I don't believe this will fix anything. pppox_unbind_sock() already
sets sk->sk_state when necessary.

> I will wait first, to make sure this patch was causing kernel panic (it
> needs 24h testing cycle), then i will try this fix.
> 
I suspect the problem is related to actions performed on the underlying
interface (MAC address, MTU or link state update). This triggers
pppoe_flush_dev(), which cleans up the device without announcing it
in sk->sk_state.
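
For what it's worth, the Code: bytes in the oops look consistent with that.
Below is a rough decode written as a small C sketch. It assumes standard
x86-64 instruction encoding and the PPPOX_* flag values as remembered from
include/linux/if_pppox.h. The 0x2b0 and 0x3f0 offsets are specific to this
build, and reading them as po->pppoe_dev and the device's per-cpu refcount
is an interpretation, not something the trace states. The helper names are
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as remembered from include/linux/if_pppox.h (assumption). */
enum {
	PPPOX_CONNECTED = 1,
	PPPOX_BOUND     = 2,
	PPPOX_ZOMBIE    = 8,
};

/*
 * Instructions around the faulting one, decoded by hand from "Code:":
 *
 *   8a 43 12               mov  0x12(%rbx),%al     state byte of the sock
 *   a8 0b                  test $0xb,%al           state & mask
 *   74 1c                  je   <skip the put>
 *   48 8b 83 b0 02 00 00   mov  0x2b0(%rbx),%rax   load a pointer field
 *  <48 8b 80 f0 03 00 00>  mov  0x3f0(%rax),%rax   faulting load
 *   65 ff 08               decl %gs:(%rax)         per-cpu decrement
 *   48 c7 83 b0 02 00 00   movq $0x0,0x2b0(%rbx)   clear the pointer field
 *      00 00 00 00
 */

/* Mask tested by "test $0xb,%al", if it is the usual state check. */
unsigned int release_state_mask(void)
{
	return PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE;
}

/* Address touched by a load of the form disp(%base). */
uint64_t fault_address(uint64_t base, uint64_t disp)
{
	return base + disp;
}

/* Cross-check the decode against the register dump in the oops. */
void check_decode(void)
{
	const uint64_t rax = 0x0;    /* RAX: 0000000000000000 */
	const uint64_t cr2 = 0x3f0;  /* CR2: 00000000000003f0 */

	assert(release_state_mask() == 0x0b);
	assert(fault_address(rax, 0x3f0) == cr2);
}
```

With RAX = 0 and the state byte matching the 0x0b mask, the load at offset
0x3f0 lands exactly on CR2 = 0x3f0, which is what one would expect if the
device pointer had already been cleared while the state still said the
socket was attached.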

Can you please try the following patch?

---
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 3837ae3..2ed7506 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -313,7 +313,6 @@ static void pppoe_flush_dev(struct net_device *dev)
 			if (po->pppoe_dev == dev &&
 			    sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
 				pppox_unbind_sock(sk);
-				sk->sk_state = PPPOX_ZOMBIE;
 				sk->sk_state_change(sk);
 				po->pppoe_dev = NULL;
 				dev_put(dev);
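
To make the intent of this change easier to follow, here is a user-space
toy model of the suspected sequence. It is only a sketch: the toy_* types
and functions are hypothetical stand-ins, not the real kernel structures,
and the flag values and the state check in the release path are assumed to
match the 4.1 sources. It models the inner block of pppoe_flush_dev() with
and without the PPPOX_ZOMBIE assignment, followed by the state test that
pppoe_release() is assumed to perform before dropping the device reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as remembered from include/linux/if_pppox.h (assumption). */
enum {
	PPPOX_NONE      = 0,
	PPPOX_CONNECTED = 1,
	PPPOX_BOUND     = 2,
	PPPOX_RELAY     = 4,
	PPPOX_ZOMBIE    = 8,
	PPPOX_DEAD      = 16,
};

/* Hypothetical stand-ins for struct net_device and the PPPoE socket. */
struct toy_dev {
	int refcnt;
};

struct toy_sock {
	int state;
	struct toy_dev *dev;
};

/* Models pppox_unbind_sock(): an attached socket is marked dead. */
void toy_unbind(struct toy_sock *sk)
{
	if (sk->state & (PPPOX_BOUND | PPPOX_CONNECTED | PPPOX_ZOMBIE))
		sk->state = PPPOX_DEAD;
}

/*
 * Models the inner block of pppoe_flush_dev(). set_zombie selects
 * whether the "sk->sk_state = PPPOX_ZOMBIE" line is still present.
 */
void toy_flush(struct toy_sock *sk, bool set_zombie)
{
	struct toy_dev *dev = sk->dev;

	assert(dev != NULL);
	toy_unbind(sk);
	if (set_zombie)
		sk->state = PPPOX_ZOMBIE;
	sk->dev = NULL;   /* po->pppoe_dev = NULL */
	dev->refcnt--;    /* dev_put(dev) */
}

/*
 * Models the state test in the release path. Returns true when the
 * release path would go on to drop a reference through a NULL pointer.
 */
bool toy_release_would_oops(const struct toy_sock *sk)
{
	if (sk->state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE))
		return sk->dev == NULL;
	return false;
}

/* Flush a connected socket, then check what release would do. */
bool toy_scenario(bool set_zombie)
{
	struct toy_dev dev = { .refcnt = 1 };
	struct toy_sock sk = { .state = PPPOX_CONNECTED, .dev = &dev };

	toy_flush(&sk, set_zombie);
	return toy_release_would_oops(&sk);
}
```

In this model, toy_scenario(true) reports a NULL device pointer reaching
the release path, while toy_scenario(false) does not, because the state
left by the unbind step (PPPOX_DEAD) no longer matches the mask. The model
says nothing about other paths that could lead to the same oops.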

* Re: 4.1.0, kernel panic, pppoe_release
  2015-09-10 15:56       ` Guillaume Nault
@ 2015-09-22  1:47         ` Denys Fedoryshchenko
  2015-09-25 14:38           ` Guillaume Nault
  0 siblings, 1 reply; 9+ messages in thread
From: Denys Fedoryshchenko @ 2015-09-22  1:47 UTC (permalink / raw)
  To: Guillaume Nault; +Cc: Dan Williams, Netdev, ebiederm, davem, simon, develop

Hi,
Sorry for the late reply; I was not able to deploy a new kernel on the PPPoE
servers without permission (they are production servers), and I just got the OK.

I have been testing the patch on another PPPoE server with 9k users for
~3 days, and it seems fine. Today I will also test it on the server that
was experiencing crashes within 1 day.

On 2015-09-10 18:56, Guillaume Nault wrote:
> On Fri, Jul 17, 2015 at 09:16:14PM +0300, Denys Fedoryshchenko wrote:
>> Probably my knowledge of kernel is not sufficient, but i will try few
>> approaches.
>> One of them to add to pppoe_unbind_sock_work:
>> 
>>         pppox_unbind_sock(sk);
>>         +/* Signal the death of the socket. */
>>         +sk->sk_state = PPPOX_DEAD;
>> 
> I don't believe this will fix anything. pppox_unbind_sock() already
> sets sk->sk_state when necessary.
> 
>> I will wait first, to make sure this patch was causing kernel panic 
>> (it
>> needs 24h testing cycle), then i will try this fix.
>> 
> I suspect the problem goes with actions performed on the underlying
> interface (MAC address, MTU or link state update). This triggers
> pppoe_flush_dev(), which cleans up the device without announcing it
> in sk->sk_state.
> 
> Can you please try the following patch?
> 
> ---
> diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
> index 3837ae3..2ed7506 100644
> --- a/drivers/net/ppp/pppoe.c
> +++ b/drivers/net/ppp/pppoe.c
> @@ -313,7 +313,6 @@ static void pppoe_flush_dev(struct net_device *dev)
>  			if (po->pppoe_dev == dev &&
>  			    sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) 
> {
>  				pppox_unbind_sock(sk);
> -				sk->sk_state = PPPOX_ZOMBIE;
>  				sk->sk_state_change(sk);
>  				po->pppoe_dev = NULL;
>  				dev_put(dev);

* Re: 4.1.0, kernel panic, pppoe_release
  2015-09-22  1:47         ` Denys Fedoryshchenko
@ 2015-09-25 14:38           ` Guillaume Nault
  2015-09-25 15:02             ` Denys Fedoryshchenko
  0 siblings, 1 reply; 9+ messages in thread
From: Guillaume Nault @ 2015-09-25 14:38 UTC (permalink / raw)
  To: Denys Fedoryshchenko
  Cc: Dan Williams, Netdev, ebiederm, davem, simon, develop

On Tue, Sep 22, 2015 at 04:47:48AM +0300, Denys Fedoryshchenko wrote:
> Hi,
> Sorry for late reply, was not able to push new kernel on pppoes without
> permissions (it's production servers), just got OK.
> 
> I am testing patch on another pppoe server with 9k users, for ~3 days, seems
> fine. I will test today
> also on server that was experiencing crashes within 1 day.
>
Thanks for the feedback. I'm about to submit a fix. Should I add a
Tested-by tag for you?

* Re: 4.1.0, kernel panic, pppoe_release
  2015-09-25 14:38           ` Guillaume Nault
@ 2015-09-25 15:02             ` Denys Fedoryshchenko
  2015-09-25 19:02               ` Guillaume Nault
  0 siblings, 1 reply; 9+ messages in thread
From: Denys Fedoryshchenko @ 2015-09-25 15:02 UTC (permalink / raw)
  To: Guillaume Nault; +Cc: Dan Williams, Netdev, ebiederm, davem, simon, develop

On 2015-09-25 17:38, Guillaume Nault wrote:
> On Tue, Sep 22, 2015 at 04:47:48AM +0300, Denys Fedoryshchenko wrote:
>> Hi,
>> Sorry for late reply, was not able to push new kernel on pppoes 
>> without
>> permissions (it's production servers), just got OK.
>> 
>> I am testing patch on another pppoe server with 9k users, for ~3 days, 
>> seems
>> fine. I will test today
>> also on server that was experiencing crashes within 1 day.
>> 
> Thanks for the feedback. I'm about to submit a fix. Should I add a
> Tested-by tag for you?
On one of the servers I got the same crash as before, within hours. The
9k-user server also crashed after a while, so it seems the patch doesn't help.
I will do some more tests tomorrow.

* Re: 4.1.0, kernel panic, pppoe_release
  2015-09-25 15:02             ` Denys Fedoryshchenko
@ 2015-09-25 19:02               ` Guillaume Nault
  0 siblings, 0 replies; 9+ messages in thread
From: Guillaume Nault @ 2015-09-25 19:02 UTC (permalink / raw)
  To: Denys Fedoryshchenko
  Cc: Dan Williams, Netdev, ebiederm, davem, simon, develop

On Fri, Sep 25, 2015 at 06:02:42PM +0300, Denys Fedoryshchenko wrote:
> On 2015-09-25 17:38, Guillaume Nault wrote:
> >On Tue, Sep 22, 2015 at 04:47:48AM +0300, Denys Fedoryshchenko wrote:
> >>Hi,
> >>Sorry for late reply, was not able to push new kernel on pppoes without
> >>permissions (it's production servers), just got OK.
> >>
> >>I am testing patch on another pppoe server with 9k users, for ~3 days,
> >>seems
> >>fine. I will test today
> >>also on server that was experiencing crashes within 1 day.
> >>
> >Thanks for the feedback. I'm about to submit a fix. Should I add a
> >Tested-by tag for you?
> On one of servers i got same crash as before, within hours. 9k users server
> also crashed after while, so it seems it doesn't help.
> I will do some more tests tomorrow.
Ok, this must be a different bug then. Do you have a trace of a crash
with the patched kernel?

Thread overview: 9+ messages
2015-07-14 10:57 4.1.0, kernel panic, pppoe_release Denys Fedoryshchenko
2015-07-17  9:24 ` Denys Fedoryshchenko
2015-07-17 15:36   ` Dan Williams
2015-07-17 18:16     ` Denys Fedoryshchenko
2015-09-10 15:56       ` Guillaume Nault
2015-09-22  1:47         ` Denys Fedoryshchenko
2015-09-25 14:38           ` Guillaume Nault
2015-09-25 15:02             ` Denys Fedoryshchenko
2015-09-25 19:02               ` Guillaume Nault
