* Urgent Bug Report Kernel crash 6.5.2
@ 2023-09-15  4:05 Martin Zaharinov
  2023-09-15  6:45 ` Eric Dumazet
  2023-09-15 23:00 ` Martin Zaharinov
  0 siblings, 2 replies; 35+ messages in thread
From: Martin Zaharinov @ 2023-09-15  4:05 UTC (permalink / raw)
  To: netdev
  Cc: Eric Dumazet, Paolo Abeni, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern

Hi All,
This is a report from kernel 6.5.2: after 4 days of uptime the system hung and rebooted after this error:



Sep 15 04:32:29 205.254.184.12 [399661.971344][   C31] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Sep 15 04:32:29 205.254.184.12 [399661.971470][   C31] BUG: unable to handle page fault for address: ffffa10c52d43058
Sep 15 04:32:29 205.254.184.12 [399661.971586][   C31] #PF: supervisor instruction fetch in kernel mode
Sep 15 04:32:29 205.254.184.12 [399661.971680][   C31] #PF: error_code(0x0011) - permissions violation
Sep 15 04:32:29 205.254.184.12 [399661.971775][   C31] PGD 12601067 P4D 12601067 PUD 80000002400001e3
Sep 15 04:32:29 205.254.184.12 [399661.971871][   C31] Oops: 0011 [#1] PREEMPT SMP
Sep 15 04:32:29 205.254.184.12 [399661.971963][   C31] CPU: 31 PID: 0 Comm: swapper/31 Tainted: G        W  O       6.5.2 #1
Sep 15 04:32:29 205.254.184.12 [399661.972079][   C31] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.3 10/28/2020
Sep 15 04:32:29 205.254.184.12 [399661.972197][   C31] RIP: 0010:0xffffa10c52d43058
Sep 15 04:32:29 205.254.184.12 [399661.972289][   C31] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 58 30 d4 52 0c a1 ff ff 00 00 00 00 00 00
Sep 15 04:32:29 205.254.184.12 [399661.972448][   C31] RSP: 0018:ffffad0e0097ccc8 EFLAGS: 00010282
Sep 15 04:32:29 205.254.184.12 [399661.972543][   C31] RAX: ffffa10c52d43058 RBX: ffffa10c52d43000 RCX: 0000000000000000
Sep 15 04:32:29 205.254.184.12 [399661.972659][   C31] RDX: 0000000000002712 RSI: 0000000000000246 RDI: ffffa10c52d43000
Sep 15 04:32:29 205.254.184.12 [399661.972774][   C31] RBP: ffffa10c52d43000 R08: 0000000127a83c46 R09: 0000000000004d8c
Sep 15 04:32:29 205.254.184.12 [399661.972889][   C31] R10: ffffe840ca0f7c00 R11: 0000000000000000 R12: ffffa10c8e764d80
Sep 15 04:32:29 205.254.184.12 [399661.973005][   C31] R13: ffffa10c92b4c760 R14: 0000000000000058 R15: ffffa10c92b4c600
Sep 15 04:32:29 205.254.184.12 [399661.973123][   C31] FS:  0000000000000000(0000) GS:ffffa1125fdc0000(0000) knlGS:0000000000000000
Sep 15 04:32:29 205.254.184.12 [399661.973244][   C31] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 15 04:32:29 205.254.184.12 [399661.973338][   C31] CR2: ffffa10c52d43058 CR3: 00000001059b8001 CR4: 00000000003706e0
Sep 15 04:32:29 205.254.184.12 [399661.973454][   C31] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 15 04:32:29 205.254.184.12 [399661.973569][   C31] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 15 04:32:29 205.254.184.12 [399661.973684][   C31] Call Trace:
Sep 15 04:32:29 205.254.184.12 [399661.973773][   C31]  <IRQ>
Sep 15 04:32:29 205.254.184.12 [399661.973859][   C31]  ? __die+0xe4/0xf0
Sep 15 04:32:29 205.254.184.12 [399661.973949][   C31]  ? page_fault_oops+0x144/0x3e0
Sep 15 04:32:29 205.254.184.12 [399661.974043][   C31]  ? exc_page_fault+0x92/0xa0
Sep 15 04:32:29 205.254.184.12 [399661.974136][   C31]  ? asm_exc_page_fault+0x22/0x30
Sep 15 04:32:29 205.254.184.12 [399661.974228][   C31]  ? kfree_skb_reason+0x33/0xf0
Sep 15 04:32:29 205.254.184.12 [399661.974321][   C31]  ? tcp_mtu_probe+0x3a6/0x7b0
Sep 15 04:32:29 205.254.184.12 [399661.974416][   C31]  ? tcp_write_xmit+0x7fa/0x1410
Sep 15 04:32:29 205.254.184.12 [399661.974509][   C31]  ? __tcp_push_pending_frames+0x2d/0xb0
Sep 15 04:32:29 205.254.184.12 [399661.974603][   C31]  ? tcp_rcv_established+0x381/0x610
Sep 15 04:32:29 205.254.184.12 [399661.974695][   C31]  ? sk_filter_trim_cap+0xc6/0x1c0
Sep 15 04:32:29 205.254.184.12 [399661.974787][   C31]  ? tcp_v4_do_rcv+0x11f/0x1f0
Sep 15 04:32:29 205.254.184.12 [399661.974877][   C31]  ? tcp_v4_rcv+0xfa1/0x1010
Sep 15 04:32:29 205.254.184.12 [399661.974968][   C31]  ? ip_protocol_deliver_rcu+0x1b/0x270
Sep 15 04:32:29 205.254.184.12 [399661.975062][   C31]  ? ip_local_deliver_finish+0x6d/0x90
Sep 15 04:32:29 205.254.184.12 [399661.976257][   C31]  ? process_backlog+0x10c/0x230
Sep 15 04:32:29 205.254.184.12 [399661.976352][   C31]  ? __napi_poll+0x20/0x180
Sep 15 04:32:29 205.254.184.12 [399661.976442][   C31]  ? net_rx_action+0x2a4/0x390
Sep 15 04:32:29 205.254.184.12 [399661.976534][   C31]  ? __do_softirq+0xd0/0x202
Sep 15 04:32:29 205.254.184.12 [399661.976626][   C31]  ? do_softirq+0x3a/0x50
Sep 15 04:32:29 205.254.184.12 [399661.976718][   C31]  </IRQ>
Sep 15 04:32:29 205.254.184.12 [399661.976805][   C31]  <TASK>
Sep 15 04:32:29 205.254.184.12 [399661.976890][   C31]  ? flush_smp_call_function_queue+0x3f/0x50
Sep 15 04:32:29 205.254.184.12 [399661.976988][   C31]  ? do_idle+0x14d/0x210
Sep 15 04:32:29 205.254.184.12 [399661.977078][   C31]  ? cpu_startup_entry+0x14/0x20
Sep 15 04:32:29 205.254.184.12 [399661.977168][   C31]  ? start_secondary+0xe1/0xf0
Sep 15 04:32:29 205.254.184.12 [399661.977262][   C31]  ? secondary_startup_64_no_verify+0x167/0x16b
Sep 15 04:32:29 205.254.184.12 [399661.977359][   C31]  </TASK>
Sep 15 04:32:29 205.254.184.12 [399661.977448][   C31] Modules linked in: nft_limit nf_conntrack_netlink  pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos 
Sep 15 04:32:29 205.254.184.12 [399661.977720][   C31] CR2: ffffa10c52d43058
Sep 15 04:32:29 205.254.184.12 [399661.977809][   C31] ---[ end trace 0000000000000000 ]---
Sep 15 04:32:29 205.254.184.12 [399661.977901][   C31] RIP: 0010:0xffffa10c52d43058
Sep 15 04:32:29 205.254.184.12 [399661.977992][   C31] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 58 30 d4 52 0c a1 ff ff 00 00 00 00 00 00
Sep 15 04:32:29 205.254.184.12 [399661.978150][   C31] RSP: 0018:ffffad0e0097ccc8 EFLAGS: 00010282
Sep 15 04:32:29 205.254.184.12 [399661.978243][   C31] RAX: ffffa10c52d43058 RBX: ffffa10c52d43000 RCX: 0000000000000000
Sep 15 04:32:29 205.254.184.12 [399661.978358][   C31] RDX: 0000000000002712 RSI: 0000000000000246 RDI: ffffa10c52d43000
Sep 15 04:32:29 205.254.184.12 [399661.978472][   C31] RBP: ffffa10c52d43000 R08: 0000000127a83c46 R09: 0000000000004d8c
Sep 15 04:32:29 205.254.184.12 [399661.978587][   C31] R10: ffffe840ca0f7c00 R11: 0000000000000000 R12: ffffa10c8e764d80
Sep 15 04:32:29 205.254.184.12 [399661.978702][   C31] R13: ffffa10c92b4c760 R14: 0000000000000058 R15: ffffa10c92b4c600
Sep 15 04:32:29 205.254.184.12 [399661.978818][   C31] FS:  0000000000000000(0000) GS:ffffa1125fdc0000(0000) knlGS:0000000000000000
Sep 15 04:32:29 205.254.184.12 [399661.978940][   C31] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 15 04:32:29 205.254.184.12 [399661.979036][   C31] CR2: ffffa10c52d43058 CR3: 00000001059b8001 CR4: 00000000003706e0
Sep 15 04:32:29 205.254.184.12 [399661.979150][   C31] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 15 04:32:29 205.254.184.12 [399661.979265][   C31] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 15 04:32:29 205.254.184.12 [399661.979381][   C31] Kernel panic - not syncing: Fatal exception in interrupt
Sep 15 04:32:29 205.254.184.12 [399662.084038][   C31] Kernel Offset: 0x1e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Sep 15 04:32:29 205.254.184.12 [399662.084162][   C31] Rebooting in 10 seconds..


Please update me if you find a fix.

m.
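
For readers decoding the `#PF: error_code(0x0011)` line by hand: the x86 page-fault error code is a bit field, and the sketch below names the bits that are set. The bit layout is the hardware-defined one (the kernel names these bits X86_PF_PROT, X86_PF_WRITE, X86_PF_USER, X86_PF_RSVD and X86_PF_INSTR); the helper name `decode_pf_error_code` and the description strings are made up for illustration.

```python
# Low bits of the x86 page-fault error code (hardware-defined positions).
# The description strings are illustrative wording, not kernel output.
PF_BITS = {
    0: "protection violation (page was present)",
    1: "write access",
    2: "user-mode access",
    3: "reserved bit set in page table",
    4: "instruction fetch",
}


def decode_pf_error_code(code: int) -> list[str]:
    """Return a description for every known bit set in the error code."""
    return [desc for bit, desc in PF_BITS.items() if code & (1 << bit)]


if __name__ == "__main__":
    # 0x0011 is the value reported in the oops above.
    for description in decode_pf_error_code(0x0011):
        print(description)
```

For 0x0011 only bits 0 and 4 are set, with the write and user bits clear. That agrees with the log text: a kernel-mode instruction fetch from a page that is present but not executable.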


* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-15  4:05 Urgent Bug Report Kernel crash 6.5.2 Martin Zaharinov
@ 2023-09-15  6:45 ` Eric Dumazet
  2023-09-15 22:23   ` Martin Zaharinov
  2023-11-16 14:17   ` Martin Zaharinov
  2023-09-15 23:00 ` Martin Zaharinov
  1 sibling, 2 replies; 35+ messages in thread
From: Eric Dumazet @ 2023-09-15  6:45 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: netdev, Paolo Abeni, patchwork-bot+netdevbpf, Jakub Kicinski,
	Stephen Hemminger, kuba+netdrv, dsahern

On Fri, Sep 15, 2023 at 6:05 AM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi All,
> This is a report from kernel 6.5.2: after 4 days of uptime the system hung and rebooted after this error:
>
>
>
> Sep 15 04:32:29 205.254.184.12 [399661.971344][   C31] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> Sep 15 04:32:29 205.254.184.12 [399661.971470][   C31] BUG: unable to handle page fault for address: ffffa10c52d43058
> Sep 15 04:32:29 205.254.184.12 [399661.971586][   C31] #PF: supervisor instruction fetch in kernel mode
> Sep 15 04:32:29 205.254.184.12 [399661.971680][   C31] #PF: error_code(0x0011) - permissions violation
> Sep 15 04:32:29 205.254.184.12 [399661.971775][   C31] PGD 12601067 P4D 12601067 PUD 80000002400001e3
> Sep 15 04:32:29 205.254.184.12 [399661.971871][   C31] Oops: 0011 [#1] PREEMPT SMP
> Sep 15 04:32:29 205.254.184.12 [399661.971963][   C31] CPU: 31 PID: 0 Comm: swapper/31 Tainted: G        W  O       6.5.2 #1
> Sep 15 04:32:29 205.254.184.12 [399661.972079][   C31] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.3 10/28/2020
> Sep 15 04:32:29 205.254.184.12 [399661.972197][   C31] RIP: 0010:0xffffa10c52d43058
> Sep 15 04:32:29 205.254.184.12 [399661.972289][   C31] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 58 30 d4 52 0c a1 ff ff 00 00 00 00 00 00
> Sep 15 04:32:29 205.254.184.12 [399661.972448][   C31] RSP: 0018:ffffad0e0097ccc8 EFLAGS: 00010282
> Sep 15 04:32:29 205.254.184.12 [399661.972543][   C31] RAX: ffffa10c52d43058 RBX: ffffa10c52d43000 RCX: 0000000000000000
> Sep 15 04:32:29 205.254.184.12 [399661.972659][   C31] RDX: 0000000000002712 RSI: 0000000000000246 RDI: ffffa10c52d43000
> Sep 15 04:32:29 205.254.184.12 [399661.972774][   C31] RBP: ffffa10c52d43000 R08: 0000000127a83c46 R09: 0000000000004d8c
> Sep 15 04:32:29 205.254.184.12 [399661.972889][   C31] R10: ffffe840ca0f7c00 R11: 0000000000000000 R12: ffffa10c8e764d80
> Sep 15 04:32:29 205.254.184.12 [399661.973005][   C31] R13: ffffa10c92b4c760 R14: 0000000000000058 R15: ffffa10c92b4c600
> Sep 15 04:32:29 205.254.184.12 [399661.973123][   C31] FS:  0000000000000000(0000) GS:ffffa1125fdc0000(0000) knlGS:0000000000000000
> Sep 15 04:32:29 205.254.184.12 [399661.973244][   C31] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep 15 04:32:29 205.254.184.12 [399661.973338][   C31] CR2: ffffa10c52d43058 CR3: 00000001059b8001 CR4: 00000000003706e0
> Sep 15 04:32:29 205.254.184.12 [399661.973454][   C31] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep 15 04:32:29 205.254.184.12 [399661.973569][   C31] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Sep 15 04:32:29 205.254.184.12 [399661.973684][   C31] Call Trace:
> Sep 15 04:32:29 205.254.184.12 [399661.973773][   C31]  <IRQ>
> Sep 15 04:32:29 205.254.184.12 [399661.973859][   C31]  ? __die+0xe4/0xf0
> Sep 15 04:32:29 205.254.184.12 [399661.973949][   C31]  ? page_fault_oops+0x144/0x3e0
> Sep 15 04:32:29 205.254.184.12 [399661.974043][   C31]  ? exc_page_fault+0x92/0xa0
> Sep 15 04:32:29 205.254.184.12 [399661.974136][   C31]  ? asm_exc_page_fault+0x22/0x30
> Sep 15 04:32:29 205.254.184.12 [399661.974228][   C31]  ? kfree_skb_reason+0x33/0xf0
> Sep 15 04:32:29 205.254.184.12 [399661.974321][   C31]  ? tcp_mtu_probe+0x3a6/0x7b0
> Sep 15 04:32:29 205.254.184.12 [399661.974416][   C31]  ? tcp_write_xmit+0x7fa/0x1410
> Sep 15 04:32:29 205.254.184.12 [399661.974509][   C31]  ? __tcp_push_pending_frames+0x2d/0xb0
> Sep 15 04:32:29 205.254.184.12 [399661.974603][   C31]  ? tcp_rcv_established+0x381/0x610
> Sep 15 04:32:29 205.254.184.12 [399661.974695][   C31]  ? sk_filter_trim_cap+0xc6/0x1c0
> Sep 15 04:32:29 205.254.184.12 [399661.974787][   C31]  ? tcp_v4_do_rcv+0x11f/0x1f0
> Sep 15 04:32:29 205.254.184.12 [399661.974877][   C31]  ? tcp_v4_rcv+0xfa1/0x1010

Your reports are not usable. Please make sure to include symbols next time.

Please read these parts (and possibly complete files)

Documentation/admin-guide/bug-hunting.rst:55:quality of the stack
trace by using file:`scripts/decode_stacktrace.sh`.

Documentation/admin-guide/reporting-issues.rst:978:
[user@something ~]$ sudo dmesg |
./linux-5.10.5/scripts/decode_stacktrace.sh ./linux-5.10.5/vmlinux
Documentation/admin-guide/reporting-issues.rst:985:
[user@something ~]$ sudo dmesg |
./linux-5.10.5/scripts/decode_stacktrace.sh \
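
A possible preprocessing step before feeding a remotely collected log to `scripts/decode_stacktrace.sh`: the lines in this report carry a syslog-style prefix (`Sep 15 04:32:29 205.254.184.12 `) in front of the kernel timestamp. Whether the script copes with that prefix is not confirmed here. The sketch below only strips it so the input looks like plain `dmesg` output. The regular expression and the function name `strip_syslog_prefix` are assumptions for illustration, not part of any kernel tool.

```python
import re

# Matches a prefix such as "Sep 15 04:32:29 205.254.184.12 " that sits in
# front of the kernel's own "[seconds.micros]" timestamp.
# Assumption: month name, day, HH:MM:SS, then a single host/IP token.
SYSLOG_PREFIX = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+(?=\[)"
)


def strip_syslog_prefix(line: str) -> str:
    """Drop the syslog date/host prefix; leave all other lines untouched."""
    return SYSLOG_PREFIX.sub("", line, count=1)


if __name__ == "__main__":
    sample = (
        "Sep 15 04:32:29 205.254.184.12 "
        "[399661.974877][   C31]  ? tcp_v4_rcv+0xfa1/0x1010"
    )
    print(strip_syslog_prefix(sample))
```

The lookahead `(?=\[)` keeps the match from firing on lines that do not continue with a bracketed kernel timestamp, so ordinary `dmesg` lines pass through unchanged.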





* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-15  6:45 ` Eric Dumazet
@ 2023-09-15 22:23   ` Martin Zaharinov
  2023-11-16 14:17   ` Martin Zaharinov
  1 sibling, 0 replies; 35+ messages in thread
From: Martin Zaharinov @ 2023-09-15 22:23 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Paolo Abeni, patchwork-bot+netdevbpf, Jakub Kicinski,
	Stephen Hemminger, kuba+netdrv, dsahern

Hi Eric

I ran the decode script, but I think the function names are missing. Check the log:


  15:	eb cf                	jmp    0xffffffffffffffe6
  17:	48 c7 c7 68 f6 e2 8e 	mov    $0xffffffff8ee2f668,%rdi
  1e:	c6 05 ac ae e6 00 01 	movb   $0x1,0xe6aeac(%rip)        # 0xe6aed1
  25:	e8 11 71 c7 ff       	call   0xffffffffffc7713b
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	eb df                	jmp    0xd
  2e:	cc                   	int3
  2f:	cc                   	int3
  30:	cc                   	int3
  31:	cc                   	int3
  32:	cc                   	int3
  33:	cc                   	int3
  34:	cc                   	int3
  35:	cc                   	int3
  36:	cc                   	int3
  37:	cc                   	int3
  38:	cc                   	int3
  39:	cc                   	int3
  3a:	cc                   	int3
  3b:	48 89 fa             	mov    %rdi,%rdx
  3e:	83                   	.byte 0x83
  3f:	e2                   	.byte 0xe2

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	eb df                	jmp    0xffffffffffffffe3
   4:	cc                   	int3
   5:	cc                   	int3
   6:	cc                   	int3
   7:	cc                   	int3
   8:	cc                   	int3
   9:	cc                   	int3
   a:	cc                   	int3
   b:	cc                   	int3
   c:	cc                   	int3
   d:	cc                   	int3
   e:	cc                   	int3
   f:	cc                   	int3
  10:	cc                   	int3
  11:	48 89 fa             	mov    %rdi,%rdx
  14:	83                   	.byte 0x83
  15:	e2                   	.byte 0xe2
[40915.531389] RSP: 0018:ffffa62680318de8 EFLAGS: 00010296
[40915.531487] RAX: 0000000000000019 RBX: ffff982f02950c40 RCX: 00000000fffbffff
[40915.531605] RDX: 00000000fffbffff RSI: 0000000000000001 RDI: 00000000ffffffea
[40915.531721] RBP: ffff982e467d2000 R08: 0000000000000000 R09: 00000000fffbffff
[40915.531839] R10: ffff98359d600000 R11: 0000000000000003 R12: ffff982f044e16c0
[40915.531956] R13: 0000000000000000 R14: 0000000000000258 R15: ffffa62680318f60
[40915.532075] FS:  0000000000000000(0000) GS:ffff98359fbc0000(0000) knlGS:0000000000000000
[40915.532195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[40915.532291] CR2: 00005593eb3ff078 CR3: 0000000179f6e001 CR4: 00000000003706e0
[40915.532409] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[40915.532526] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[40915.532645] Call Trace:
[40915.532736]  <IRQ>
[40915.532824] ? __warn (??:?)
[40915.532918] ? report_bug (??:?)
[40915.533011] ? handle_bug (traps.c:?)
[40915.533104] ? exc_invalid_op (??:?)
[40915.533198] ? asm_exc_invalid_op (??:?)
[40915.533294] ? rcuref_put_slowpath (??:?)
[40915.533389] ? rcuref_put_slowpath (??:?)
[40915.533482] dst_release (??:?)
[40915.533576] __dev_queue_xmit (??:?)
[40915.533671] ? eth_header (??:?)
[40915.533766] ip_finish_output2 (ip_output.c:?)
[40915.533863] process_backlog (dev.c:?)
[40915.533958] __napi_poll (dev.c:?)
[40915.534050] net_rx_action (dev.c:?)
[40915.534140] __do_softirq (??:?)
[40915.534233] do_softirq (??:?)
[40915.534326]  </IRQ>
[40915.534413]  <TASK>
[40915.534503] flush_smp_call_function_queue (??:?)
[40915.534597] do_idle (build_policy.c:?)
[40915.534687] cpu_startup_entry (??:?)
[40915.534778] start_secondary (smpboot.c:?)
[40915.534871] secondary_startup_64_no_verify (??:?)
[40915.534968]  </TASK>
[40915.535057] ---[ end trace 0000000000000000 ]---



To me, the problem may be in this part:

[40915.533863] process_backlog (dev.c:?)
[40915.533958] __napi_poll (dev.c:?)
[40915.534050] net_rx_action (dev.c:?)

This started after upgrading to kernel 6.3.x.
With 6.2.x I don't have this problem.

m.
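
When reading the decoded trace in this message, it can help to separate frames marked with `?` from the rest: in kernel stack dumps a leading `?` means the address was found on the stack but the unwinder could not confirm it as part of the call chain. The sketch below splits a few lines copied from the trace into the two groups. The function name `split_frames` and the regular expression are illustrative assumptions, not part of any kernel tool.

```python
import re

# Matches e.g. "[40915.533482] dst_release (??:?)" and
# "[40915.533294] ? rcuref_put_slowpath (??:?)".
FRAME = re.compile(r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([A-Za-z_][\w.]*)\s+\(")


def split_frames(lines):
    """Return (reliable, unreliable) function-name lists in trace order."""
    reliable, unreliable = [], []
    for line in lines:
        match = FRAME.match(line)
        if not match:
            continue  # skip markers such as <IRQ> or "Call Trace:"
        target = unreliable if match.group(1) else reliable
        target.append(match.group(2))
    return reliable, unreliable


# Lines copied from the decoded trace in this message.
SAMPLE = [
    "[40915.532736]  <IRQ>",
    "[40915.533198] ? asm_exc_invalid_op (??:?)",
    "[40915.533294] ? rcuref_put_slowpath (??:?)",
    "[40915.533482] dst_release (??:?)",
    "[40915.533576] __dev_queue_xmit (??:?)",
    "[40915.533671] ? eth_header (??:?)",
    "[40915.533766] ip_finish_output2 (ip_output.c:?)",
    "[40915.533863] process_backlog (dev.c:?)",
]

if __name__ == "__main__":
    reliable, unreliable = split_frames(SAMPLE)
    print("reliable:  ", reliable)
    print("unreliable:", unreliable)
```

In this trace the first frame not marked `?` is `dst_release`. The three functions singled out above (`process_backlog`, `__napi_poll`, `net_rx_action`) sit further down, on the generic softirq receive path.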



* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-15  4:05 Urgent Bug Report Kernel crash 6.5.2 Martin Zaharinov
  2023-09-15  6:45 ` Eric Dumazet
@ 2023-09-15 23:00 ` Martin Zaharinov
  2023-09-15 23:11   ` Martin Zaharinov
  1 sibling, 1 reply; 35+ messages in thread
From: Martin Zaharinov @ 2023-09-15 23:00 UTC (permalink / raw)
  To: netdev
  Cc: Eric Dumazet, Paolo Abeni, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern

OK, fixed.
One note: this is kernel 6.5.3.

See the log now:


[40915.530445] ------------[ cut here ]------------
[40915.530529] rcuref - imbalanced put()
[40915.530540] WARNING: CPU: 7 PID: 0 at lib/rcuref.c:267 rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
[40915.530698] Modules linked in: nf_conntrack_netlink nft_limit pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
[40915.530899] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G           O       6.5.3 #1
[40915.531018] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.3 10/28/2020
[40915.531137] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
[40915.531230] Code: 31 c0 eb e2 80 3d c6 ae e6 00 00 74 0a c7 03 00 00 00 e0 31 c0 eb cf 48 c7 c7 68 f6 e2 8e c6 05 ac ae e6 00 01 e8 11 71 c7 ff <0f> 0b eb df cc cc cc cc cc cc cc cc cc cc cc cc cc 48 89 fa 83 e2
All code
========
   0:	31 c0                	xor    %eax,%eax
   2:	eb e2                	jmp    0xffffffffffffffe6
   4:	80 3d c6 ae e6 00 00 	cmpb   $0x0,0xe6aec6(%rip)        # 0xe6aed1
   b:	74 0a                	je     0x17
   d:	c7 03 00 00 00 e0    	movl   $0xe0000000,(%rbx)
  13:	31 c0                	xor    %eax,%eax
  15:	eb cf                	jmp    0xffffffffffffffe6
  17:	48 c7 c7 68 f6 e2 8e 	mov    $0xffffffff8ee2f668,%rdi
  1e:	c6 05 ac ae e6 00 01 	movb   $0x1,0xe6aeac(%rip)        # 0xe6aed1
  25:	e8 11 71 c7 ff       	call   0xffffffffffc7713b
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	eb df                	jmp    0xd
  2e:	cc                   	int3
  2f:	cc                   	int3
  30:	cc                   	int3
  31:	cc                   	int3
  32:	cc                   	int3
  33:	cc                   	int3
  34:	cc                   	int3
  35:	cc                   	int3
  36:	cc                   	int3
  37:	cc                   	int3
  38:	cc                   	int3
  39:	cc                   	int3
  3a:	cc                   	int3
  3b:	48 89 fa             	mov    %rdi,%rdx
  3e:	83                   	.byte 0x83
  3f:	e2                   	.byte 0xe2

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	eb df                	jmp    0xffffffffffffffe3
   4:	cc                   	int3
   5:	cc                   	int3
   6:	cc                   	int3
   7:	cc                   	int3
   8:	cc                   	int3
   9:	cc                   	int3
   a:	cc                   	int3
   b:	cc                   	int3
   c:	cc                   	int3
   d:	cc                   	int3
   e:	cc                   	int3
   f:	cc                   	int3
  10:	cc                   	int3
  11:	48 89 fa             	mov    %rdi,%rdx
  14:	83                   	.byte 0x83
  15:	e2                   	.byte 0xe2
[40915.531389] RSP: 0018:ffffa62680318de8 EFLAGS: 00010296
[40915.531487] RAX: 0000000000000019 RBX: ffff982f02950c40 RCX: 00000000fffbffff
[40915.531605] RDX: 00000000fffbffff RSI: 0000000000000001 RDI: 00000000ffffffea
[40915.531721] RBP: ffff982e467d2000 R08: 0000000000000000 R09: 00000000fffbffff
[40915.531839] R10: ffff98359d600000 R11: 0000000000000003 R12: ffff982f044e16c0
[40915.531956] R13: 0000000000000000 R14: 0000000000000258 R15: ffffa62680318f60
[40915.532075] FS:  0000000000000000(0000) GS:ffff98359fbc0000(0000) knlGS:0000000000000000
[40915.532195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[40915.532291] CR2: 00005593eb3ff078 CR3: 0000000179f6e001 CR4: 00000000003706e0
[40915.532409] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[40915.532526] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[40915.532645] Call Trace:
[40915.532736]  <IRQ>
[40915.532824] ? __warn (kernel/panic.c:668)
[40915.532918] ? report_bug (lib/bug.c:223)
[40915.533011] ? handle_bug (arch/x86/kernel/traps.c:324)
[40915.533104] ? exc_invalid_op (arch/x86/kernel/traps.c:345 (discriminator 1))
[40915.533198] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
[40915.533294] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
[40915.533389] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
[40915.533482] dst_release (./include/linux/rcuref.h:151 net/core/dst.c:166)
[40915.533576] __dev_queue_xmit (net/core/dev.c:4138)
[40915.533671] ? eth_header (net/ethernet/eth.c:83)
[40915.533766] ip_finish_output2 (./include/net/neighbour.h:544 net/ipv4/ip_output.c:230)
[40915.533863] process_backlog (net/core/dev.c:5451 net/core/dev.c:5566 net/core/dev.c:5895)
[40915.533958] __napi_poll+0x20/0x180
[40915.534050] net_rx_action (net/core/dev.c:5839 net/core/dev.c:5860 net/core/dev.c:6684)
[40915.534140] __do_softirq (./arch/x86/include/asm/bitops.h:319 kernel/softirq.c:550)
[40915.534233] do_softirq (kernel/softirq.c:463 (discriminator 32))
[40915.534326]  </IRQ>
[40915.534413]  <TASK>
[40915.534503] flush_smp_call_function_queue (kernel/smp.c:563 (discriminator 1))
[40915.534597] do_idle (kernel/sched/idle.c:295)
[40915.534687] cpu_startup_entry (kernel/sched/idle.c:379 (discriminator 1))
[40915.534778] start_secondary (arch/x86/kernel/smpboot.c:326)
[40915.534871] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:441)
[40915.534968]  </TASK>
[40915.535057] ---[ end trace 0000000000000000 ]---
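
The offset of the trapping instruction in the decoded listing above can be cross-checked directly against the raw `Code:` line, where the byte in angle brackets marks the faulting RIP. What follows is a minimal sketch, assuming Python 3; the helper name `trap_offset` is hypothetical and is not part of any kernel tool (the decoded listing itself comes from the kernel's own scripts).

```python
import re
from typing import Tuple

# Code: line copied verbatim from the 6.5.3 warning above.
CODE_LINE = (
    "Code: 31 c0 eb e2 80 3d c6 ae e6 00 00 74 0a c7 03 00 00 00 e0 31 c0 "
    "eb cf 48 c7 c7 68 f6 e2 8e c6 05 ac ae e6 00 01 e8 11 71 c7 ff <0f> 0b "
    "eb df cc cc cc cc cc cc cc cc cc cc cc cc cc 48 89 fa 83 e2"
)

def trap_offset(code_line: str) -> Tuple[int, bytes]:
    """Return (index of the <xx> byte, all code bytes) for an oops Code: line."""
    tokens = code_line.split("Code:", 1)[1].split()
    offset = None
    raw = bytearray()
    for index, token in enumerate(tokens):
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", token)
        if marked:
            # The bracketed byte is the one RIP pointed at when the trap fired.
            offset = index
            token = marked.group(1)
        raw.append(int(token, 16))
    if offset is None:
        raise ValueError("no <xx> marker found in Code: line")
    return offset, bytes(raw)

OFFSET, RAW = trap_offset(CODE_LINE)
# Prints the offset and the two opcode bytes found there.
print(hex(OFFSET), RAW[OFFSET:OFFSET + 2].hex())
```

For this line the marker sits at offset 0x2a and the two bytes there are `0f 0b`, which agrees with the `ud2` row flagged as the trapping instruction in the listing.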





* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-15 23:00 ` Martin Zaharinov
@ 2023-09-15 23:11   ` Martin Zaharinov
  2023-09-16  8:27     ` Paolo Abeni
  0 siblings, 1 reply; 35+ messages in thread
From: Martin Zaharinov @ 2023-09-15 23:11 UTC (permalink / raw)
  To: netdev
  Cc: Eric Dumazet, Paolo Abeni, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern

One more log:

Sep 12 07:37:29  [151563.298466][    C5] ------------[ cut here ]------------
Sep 12 07:37:29  [151563.298550][    C5] rcuref - imbalanced put()
Sep 12 07:37:29  [151563.298564][ C5] WARNING: CPU: 5 PID: 0 at lib/rcuref.c:267 rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
Sep 12 07:37:29  [151563.298724][    C5] Modules linked in: nft_limit nf_conntrack_netlink vlan_mon(O) pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: BNGBOOT(O)]
Sep 12 07:37:29  [151563.298894][    C5] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G           O       6.5.2 #1
Sep 12 07:37:29  [151563.298975][    C5] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.3 10/28/2020
Sep 12 07:37:29  [151563.299091][ C5] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
Sep 12 07:37:29  [151563.299185][ C5] Code: 31 c0 eb e2 80 3d c7 b8 e6 00 00 74 0a c7 03 00 00 00 e0 31 c0 eb cf 48 c7 c7 9b f5 e2 9f c6 05 ad b8 e6 00 01 e8 01 7b c7 ff <0f> 0b eb df cc cc cc cc cc cc cc cc cc cc cc cc cc 48 89 fa 83 e2
All code
========
   0:	31 c0                	xor    %eax,%eax
   2:	eb e2                	jmp    0xffffffffffffffe6
   4:	80 3d c7 b8 e6 00 00 	cmpb   $0x0,0xe6b8c7(%rip)        # 0xe6b8d2
   b:	74 0a                	je     0x17
   d:	c7 03 00 00 00 e0    	movl   $0xe0000000,(%rbx)
  13:	31 c0                	xor    %eax,%eax
  15:	eb cf                	jmp    0xffffffffffffffe6
  17:	48 c7 c7 9b f5 e2 9f 	mov    $0xffffffff9fe2f59b,%rdi
  1e:	c6 05 ad b8 e6 00 01 	movb   $0x1,0xe6b8ad(%rip)        # 0xe6b8d2
  25:	e8 01 7b c7 ff       	call   0xffffffffffc77b2b
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	eb df                	jmp    0xd
  2e:	cc                   	int3
  2f:	cc                   	int3
  30:	cc                   	int3
  31:	cc                   	int3
  32:	cc                   	int3
  33:	cc                   	int3
  34:	cc                   	int3
  35:	cc                   	int3
  36:	cc                   	int3
  37:	cc                   	int3
  38:	cc                   	int3
  39:	cc                   	int3
  3a:	cc                   	int3
  3b:	48 89 fa             	mov    %rdi,%rdx
  3e:	83                   	.byte 0x83
  3f:	e2                   	.byte 0xe2

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	eb df                	jmp    0xffffffffffffffe3
   4:	cc                   	int3
   5:	cc                   	int3
   6:	cc                   	int3
   7:	cc                   	int3
   8:	cc                   	int3
   9:	cc                   	int3
   a:	cc                   	int3
   b:	cc                   	int3
   c:	cc                   	int3
   d:	cc                   	int3
   e:	cc                   	int3
   f:	cc                   	int3
  10:	cc                   	int3
  11:	48 89 fa             	mov    %rdi,%rdx
  14:	83                   	.byte 0x83
  15:	e2                   	.byte 0xe2
Sep 12 07:37:29  [151563.299344][    C5] RSP: 0018:ffffad0e0033cde8 EFLAGS: 00010296
Sep 12 07:37:29  [151563.299440][    C5] RAX: 0000000000000019 RBX: ffffa10ba37ce100 RCX: 00000000fff7ffff
Sep 12 07:37:29  [151563.299558][    C5] RDX: 00000000fff7ffff RSI: 0000000000000001 RDI: 00000000ffffffea
Sep 12 07:37:29  [151563.299677][    C5] RBP: ffffa10b05c76000 R08: 0000000000000000 R09: 00000000fff7ffff
Sep 12 07:37:29  [151563.299796][    C5] R10: ffffa1125ae00000 R11: 0000000000000003 R12: ffffa10b5f1a4ec0
Sep 12 07:37:29  [151563.299914][    C5] R13: 0000000000000000 R14: 0000000000000258 R15: ffffad0e0033cf60
Sep 12 07:37:29  [151563.300030][    C5] FS:  0000000000000000(0000) GS:ffffa1125f740000(0000) knlGS:0000000000000000
Sep 12 07:37:29  [151563.300152][    C5] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 12 07:37:29  [151563.300248][    C5] CR2: 00007fade7f56d40 CR3: 000000010088e005 CR4: 00000000003706e0
Sep 12 07:37:29  [151563.300363][    C5] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 12 07:37:29  [151563.300478][    C5] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 12 07:37:29  [151563.300593][    C5] Call Trace:
Sep 12 07:37:29  [151563.300683][    C5]  <IRQ>
Sep 12 07:37:29  [151563.300769][ C5] ? __warn (kernel/panic.c:668)
Sep 12 07:37:29  [151563.300861][ C5] ? report_bug (lib/bug.c:223)
Sep 12 07:37:29  [151563.300952][ C5] ? handle_bug (arch/x86/kernel/traps.c:324)
Sep 12 07:37:29  [151563.301043][ C5] ? exc_invalid_op (arch/x86/kernel/traps.c:345 (discriminator 1))
Sep 12 07:37:29  [151563.301134][ C5] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
Sep 12 07:37:29  [151563.301225][ C5] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
Sep 12 07:37:29  [151563.301319][ C5] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
Sep 12 07:37:29  [151563.301412][ C5] dst_release (./include/linux/rcuref.h:151 net/core/dst.c:166)
Sep 12 07:37:29  [151563.301502][ C5] __dev_queue_xmit (net/core/dev.c:4138)
Sep 12 07:37:29  [151563.301595][ C5] ? eth_header (net/ethernet/eth.c:83)
Sep 12 07:37:29  [151563.301686][ C5] ip_finish_output2 (./include/net/neighbour.h:327 ./include/net/sock.h:2251 net/ipv4/ip_output.c:228)
Sep 12 07:37:29  [151563.301778][ C5] process_backlog (net/core/dev.c:5451 net/core/dev.c:5566 net/core/dev.c:5895)
Sep 12 07:37:29  [151563.301871][ C5] __napi_poll+0x20/0x180
Sep 12 07:37:29  [151563.301964][ C5] net_rx_action (net/core/dev.c:5839 net/core/dev.c:5860 net/core/dev.c:6684)
Sep 12 07:37:29  [151563.302057][ C5] __do_softirq (./arch/x86/include/asm/bitops.h:319 kernel/softirq.c:550)
Sep 12 07:37:29  [151563.302150][ C5] do_softirq (kernel/softirq.c:463 (discriminator 32))
Sep 12 07:37:29  [151563.302240][    C5]  </IRQ>
Sep 12 07:37:29  [151563.302326][    C5]  <TASK>
Sep 12 07:37:29  [151563.302416][ C5] flush_smp_call_function_queue (kernel/smp.c:563 (discriminator 1))
Sep 12 07:37:29  [151563.302518][ C5] do_idle (kernel/sched/idle.c:295)
Sep 12 07:37:29  [151563.302612][ C5] cpu_startup_entry (kernel/sched/idle.c:379 (discriminator 1))
Sep 12 07:37:29  [151563.302707][ C5] start_secondary (arch/x86/kernel/smpboot.c:326)
Sep 12 07:37:29  [151563.302805][ C5] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:441)
Sep 12 07:37:29  [151563.302900][    C5]  </TASK>
Sep 12 07:37:29  [151563.302986][    C5] ---[ end trace 0000000000000000 ]---
Sep 15 04:32:29  [399661.971344][   C31] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Sep 15 04:32:29  [399661.971470][   C31] BUG: unable to handle page fault for address: ffffa10c52d43058
Sep 15 04:32:29  [399661.971586][   C31] #PF: supervisor instruction fetch in kernel mode
Sep 15 04:32:29  [399661.971680][   C31] #PF: error_code(0x0011) - permissions violation
Sep 15 04:32:29  [399661.971775][   C31] PGD 12601067 P4D 12601067 PUD 80000002400001e3
Sep 15 04:32:29  [399661.971871][   C31] Oops: 0011 [#1] PREEMPT SMP
Sep 15 04:32:29  [399661.971963][   C31] CPU: 31 PID: 0 Comm: swapper/31 Tainted: G        W  O       6.5.2 #1
Sep 15 04:32:29  [399661.972079][   C31] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.3 10/28/2020
Sep 15 04:32:29  [399661.972197][   C31] RIP: 0010:0xffffa10c52d43058
Sep 15 04:32:29  [399661.972289][ C31] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 58 30 d4 52 0c a1 ff ff 00 00 00 00 00 00
All code
========
	...
  30:	00 00                	add    %al,(%rax)
  32:	58                   	pop    %rax
  33:	30 d4                	xor    %dl,%ah
  35:*	52                   	push   %rdx		<-- trapping instruction
  36:	0c a1                	or     $0xa1,%al
  38:	ff                   	(bad)
  39:	ff 00                	incl   (%rax)
  3b:	00 00                	add    %al,(%rax)
  3d:	00 00                	add    %al,(%rax)
	...

Code starting with the faulting instruction
===========================================
	...
   8:	58                   	pop    %rax
   9:	30 d4                	xor    %dl,%ah
   b:	52                   	push   %rdx
   c:	0c a1                	or     $0xa1,%al
   e:	ff                   	(bad)
   f:	ff 00                	incl   (%rax)
  11:	00 00                	add    %al,(%rax)
  13:	00 00                	add    %al,(%rax)
	...
Sep 15 04:32:29  [399661.972448][   C31] RSP: 0018:ffffad0e0097ccc8 EFLAGS: 00010282
Sep 15 04:32:29  [399661.972543][   C31] RAX: ffffa10c52d43058 RBX: ffffa10c52d43000 RCX: 0000000000000000
Sep 15 04:32:29  [399661.972659][   C31] RDX: 0000000000002712 RSI: 0000000000000246 RDI: ffffa10c52d43000
Sep 15 04:32:29  [399661.972774][   C31] RBP: ffffa10c52d43000 R08: 0000000127a83c46 R09: 0000000000004d8c
Sep 15 04:32:29  [399661.972889][   C31] R10: ffffe840ca0f7c00 R11: 0000000000000000 R12: ffffa10c8e764d80
Sep 15 04:32:29  [399661.973005][   C31] R13: ffffa10c92b4c760 R14: 0000000000000058 R15: ffffa10c92b4c600
Sep 15 04:32:29  [399661.973123][   C31] FS:  0000000000000000(0000) GS:ffffa1125fdc0000(0000) knlGS:0000000000000000
Sep 15 04:32:29  [399661.973244][   C31] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 15 04:32:29  [399661.973338][   C31] CR2: ffffa10c52d43058 CR3: 00000001059b8001 CR4: 00000000003706e0
Sep 15 04:32:29  [399661.973454][   C31] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 15 04:32:29  [399661.973569][   C31] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 15 04:32:29  [399661.973684][   C31] Call Trace:
Sep 15 04:32:29  [399661.973773][   C31]  <IRQ>
Sep 15 04:32:29  [399661.973859][ C31] ? __die (arch/x86/kernel/dumpstack.c:478 (discriminator 1) arch/x86/kernel/dumpstack.c:465 (discriminator 1) arch/x86/kernel/dumpstack.c:420 (discriminator 1) arch/x86/kernel/dumpstack.c:434 (discriminator 1))
Sep 15 04:32:29  [399661.973949][ C31] ? page_fault_oops (arch/x86/mm/fault.c:703)
Sep 15 04:32:29  [399661.974043][ C31] ? exc_page_fault (arch/x86/mm/fault.c:48 (discriminator 2) arch/x86/mm/fault.c:1479 (discriminator 2) arch/x86/mm/fault.c:1542 (discriminator 2))
Sep 15 04:32:29  [399661.974136][ C31] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570)
Sep 15 04:32:29  [399661.974228][ C31] ? kfree_skb_reason (net/core/skbuff.c:1006 net/core/skbuff.c:1022 net/core/skbuff.c:1058)
Sep 15 04:32:29  [399661.974321][ C31] ? tcp_mtu_probe (./include/net/sock.h:1627 (discriminator 1) net/ipv4/tcp_output.c:2338 (discriminator 1) net/ipv4/tcp_output.c:2463 (discriminator 1))
Sep 15 04:32:29  [399661.974416][ C31] ? tcp_write_xmit (net/ipv4/tcp_output.c:2678)
Sep 15 04:32:29  [399661.974509][ C31] ? __tcp_push_pending_frames (net/ipv4/tcp_output.c:2940 (discriminator 1))
Sep 15 04:32:29  [399661.974603][ C31] ? tcp_rcv_established (net/ipv4/tcp_input.c:5626 net/ipv4/tcp_input.c:5620 net/ipv4/tcp_input.c:6066)
Sep 15 04:32:29  [399661.974695][ C31] ? sk_filter_trim_cap (./include/linux/rcupdate.h:781 net/core/filter.c:157)
Sep 15 04:32:29  [399661.974787][ C31] ? tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1728)
Sep 15 04:32:29  [399661.974877][ C31] ? tcp_v4_rcv (./include/net/tcp.h:2342 (discriminator 1) net/ipv4/tcp_ipv4.c:2147 (discriminator 1))
Sep 15 04:32:29  [399661.974968][ C31] ? ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205)
Sep 15 04:32:29  [399661.975062][ C31] ? ip_local_deliver_finish (net/ipv4/ip_input.c:233 (discriminator 1))
Sep 15 04:32:29  [399661.976257][ C31] ? process_backlog (net/core/dev.c:5451 net/core/dev.c:5566 net/core/dev.c:5895)
Sep 15 04:32:29  [399661.976352][ C31] ? __napi_poll+0x20/0x180
Sep 15 04:32:29  [399661.976442][ C31] ? net_rx_action (net/core/dev.c:5839 net/core/dev.c:5860 net/core/dev.c:6684)
Sep 15 04:32:29  [399661.976534][ C31] ? __do_softirq (./arch/x86/include/asm/bitops.h:319 kernel/softirq.c:550)
Sep 15 04:32:29  [399661.976626][ C31] ? do_softirq (kernel/softirq.c:463 (discriminator 32))
Sep 15 04:32:29  [399661.976718][   C31]  </IRQ>
Sep 15 04:32:29  [399661.976805][   C31]  <TASK>
Sep 15 04:32:29  [399661.976890][ C31] ? flush_smp_call_function_queue (kernel/smp.c:563 (discriminator 1))
Sep 15 04:32:29  [399661.976988][ C31] ? do_idle (kernel/sched/idle.c:295)
Sep 15 04:32:29  [399661.977078][ C31] ? cpu_startup_entry (kernel/sched/idle.c:379 (discriminator 1))
Sep 15 04:32:29  [399661.977168][ C31] ? start_secondary (arch/x86/kernel/smpboot.c:326)
Sep 15 04:32:29  [399661.977262][ C31] ? secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:441)
Sep 15 04:32:29  [399661.977359][   C31]  </TASK>
Sep 15 04:32:29  [399661.977448][   C31] Modules linked in: nft_limit nf_conntrack_netlink vlan_mon(O) pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: BNGBOOT(O)]
Sep 15 04:32:29  [399661.977720][   C31] CR2: ffffa10c52d43058
Sep 15 04:32:29  [399661.977809][   C31] ---[ end trace 0000000000000000 ]---
Sep 15 04:32:29  [399661.977901][   C31] RIP: 0010:0xffffa10c52d43058
Sep 15 04:32:29  [399661.977992][ C31] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 58 30 d4 52 0c a1 ff ff 00 00 00 00 00 00
All code
========
	...
  30:	00 00                	add    %al,(%rax)
  32:	58                   	pop    %rax
  33:	30 d4                	xor    %dl,%ah
  35:*	52                   	push   %rdx		<-- trapping instruction
  36:	0c a1                	or     $0xa1,%al
  38:	ff                   	(bad)
  39:	ff 00                	incl   (%rax)
  3b:	00 00                	add    %al,(%rax)
  3d:	00 00                	add    %al,(%rax)
	...

Code starting with the faulting instruction
===========================================
	...
   8:	58                   	pop    %rax
   9:	30 d4                	xor    %dl,%ah
   b:	52                   	push   %rdx
   c:	0c a1                	or     $0xa1,%al
   e:	ff                   	(bad)
   f:	ff 00                	incl   (%rax)
  11:	00 00                	add    %al,(%rax)
  13:	00 00                	add    %al,(%rax)
	...
Sep 15 04:32:29  [399661.978150][   C31] RSP: 0018:ffffad0e0097ccc8 EFLAGS: 00010282
Sep 15 04:32:29  [399661.978243][   C31] RAX: ffffa10c52d43058 RBX: ffffa10c52d43000 RCX: 0000000000000000
Sep 15 04:32:29  [399661.978358][   C31] RDX: 0000000000002712 RSI: 0000000000000246 RDI: ffffa10c52d43000
Sep 15 04:32:29  [399661.978472][   C31] RBP: ffffa10c52d43000 R08: 0000000127a83c46 R09: 0000000000004d8c
Sep 15 04:32:29  [399661.978587][   C31] R10: ffffe840ca0f7c00 R11: 0000000000000000 R12: ffffa10c8e764d80
Sep 15 04:32:29  [399661.978702][   C31] R13: ffffa10c92b4c760 R14: 0000000000000058 R15: ffffa10c92b4c600
Sep 15 04:32:29  [399661.978818][   C31] FS:  0000000000000000(0000) GS:ffffa1125fdc0000(0000) knlGS:0000000000000000
Sep 15 04:32:29  [399661.978940][   C31] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 15 04:32:29  [399661.979036][   C31] CR2: ffffa10c52d43058 CR3: 00000001059b8001 CR4: 00000000003706e0
Sep 15 04:32:29  [399661.979150][   C31] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 15 04:32:29  [399661.979265][   C31] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 15 04:32:29  [399661.979381][   C31] Kernel panic - not syncing: Fatal exception in interrupt
Sep 15 04:32:29  [399662.084038][   C31] Kernel Offset: 0x1e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Sep 15 04:32:29  [399662.084162][   C31] Rebooting in 10 seconds..
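
Two details of the page-fault oops above can be checked by hand. The sketch below assumes Python 3 and uses hypothetical names (`PF_BITS`, `decode_pf_error`); the bit meanings are the architectural x86 page-fault error-code bits, and every value is copied from the log. It decodes `error_code(0x0011)`, then reads the eight bytes that follow the eight zero bytes at RIP in the `Code:` line as a little-endian 64-bit value.

```python
# Architectural x86 page-fault error-code bits:
# (bit number, meaning when set, meaning when clear or None).
PF_BITS = [
    (0, "protection violation", "page not present"),
    (1, "write access", "not a write"),
    (2, "user mode", "supervisor mode"),
    (3, "reserved bit set in page table", None),
    (4, "instruction fetch", None),
]

def decode_pf_error(code: int) -> list:
    """Translate a #PF error code into a list of human-readable flags."""
    flags = []
    for bit, if_set, if_clear in PF_BITS:
        if code & (1 << bit):
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags

# Values copied from the oops above.
ERROR_CODE = 0x0011
RIP = 0xFFFFA10C52D43058
# Bytes at RIP+8 .. RIP+15 according to the Code: line.
TAIL = bytes.fromhex("58 30 d4 52 0c a1 ff ff")

FLAGS = decode_pf_error(ERROR_CODE)
AS_POINTER = int.from_bytes(TAIL, "little")

print(FLAGS)
print(hex(AS_POINTER), AS_POINTER == RIP)
```

The decoded flags agree with the two `#PF:` lines (a supervisor-mode instruction fetch that hit a permissions violation). The eight bytes read back as the RIP value itself, which suggests the CPU jumped into a data object rather than into code, so the instructions decoded from that region are probably not meaningful.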

> On 16 Sep 2023, at 2:00, Martin Zaharinov <micron10@gmail.com> wrote:
> 
> Ok fix 
> one note this is kernel 6.5.3 …
> 
> 
> see log now : 
> 
> 
> [40915.530445] ------------[ cut here ]------------
> [40915.530529] rcuref - imbalanced put()
> [40915.530540] WARNING: CPU: 7 PID: 0 at lib/rcuref.c:267 rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
> [40915.530698] Modules linked in: nf_conntrack_netlink nft_limit pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
> [40915.530899] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G           O       6.5.3 #1
> [40915.531018] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.3 10/28/2020
> [40915.531137] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
> [40915.531230] Code: 31 c0 eb e2 80 3d c6 ae e6 00 00 74 0a c7 03 00 00 00 e0 31 c0 eb cf 48 c7 c7 68 f6 e2 8e c6 05 ac ae e6 00 01 e8 11 71 c7 ff <0f> 0b eb df cc cc cc cc cc cc cc cc cc cc cc cc cc 48 89 fa 83 e2
> All code
> ========
>   0: 31 c0                 xor    %eax,%eax
>   2: eb e2                 jmp    0xffffffffffffffe6
>   4: 80 3d c6 ae e6 00 00 cmpb   $0x0,0xe6aec6(%rip)        # 0xe6aed1
>   b: 74 0a                 je     0x17
>   d: c7 03 00 00 00 e0     movl   $0xe0000000,(%rbx)
>  13: 31 c0                 xor    %eax,%eax
>  15: eb cf                 jmp    0xffffffffffffffe6
>  17: 48 c7 c7 68 f6 e2 8e mov    $0xffffffff8ee2f668,%rdi
>  1e: c6 05 ac ae e6 00 01 movb   $0x1,0xe6aeac(%rip)        # 0xe6aed1
>  25: e8 11 71 c7 ff        call   0xffffffffffc7713b
>  2a:* 0f 0b                 ud2     <-- trapping instruction
>  2c: eb df                 jmp    0xd
>  2e: cc                    int3
>  2f: cc                    int3
>  30: cc                    int3
>  31: cc                    int3
>  32: cc                    int3
>  33: cc                    int3
>  34: cc                    int3
>  35: cc                    int3
>  36: cc                    int3
>  37: cc                    int3
>  38: cc                    int3
>  39: cc                    int3
>  3a: cc                    int3
>  3b: 48 89 fa              mov    %rdi,%rdx
>  3e: 83                    .byte 0x83
>  3f: e2                    .byte 0xe2
> 
> Code starting with the faulting instruction
> ===========================================
>   0: 0f 0b                 ud2
>   2: eb df                 jmp    0xffffffffffffffe3
>   4: cc                    int3
>   5: cc                    int3
>   6: cc                    int3
>   7: cc                    int3
>   8: cc                    int3
>   9: cc                    int3
>   a: cc                    int3
>   b: cc                    int3
>   c: cc                    int3
>   d: cc                    int3
>   e: cc                    int3
>   f: cc                    int3
>  10: cc                    int3
>  11: 48 89 fa              mov    %rdi,%rdx
>  14: 83                    .byte 0x83
>  15: e2                    .byte 0xe2
> [40915.531389] RSP: 0018:ffffa62680318de8 EFLAGS: 00010296
> [40915.531487] RAX: 0000000000000019 RBX: ffff982f02950c40 RCX: 00000000fffbffff
> [40915.531605] RDX: 00000000fffbffff RSI: 0000000000000001 RDI: 00000000ffffffea
> [40915.531721] RBP: ffff982e467d2000 R08: 0000000000000000 R09: 00000000fffbffff
> [40915.531839] R10: ffff98359d600000 R11: 0000000000000003 R12: ffff982f044e16c0
> [40915.531956] R13: 0000000000000000 R14: 0000000000000258 R15: ffffa62680318f60
> [40915.532075] FS:  0000000000000000(0000) GS:ffff98359fbc0000(0000) knlGS:0000000000000000
> [40915.532195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [40915.532291] CR2: 00005593eb3ff078 CR3: 0000000179f6e001 CR4: 00000000003706e0
> [40915.532409] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [40915.532526] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [40915.532645] Call Trace:
> [40915.532736]  <IRQ>
> [40915.532824] ? __warn (kernel/panic.c:668)
> [40915.532918] ? report_bug (lib/bug.c:223)
> [40915.533011] ? handle_bug (arch/x86/kernel/traps.c:324)
> [40915.533104] ? exc_invalid_op (arch/x86/kernel/traps.c:345 (discriminator 1))
> [40915.533198] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
> [40915.533294] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
> [40915.533389] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
> [40915.533482] dst_release (./include/linux/rcuref.h:151 net/core/dst.c:166)
> [40915.533576] __dev_queue_xmit (net/core/dev.c:4138)
> [40915.533671] ? eth_header (net/ethernet/eth.c:83)
> [40915.533766] ip_finish_output2 (./include/net/neighbour.h:544 net/ipv4/ip_output.c:230)
> [40915.533863] process_backlog (net/core/dev.c:5451 net/core/dev.c:5566 net/core/dev.c:5895)
> [40915.533958] __napi_poll+0x20/0x180
> [40915.534050] net_rx_action (net/core/dev.c:5839 net/core/dev.c:5860 net/core/dev.c:6684)
> [40915.534140] __do_softirq (./arch/x86/include/asm/bitops.h:319 kernel/softirq.c:550)
> [40915.534233] do_softirq (kernel/softirq.c:463 (discriminator 32))
> [40915.534326]  </IRQ>
> [40915.534413]  <TASK>
> [40915.534503] flush_smp_call_function_queue (kernel/smp.c:563 (discriminator 1))
> [40915.534597] do_idle (kernel/sched/idle.c:295)
> [40915.534687] cpu_startup_entry (kernel/sched/idle.c:379 (discriminator 1))
> [40915.534778] start_secondary (arch/x86/kernel/smpboot.c:326)
> [40915.534871] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:441)
> [40915.534968]  </TASK>
> [40915.535057] ---[ end trace 0000000000000000 ]---
> 
>> On 15 Sep 2023, at 7:05, Martin Zaharinov <micron10@gmail.com> wrote:
>> 
>> Hi All
>> This is a report from kernel 6.5.2: after 4 days of uptime the system hung and rebooted after this error:
>> 
>> 
>> 
>> Sep 15 04:32:29 205.254.184.12 [399661.971344][   C31] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
>> Sep 15 04:32:29 205.254.184.12 [399661.971470][   C31] BUG: unable to handle page fault for address: ffffa10c52d43058
>> Sep 15 04:32:29 205.254.184.12 [399661.971586][   C31] #PF: supervisor instruction fetch in kernel mode
>> Sep 15 04:32:29 205.254.184.12 [399661.971680][   C31] #PF: error_code(0x0011) - permissions violation
>> Sep 15 04:32:29 205.254.184.12 [399661.971775][   C31] PGD 12601067 P4D 12601067 PUD 80000002400001e3
>> Sep 15 04:32:29 205.254.184.12 [399661.971871][   C31] Oops: 0011 [#1] PREEMPT SMP
>> Sep 15 04:32:29 205.254.184.12 [399661.971963][   C31] CPU: 31 PID: 0 Comm: swapper/31 Tainted: G        W  O       6.5.2 #1
>> Sep 15 04:32:29 205.254.184.12 [399661.972079][   C31] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.3 10/28/2020
>> Sep 15 04:32:29 205.254.184.12 [399661.972197][   C31] RIP: 0010:0xffffa10c52d43058
>> Sep 15 04:32:29 205.254.184.12 [399661.972289][   C31] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 58 30 d4 52 0c a1 ff ff 00 00 00 00 00 00
>> Sep 15 04:32:29 205.254.184.12 [399661.972448][   C31] RSP: 0018:ffffad0e0097ccc8 EFLAGS: 00010282
>> Sep 15 04:32:29 205.254.184.12 [399661.972543][   C31] RAX: ffffa10c52d43058 RBX: ffffa10c52d43000 RCX: 0000000000000000
>> Sep 15 04:32:29 205.254.184.12 [399661.972659][   C31] RDX: 0000000000002712 RSI: 0000000000000246 RDI: ffffa10c52d43000
>> Sep 15 04:32:29 205.254.184.12 [399661.972774][   C31] RBP: ffffa10c52d43000 R08: 0000000127a83c46 R09: 0000000000004d8c
>> Sep 15 04:32:29 205.254.184.12 [399661.972889][   C31] R10: ffffe840ca0f7c00 R11: 0000000000000000 R12: ffffa10c8e764d80
>> Sep 15 04:32:29 205.254.184.12 [399661.973005][   C31] R13: ffffa10c92b4c760 R14: 0000000000000058 R15: ffffa10c92b4c600
>> Sep 15 04:32:29 205.254.184.12 [399661.973123][   C31] FS:  0000000000000000(0000) GS:ffffa1125fdc0000(0000) knlGS:0000000000000000
>> Sep 15 04:32:29 205.254.184.12 [399661.973244][   C31] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Sep 15 04:32:29 205.254.184.12 [399661.973338][   C31] CR2: ffffa10c52d43058 CR3: 00000001059b8001 CR4: 00000000003706e0
>> Sep 15 04:32:29 205.254.184.12 [399661.973454][   C31] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep 15 04:32:29 205.254.184.12 [399661.973569][   C31] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep 15 04:32:29 205.254.184.12 [399661.973684][   C31] Call Trace:
>> Sep 15 04:32:29 205.254.184.12 [399661.973773][   C31]  <IRQ>
>> Sep 15 04:32:29 205.254.184.12 [399661.973859][   C31]  ? __die+0xe4/0xf0
>> Sep 15 04:32:29 205.254.184.12 [399661.973949][   C31]  ? page_fault_oops+0x144/0x3e0
>> Sep 15 04:32:29 205.254.184.12 [399661.974043][   C31]  ? exc_page_fault+0x92/0xa0
>> Sep 15 04:32:29 205.254.184.12 [399661.974136][   C31]  ? asm_exc_page_fault+0x22/0x30
>> Sep 15 04:32:29 205.254.184.12 [399661.974228][   C31]  ? kfree_skb_reason+0x33/0xf0
>> Sep 15 04:32:29 205.254.184.12 [399661.974321][   C31]  ? tcp_mtu_probe+0x3a6/0x7b0
>> Sep 15 04:32:29 205.254.184.12 [399661.974416][   C31]  ? tcp_write_xmit+0x7fa/0x1410
>> Sep 15 04:32:29 205.254.184.12 [399661.974509][   C31]  ? __tcp_push_pending_frames+0x2d/0xb0
>> Sep 15 04:32:29 205.254.184.12 [399661.974603][   C31]  ? tcp_rcv_established+0x381/0x610
>> Sep 15 04:32:29 205.254.184.12 [399661.974695][   C31]  ? sk_filter_trim_cap+0xc6/0x1c0
>> Sep 15 04:32:29 205.254.184.12 [399661.974787][   C31]  ? tcp_v4_do_rcv+0x11f/0x1f0
>> Sep 15 04:32:29 205.254.184.12 [399661.974877][   C31]  ? tcp_v4_rcv+0xfa1/0x1010
>> Sep 15 04:32:29 205.254.184.12 [399661.974968][   C31]  ? ip_protocol_deliver_rcu+0x1b/0x270
>> Sep 15 04:32:29 205.254.184.12 [399661.975062][   C31]  ? ip_local_deliver_finish+0x6d/0x90
>> Sep 15 04:32:29 205.254.184.12 [399661.976257][   C31]  ? process_backlog+0x10c/0x230
>> Sep 15 04:32:29 205.254.184.12 [399661.976352][   C31]  ? __napi_poll+0x20/0x180
>> Sep 15 04:32:29 205.254.184.12 [399661.976442][   C31]  ? net_rx_action+0x2a4/0x390
>> Sep 15 04:32:29 205.254.184.12 [399661.976534][   C31]  ? __do_softirq+0xd0/0x202
>> Sep 15 04:32:29 205.254.184.12 [399661.976626][   C31]  ? do_softirq+0x3a/0x50
>> Sep 15 04:32:29 205.254.184.12 [399661.976718][   C31]  </IRQ>
>> Sep 15 04:32:29 205.254.184.12 [399661.976805][   C31]  <TASK>
>> Sep 15 04:32:29 205.254.184.12 [399661.976890][   C31]  ? flush_smp_call_function_queue+0x3f/0x50
>> Sep 15 04:32:29 205.254.184.12 [399661.976988][   C31]  ? do_idle+0x14d/0x210
>> Sep 15 04:32:29 205.254.184.12 [399661.977078][   C31]  ? cpu_startup_entry+0x14/0x20
>> Sep 15 04:32:29 205.254.184.12 [399661.977168][   C31]  ? start_secondary+0xe1/0xf0
>> Sep 15 04:32:29 205.254.184.12 [399661.977262][   C31]  ? secondary_startup_64_no_verify+0x167/0x16b
>> Sep 15 04:32:29 205.254.184.12 [399661.977359][   C31]  </TASK>
>> Sep 15 04:32:29 205.254.184.12 [399661.977448][   C31] Modules linked in: nft_limit nf_conntrack_netlink  pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos 
>> Sep 15 04:32:29 205.254.184.12 [399661.977720][   C31] CR2: ffffa10c52d43058
>> Sep 15 04:32:29 205.254.184.12 [399661.977809][   C31] ---[ end trace 0000000000000000 ]---
>> Sep 15 04:32:29 205.254.184.12 [399661.977901][   C31] RIP: 0010:0xffffa10c52d43058
>> Sep 15 04:32:29 205.254.184.12 [399661.977992][   C31] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 58 30 d4 52 0c a1 ff ff 00 00 00 00 00 00
>> Sep 15 04:32:29 205.254.184.12 [399661.978150][   C31] RSP: 0018:ffffad0e0097ccc8 EFLAGS: 00010282
>> Sep 15 04:32:29 205.254.184.12 [399661.978243][   C31] RAX: ffffa10c52d43058 RBX: ffffa10c52d43000 RCX: 0000000000000000
>> Sep 15 04:32:29 205.254.184.12 [399661.978358][   C31] RDX: 0000000000002712 RSI: 0000000000000246 RDI: ffffa10c52d43000
>> Sep 15 04:32:29 205.254.184.12 [399661.978472][   C31] RBP: ffffa10c52d43000 R08: 0000000127a83c46 R09: 0000000000004d8c
>> Sep 15 04:32:29 205.254.184.12 [399661.978587][   C31] R10: ffffe840ca0f7c00 R11: 0000000000000000 R12: ffffa10c8e764d80
>> Sep 15 04:32:29 205.254.184.12 [399661.978702][   C31] R13: ffffa10c92b4c760 R14: 0000000000000058 R15: ffffa10c92b4c600
>> Sep 15 04:32:29 205.254.184.12 [399661.978818][   C31] FS:  0000000000000000(0000) GS:ffffa1125fdc0000(0000) knlGS:0000000000000000
>> Sep 15 04:32:29 205.254.184.12 [399661.978940][   C31] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Sep 15 04:32:29 205.254.184.12 [399661.979036][   C31] CR2: ffffa10c52d43058 CR3: 00000001059b8001 CR4: 00000000003706e0
>> Sep 15 04:32:29 205.254.184.12 [399661.979150][   C31] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep 15 04:32:29 205.254.184.12 [399661.979265][   C31] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep 15 04:32:29 205.254.184.12 [399661.979381][   C31] Kernel panic - not syncing: Fatal exception in interrupt
>> Sep 15 04:32:29 205.254.184.12 [399662.084038][   C31] Kernel Offset: 0x1e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> Sep 15 04:32:29 205.254.184.12 [399662.084162][   C31] Rebooting in 10 seconds..
>> 
>> 
>> If you find a fix, please update me.
>> 
>> m.
> 
> 



* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-15 23:11   ` Martin Zaharinov
@ 2023-09-16  8:27     ` Paolo Abeni
       [not found]       ` <CALidq=UR=3rOHZczCnb1bEhbt9So60UZ5y60Cdh4aP41FkB5Tw@mail.gmail.com>
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Abeni @ 2023-09-16  8:27 UTC (permalink / raw)
  To: Martin Zaharinov, netdev
  Cc: Eric Dumazet, patchwork-bot+netdevbpf, Jakub Kicinski,
	Stephen Hemminger, kuba+netdrv, dsahern

On Sat, 2023-09-16 at 02:11 +0300, Martin Zaharinov wrote:
> one more log:
> 
> Sep 12 07:37:29  [151563.298466][    C5] ------------[ cut here ]------------
> Sep 12 07:37:29  [151563.298550][    C5] rcuref - imbalanced put()
> Sep 12 07:37:29  [151563.298564][ C5] WARNING: CPU: 5 PID: 0 at lib/rcuref.c:267 rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
> Sep 12 07:37:29  [151563.298724][    C5] Modules linked in: nft_limit nf_conntrack_netlink vlan_mon(O) pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: BNGBOOT(O)]
> Sep 12 07:37:29  [151563.298894][    C5] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G           O       6.5.2 #1


You have the out-of-tree modules taint in all the reports you shared. Please
try to reproduce the issue without such taint, thanks!

Paolo
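
Note on the `Tainted:` field referred to above: the letters after `Tainted:` in the oops header are single-character flags. A minimal Python sketch, covering only the three flags that appear in this thread (`G`, `W`, `O`) and using a hypothetical helper name `decode_taint`, could read them like this:

```python
# Meanings follow the kernel's tainted-kernels documentation. Only the flags
# seen in this thread are listed, so the table is deliberately partial.
TAINT_FLAGS = {
    "G": "no proprietary module loaded (would be 'P' otherwise)",
    "W": "a kernel warning was issued earlier",
    "O": "an out-of-tree (externally built) module was loaded",
}


def decode_taint(line: str) -> list[str]:
    """Return the taint letters found after 'Tainted:' in an oops header line."""
    _, _, rest = line.partition("Tainted:")
    flags = []
    for token in rest.split():
        # The flag field ends where the kernel version (starting with a digit) begins.
        if token[0].isdigit():
            break
        flags.extend(token)
    return flags


header = "CPU: 5 PID: 0 Comm: swapper/5 Tainted: G           O       6.5.2 #1"
for flag in decode_taint(header):
    print(flag, "=", TAINT_FLAGS.get(flag, "not covered by this sketch"))
```

For the header line quoted above this lists `G` and `O`; the `O` flag is the out-of-tree taint in question.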



* Re: Urgent Bug Report Kernel crash 6.5.2
       [not found]       ` <CALidq=UR=3rOHZczCnb1bEhbt9So60UZ5y60Cdh4aP41FkB5Tw@mail.gmail.com>
@ 2023-09-17 11:35         ` Martin Zaharinov
  2023-09-17 11:40         ` Martin Zaharinov
  1 sibling, 0 replies; 35+ messages in thread
From: Martin Zaharinov @ 2023-09-17 11:35 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Eric Dumazet, patchwork-bot+netdevbpf, Jakub Kicinski,
	Stephen Hemminger, kuba+netdrv, dsahern

Hi Paolo and Eric

Here is the latest crash, from kernel 6.5.3 without external modules.


The first is the raw crash report; the second is the same report decoded:


Sep 17 11:43:11  [127675.391688][    C2] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Sep 17 11:43:11  [127675.391780][    C2] BUG: unable to handle page fault for address: ffff9bd9ff20f858
Sep 17 11:43:11  [127675.391859][    C2] #PF: supervisor instruction fetch in kernel mode
Sep 17 11:43:11  [127675.391937][    C2] #PF: error_code(0x0011) - permissions violation
Sep 17 11:43:11  [127675.392014][    C2] PGD 1a601067 P4D 1a601067 PUD 147b05063 PMD 800000023f2001e3
Sep 17 11:43:11  [127675.392099][    C2] Oops: 0011 [#1] PREEMPT SMP
Sep 17 11:43:11  [127675.392173][    C2] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G           O       6.5.3 #1
Sep 17 11:43:11  [127675.392257][    C2] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.3 10/28/2020
Sep 17 11:43:11  [127675.392338][    C2] RIP: 0010:0xffff9bd9ff20f858
Sep 17 11:43:11  [127675.392413][    C2] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 58 f8 20 ff d9 9b ff ff 00 00 00 00 00 00
Sep 17 11:43:11  [127675.392540][    C2] RSP: 0018:ffffadfe0007ccc8 EFLAGS: 00010282
Sep 17 11:43:11  [127675.392635][    C2] RAX: ffff9bd9ff20f858 RBX: ffff9bd9ff20f800 RCX: 0000000000000000
Sep 17 11:43:11  [127675.392753][    C2] RDX: 0000000000002711 RSI: 0000000000000246 RDI: ffff9bd9ff20f800
Sep 17 11:43:11  [127675.392871][    C2] RBP: ffff9bd9ff20f800 R08: 000000010ca6060f R09: 00000000000079f2
Sep 17 11:43:11  [127675.392988][    C2] R10: ffffd88b47077c00 R11: 0000000000000000 R12: ffff9bd9bb6ca1c0
Sep 17 11:43:11  [127675.393107][    C2] R13: ffff9bd9b9013760 R14: 0000000000000053 R15: ffff9bd9b9013600
Sep 17 11:43:11  [127675.393226][    C2] FS:  0000000000000000(0000) GS:ffff9be01f680000(0000) knlGS:0000000000000000
Sep 17 11:43:11  [127675.393347][    C2] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 17 11:43:11  [127675.393445][    C2] CR2: ffff9bd9ff20f858 CR3: 0000000234668001 CR4: 00000000003706e0
Sep 17 11:43:11  [127675.393562][    C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 17 11:43:11  [127675.393677][    C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 17 11:43:11  [127675.393794][    C2] Call Trace:
Sep 17 11:43:11  [127675.393886][    C2]  <IRQ>
Sep 17 11:43:11  [127675.393973][    C2]  ? __die+0xe4/0xf0
Sep 17 11:43:11  [127675.394065][    C2]  ? page_fault_oops+0x144/0x3e0
Sep 17 11:43:11  [127675.394157][    C2]  ? exc_page_fault+0x92/0xa0
Sep 17 11:43:11  [127675.394251][    C2]  ? asm_exc_page_fault+0x22/0x30
Sep 17 11:43:11  [127675.394347][    C2]  ? kfree_skb_reason+0x33/0xf0
Sep 17 11:43:11  [127675.394443][    C2]  ? tcp_mtu_probe+0x3a6/0x7b0
Sep 17 11:43:11  [127675.394539][    C2]  ? tcp_write_xmit+0x7fa/0x1410
Sep 17 11:43:11  [127675.394634][    C2]  ? __tcp_push_pending_frames+0x2d/0xb0
Sep 17 11:43:11  [127675.394727][    C2]  ? tcp_rcv_established+0x205/0x610
Sep 17 11:43:11  [127675.394822][    C2]  ? sk_filter_trim_cap+0xc6/0x1c0
Sep 17 11:43:11  [127675.394914][    C2]  ? tcp_v4_do_rcv+0x11f/0x1f0
Sep 17 11:43:11  [127675.395007][    C2]  ? tcp_v4_rcv+0xfa1/0x1010
Sep 17 11:43:11  [127675.395100][    C2]  ? ip_protocol_deliver_rcu+0x1b/0x270
Sep 17 11:43:11  [127675.395196][    C2]  ? ip_local_deliver_finish+0x6d/0x90
Sep 17 11:43:11  [127675.395289][    C2]  ? process_backlog+0x10c/0x230
Sep 17 11:43:11  [127675.395386][    C2]  ? __napi_poll+0x20/0x180
Sep 17 11:43:11  [127675.395478][    C2]  ? net_rx_action+0x2a4/0x390
Sep 17 11:43:11  [127675.395572][    C2]  ? __do_softirq+0xd0/0x202
Sep 17 11:43:11  [127675.395666][    C2]  ? do_softirq+0x3a/0x50
Sep 17 11:43:11  [127675.395760][    C2]  </IRQ>
Sep 17 11:43:11  [127675.395849][    C2]  <TASK>
Sep 17 11:43:11  [127675.395939][    C2]  ? flush_smp_call_function_queue+0x3f/0x50
Sep 17 11:43:11  [127675.396039][    C2]  ? do_idle+0x14d/0x210
Sep 17 11:43:11  [127675.396132][    C2]  ? cpu_startup_entry+0x14/0x20
Sep 17 11:43:11  [127675.396224][    C2]  ? start_secondary+0xe1/0xf0
Sep 17 11:43:11  [127675.396318][    C2]  ? secondary_startup_64_no_verify+0x167/0x16b
Sep 17 11:43:11  [127675.396417][    C2]  </TASK>
Sep 17 11:43:11  [127675.396504][    C2] Modules linked in: nf_conntrack_netlink nft_limit pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
Sep 17 11:43:11  [127675.396775][    C2] CR2: ffff9bd9ff20f858
Sep 17 11:43:11  [127675.396868][    C2] ---[ end trace 0000000000000000 ]---
Sep 17 11:43:11  [127675.396961][    C2] RIP: 0010:0xffff9bd9ff20f858
Sep 17 11:43:11  [127675.397052][    C2] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 58 f8 20 ff d9 9b ff ff 00 00 00 00 00 00
Sep 17 11:43:11  [127675.397211][    C2] RSP: 0018:ffffadfe0007ccc8 EFLAGS: 00010282
Sep 17 11:43:11  [127675.397305][    C2] RAX: ffff9bd9ff20f858 RBX: ffff9bd9ff20f800 RCX: 0000000000000000
Sep 17 11:43:11  [127675.397419][    C2] RDX: 0000000000002711 RSI: 0000000000000246 RDI: ffff9bd9ff20f800
Sep 17 11:43:11  [127675.397535][    C2] RBP: ffff9bd9ff20f800 R08: 000000010ca6060f R09: 00000000000079f2
Sep 17 11:43:11  [127675.397651][    C2] R10: ffffd88b47077c00 R11: 0000000000000000 R12: ffff9bd9bb6ca1c0
Sep 17 11:43:11  [127675.397767][    C2] R13: ffff9bd9b9013760 R14: 0000000000000053 R15: ffff9bd9b9013600
Sep 17 11:43:11  [127675.397886][    C2] FS:  0000000000000000(0000) GS:ffff9be01f680000(0000) knlGS:0000000000000000
Sep 17 11:43:11  [127675.398006][    C2] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 17 11:43:11  [127675.398101][    C2] CR2: ffff9bd9ff20f858 CR3: 0000000234668001 CR4: 00000000003706e0
Sep 17 11:43:11  [127675.398217][    C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 17 11:43:11  [127675.398334][    C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 17 11:43:11  [127675.398451][    C2] Kernel panic - not syncing: Fatal exception in interrupt
Sep 17 11:43:11  [127675.503611][    C2] Kernel Offset: 0x20000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Sep 17 11:43:11  [127675.503734][    C2] Rebooting in 10 seconds..
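
Note on the `#PF: error_code(0x0011)` line in the report above: the value is a bit field defined by the x86 architecture. A minimal Python sketch, assuming only the low five bits matter here and using a hypothetical helper name `describe_pf_error`, decodes it as follows:

```python
# Bit positions follow the x86 page-fault error code layout.
# Only the low five bits are decoded in this sketch.
PF_PROT = 1 << 0   # 1: protection violation, 0: page not present
PF_WRITE = 1 << 1  # 1: write access
PF_USER = 1 << 2   # 1: access came from user mode
PF_RSVD = 1 << 3   # 1: reserved bit set in a page-table entry
PF_INSTR = 1 << 4  # 1: instruction fetch


def describe_pf_error(code: int) -> dict[str, str]:
    """Translate a page-fault error code into human-readable fields."""
    if code & PF_INSTR:
        access = "instruction fetch"
    elif code & PF_WRITE:
        access = "write"
    else:
        access = "read"
    return {
        "cause": "permissions violation" if code & PF_PROT else "not-present page",
        "access": access,
        "mode": "user" if code & PF_USER else "supervisor",
        "reserved_bit": "set" if code & PF_RSVD else "clear",
    }


print(describe_pf_error(0x0011))
```

For `0x0011` this gives a supervisor instruction fetch that failed on permissions, which matches the two `#PF:` lines and the NX message at the top of the report.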





Second, with decode:



Sep 17 11:43:11  [127675.391688][    C2] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Sep 17 11:43:11  [127675.391780][    C2] BUG: unable to handle page fault for address: ffff9bd9ff20f858
Sep 17 11:43:11  [127675.391859][    C2] #PF: supervisor instruction fetch in kernel mode
Sep 17 11:43:11  [127675.391937][    C2] #PF: error_code(0x0011) - permissions violation
Sep 17 11:43:11  [127675.392014][    C2] PGD 1a601067 P4D 1a601067 PUD 147b05063 PMD 800000023f2001e3
Sep 17 11:43:11  [127675.392099][    C2] Oops: 0011 [#1] PREEMPT SMP
Sep 17 11:43:11  [127675.392173][    C2] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G           O       6.5.3 #1
Sep 17 11:43:11  [127675.392257][    C2] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.3 10/28/2020
Sep 17 11:43:11  [127675.392338][    C2] RIP: 0010:0xffff9bd9ff20f858
Sep 17 11:43:11 [127675.392413][ C2] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 58 f8 20 ff d9 9b ff ff 00 00 00 00 00 00
All code
========
	...
  30:	00 00                	add    %al,(%rax)
  32:	58                   	pop    %rax
  33:*	f8                   	clc    		<-- trapping instruction
  34:	20 ff                	and    %bh,%bh
  36:	d9 9b ff ff 00 00    	fstps  0xffff(%rbx)
  3c:	00 00                	add    %al,(%rax)
	...

Code starting with the faulting instruction
===========================================
	...
   8:	58                   	pop    %rax
   9:	f8                   	clc
   a:	20 ff                	and    %bh,%bh
   c:	d9 9b ff ff 00 00    	fstps  0xffff(%rbx)
  12:	00 00                	add    %al,(%rax)
	...
Sep 17 11:43:11  [127675.392540][    C2] RSP: 0018:ffffadfe0007ccc8 EFLAGS: 00010282
Sep 17 11:43:11  [127675.392635][    C2] RAX: ffff9bd9ff20f858 RBX: ffff9bd9ff20f800 RCX: 0000000000000000
Sep 17 11:43:11  [127675.392753][    C2] RDX: 0000000000002711 RSI: 0000000000000246 RDI: ffff9bd9ff20f800
Sep 17 11:43:11  [127675.392871][    C2] RBP: ffff9bd9ff20f800 R08: 000000010ca6060f R09: 00000000000079f2
Sep 17 11:43:11  [127675.392988][    C2] R10: ffffd88b47077c00 R11: 0000000000000000 R12: ffff9bd9bb6ca1c0
Sep 17 11:43:11  [127675.393107][    C2] R13: ffff9bd9b9013760 R14: 0000000000000053 R15: ffff9bd9b9013600
Sep 17 11:43:11  [127675.393226][    C2] FS:  0000000000000000(0000) GS:ffff9be01f680000(0000) knlGS:0000000000000000
Sep 17 11:43:11  [127675.393347][    C2] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 17 11:43:11  [127675.393445][    C2] CR2: ffff9bd9ff20f858 CR3: 0000000234668001 CR4: 00000000003706e0
Sep 17 11:43:11  [127675.393562][    C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 17 11:43:11  [127675.393677][    C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 17 11:43:11  [127675.393794][    C2] Call Trace:
Sep 17 11:43:11  [127675.393886][    C2]  <IRQ>
Sep 17 11:43:11 [127675.393973][ C2] ? __die (arch/x86/kernel/dumpstack.c:478 (discriminator 1) arch/x86/kernel/dumpstack.c:465 (discriminator 1) arch/x86/kernel/dumpstack.c:420 (discriminator 1) arch/x86/kernel/dumpstack.c:434 (discriminator 1))
Sep 17 11:43:11 [127675.394065][ C2] ? page_fault_oops (arch/x86/mm/fault.c:703)
Sep 17 11:43:11 [127675.394157][ C2] ? exc_page_fault (arch/x86/mm/fault.c:48 (discriminator 2) arch/x86/mm/fault.c:1479 (discriminator 2) arch/x86/mm/fault.c:1542 (discriminator 2))
Sep 17 11:43:11 [127675.394251][ C2] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570)
Sep 17 11:43:11 [127675.394347][ C2] ? kfree_skb_reason (net/core/skbuff.c:1006 net/core/skbuff.c:1022 net/core/skbuff.c:1058)
Sep 17 11:43:11 [127675.394443][ C2] ? tcp_mtu_probe (./include/net/sock.h:1627 (discriminator 1) net/ipv4/tcp_output.c:2338 (discriminator 1) net/ipv4/tcp_output.c:2463 (discriminator 1))
Sep 17 11:43:11 [127675.394539][ C2] ? tcp_write_xmit (net/ipv4/tcp_output.c:2678)
Sep 17 11:43:11 [127675.394634][ C2] ? __tcp_push_pending_frames (net/ipv4/tcp_output.c:2940 (discriminator 1))
Sep 17 11:43:11 [127675.394727][ C2] ? tcp_rcv_established (./include/net/tcp.h:2033 net/ipv4/tcp_input.c:5545 net/ipv4/tcp_input.c:6065)
Sep 17 11:43:11 [127675.394822][ C2] ? sk_filter_trim_cap (./include/linux/rcupdate.h:781 net/core/filter.c:157)
Sep 17 11:43:11 [127675.394914][ C2] ? tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1728)
Sep 17 11:43:11 [127675.395007][ C2] ? tcp_v4_rcv (./include/net/tcp.h:2342 (discriminator 1) net/ipv4/tcp_ipv4.c:2147 (discriminator 1))
Sep 17 11:43:11 [127675.395100][ C2] ? ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205)
Sep 17 11:43:11 [127675.395196][ C2] ? ip_local_deliver_finish (net/ipv4/ip_input.c:233 (discriminator 1))
Sep 17 11:43:11 [127675.395289][ C2] ? process_backlog (net/core/dev.c:5451 net/core/dev.c:5566 net/core/dev.c:5895)
Sep 17 11:43:11 [127675.395386][ C2] ? __napi_poll+0x20/0x180
Sep 17 11:43:11 [127675.395478][ C2] ? net_rx_action (net/core/dev.c:5839 net/core/dev.c:5860 net/core/dev.c:6684)
Sep 17 11:43:11 [127675.395572][ C2] ? __do_softirq (./arch/x86/include/asm/bitops.h:319 kernel/softirq.c:550)
Sep 17 11:43:11 [127675.395666][ C2] ? do_softirq (kernel/softirq.c:463 (discriminator 32))
Sep 17 11:43:11  [127675.395760][    C2]  </IRQ>
Sep 17 11:43:11  [127675.395849][    C2]  <TASK>
Sep 17 11:43:11 [127675.395939][ C2] ? flush_smp_call_function_queue (kernel/smp.c:563 (discriminator 1))
Sep 17 11:43:11 [127675.396039][ C2] ? do_idle (kernel/sched/idle.c:295)
Sep 17 11:43:11 [127675.396132][ C2] ? cpu_startup_entry (kernel/sched/idle.c:379 (discriminator 1))
Sep 17 11:43:11 [127675.396224][ C2] ? start_secondary (arch/x86/kernel/smpboot.c:326)
Sep 17 11:43:11 [127675.396318][ C2] ? secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:441)
Sep 17 11:43:11  [127675.396417][    C2]  </TASK>
Sep 17 11:43:11  [127675.396504][    C2] Modules linked in: nf_conntrack_netlink nft_limit pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
Sep 17 11:43:11  [127675.396775][    C2] CR2: ffff9bd9ff20f858
Sep 17 11:43:11  [127675.396868][    C2] ---[ end trace 0000000000000000 ]---
Sep 17 11:43:11  [127675.396961][    C2] RIP: 0010:0xffff9bd9ff20f858
Sep 17 11:43:11 [127675.397052][ C2] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 58 f8 20 ff d9 9b ff ff 00 00 00 00 00 00
All code
========
	...
  30:	00 00                	add    %al,(%rax)
  32:	58                   	pop    %rax
  33:*	f8                   	clc    		<-- trapping instruction
  34:	20 ff                	and    %bh,%bh
  36:	d9 9b ff ff 00 00    	fstps  0xffff(%rbx)
  3c:	00 00                	add    %al,(%rax)
	...

Code starting with the faulting instruction
===========================================
	...
   8:	58                   	pop    %rax
   9:	f8                   	clc
   a:	20 ff                	and    %bh,%bh
   c:	d9 9b ff ff 00 00    	fstps  0xffff(%rbx)
  12:	00 00                	add    %al,(%rax)
	...
Sep 17 11:43:11  [127675.397211][    C2] RSP: 0018:ffffadfe0007ccc8 EFLAGS: 00010282
Sep 17 11:43:11  [127675.397305][    C2] RAX: ffff9bd9ff20f858 RBX: ffff9bd9ff20f800 RCX: 0000000000000000
Sep 17 11:43:11  [127675.397419][    C2] RDX: 0000000000002711 RSI: 0000000000000246 RDI: ffff9bd9ff20f800
Sep 17 11:43:11  [127675.397535][    C2] RBP: ffff9bd9ff20f800 R08: 000000010ca6060f R09: 00000000000079f2
Sep 17 11:43:11  [127675.397651][    C2] R10: ffffd88b47077c00 R11: 0000000000000000 R12: ffff9bd9bb6ca1c0
Sep 17 11:43:11  [127675.397767][    C2] R13: ffff9bd9b9013760 R14: 0000000000000053 R15: ffff9bd9b9013600
Sep 17 11:43:11  [127675.397886][    C2] FS:  0000000000000000(0000) GS:ffff9be01f680000(0000) knlGS:0000000000000000
Sep 17 11:43:11  [127675.398006][    C2] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 17 11:43:11  [127675.398101][    C2] CR2: ffff9bd9ff20f858 CR3: 0000000234668001 CR4: 00000000003706e0
Sep 17 11:43:11  [127675.398217][    C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 17 11:43:11  [127675.398334][    C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 17 11:43:11  [127675.398451][    C2] Kernel panic - not syncing: Fatal exception in interrupt
Sep 17 11:43:11  [127675.503611][    C2] Kernel Offset: 0x20000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Sep 17 11:43:11  [127675.503734][    C2] Rebooting in 10 seconds..
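
Note on the decoded `Code:` sections above: the faulting address lies in an NX (non-executable) page, so these bytes are most likely data, and the `pop`/`clc`/`fstps` lines are probably just the disassembler's reading of that data. A minimal Python sketch (the helper names `parse_code_line` and `qwords_after_fault` are hypothetical, not part of any kernel tool) reads the same bytes as little-endian 64-bit words instead:

```python
def parse_code_line(code: str) -> tuple[bytes, int]:
    """Return (all bytes, index of the byte marked <xx>) from an oops 'Code:' line."""
    tokens = code.split("Code:", 1)[-1].split()
    fault_index = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    data = bytes(int(t.strip("<>"), 16) for t in tokens)
    return data, fault_index


def qwords_after_fault(code: str) -> list[int]:
    """Read the bytes from the faulting address onward as little-endian 64-bit words."""
    data, start = parse_code_line(code)
    tail = data[start:]
    # Step through complete 8-byte groups only; trailing leftovers are ignored.
    return [int.from_bytes(tail[i:i + 8], "little")
            for i in range(0, len(tail) - 7, 8)]


# The Code: line from the 6.5.3 report: 42 zero bytes, the marked byte, then the rest.
code = ("Code: " + "00 " * 42 +
        "<00> 00 00 00 00 00 00 00 58 f8 20 ff d9 9b ff ff 00 00 00 00 00 00")
print([hex(q) for q in qwords_after_fault(code)])
# prints ['0x0', '0xffff9bd9ff20f858']
```

The second word equals the RIP value in the report (ffff9bd9ff20f858), so the memory at RIP+8 holds a pointer back to RIP itself.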



P.S.

I installed this kernel on 5 machines with different hardware, and the same crash happens on every one.


Best regards,
m.

> On 16 Sep 2023, at 12:04, Martin Zaharinov <micron10@gmail.com> wrote:
> 
> Hi Paolo
> 
> In the first report the machine doesn't have any out-of-tree modules.
> 
> This bug appeared after moving from kernel 6.2 to 6.3.
> 
> m.
> 
> On Sat, Sep 16, 2023, 11:27 Paolo Abeni <pabeni@redhat.com> wrote:
> On Sat, 2023-09-16 at 02:11 +0300, Martin Zaharinov wrote:
> > one more log:
> > 
> > Sep 12 07:37:29  [151563.298466][    C5] ------------[ cut here ]------------
> > Sep 12 07:37:29  [151563.298550][    C5] rcuref - imbalanced put()
> > Sep 12 07:37:29  [151563.298564][ C5] WARNING: CPU: 5 PID: 0 at lib/rcuref.c:267 rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
> > Sep 12 07:37:29  [151563.298724][    C5] Modules linked in: nft_limit nf_conntrack_netlink vlan_mon(O) pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: BNGBOOT(O)]
> > Sep 12 07:37:29  [151563.298894][    C5] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G           O       6.5.2 #1
> 
> 
> You have the out-of-tree modules taint in all the reports you shared. Please
> try to reproduce the issue without such taint, thanks!
> 
> Paolo
> 



* Re: Urgent Bug Report Kernel crash 6.5.2
       [not found]       ` <CALidq=UR=3rOHZczCnb1bEhbt9So60UZ5y60Cdh4aP41FkB5Tw@mail.gmail.com>
  2023-09-17 11:35         ` Martin Zaharinov
@ 2023-09-17 11:40         ` Martin Zaharinov
  2023-09-17 11:55           ` Martin Zaharinov
  1 sibling, 1 reply; 35+ messages in thread
From: Martin Zaharinov @ 2023-09-17 11:40 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Eric Dumazet, patchwork-bot+netdevbpf, Jakub Kicinski,
	Stephen Hemminger, kuba+netdrv, dsahern, Florian Westphal,
	Paolo Abeni, Pablo Neira Ayuso

One more thing, from the changelog for kernel 6.5: https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.5

I see there are many bug reports with:

Sep 17 11:43:11  [127675.395289][    C2]  ? process_backlog+0x10c/0x230
Sep 17 11:43:11  [127675.395386][    C2]  ? __napi_poll+0x20/0x180
Sep 17 11:43:11  [127675.395478][    C2]  ? net_rx_action+0x2a4/0x390


All servers have simple nftables rules; the Ethernet card is an Intel XL710 or 82599. It's a very simple config.

m.	




> On 16 Sep 2023, at 12:04, Martin Zaharinov <micron10@gmail.com> wrote:
> 
> Hi Paolo
> 
> In the first report the machine doesn't have any out-of-tree modules.
> 
> This bug appeared after moving from kernel 6.2 to 6.3.
> 
> m.
> 
> On Sat, Sep 16, 2023, 11:27 Paolo Abeni <pabeni@redhat.com> wrote:
> On Sat, 2023-09-16 at 02:11 +0300, Martin Zaharinov wrote:
> > one more log:
> > 
> > Sep 12 07:37:29  [151563.298466][    C5] ------------[ cut here ]------------
> > Sep 12 07:37:29  [151563.298550][    C5] rcuref - imbalanced put()
> > Sep 12 07:37:29  [151563.298564][ C5] WARNING: CPU: 5 PID: 0 at lib/rcuref.c:267 rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
> > Sep 12 07:37:29  [151563.298724][    C5] Modules linked in: nft_limit nf_conntrack_netlink vlan_mon(O) pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: BNGBOOT(O)]
> > Sep 12 07:37:29  [151563.298894][    C5] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G           O       6.5.2 #1
> 
> 
> You have the out-of-tree modules taint in all the reports you shared. Please
> try to reproduce the issue without such taint, thanks!
> 
> Paolo
> 



* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-17 11:40         ` Martin Zaharinov
@ 2023-09-17 11:55           ` Martin Zaharinov
  2023-09-17 12:04             ` Holger Hoffstätte
                               ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Martin Zaharinov @ 2023-09-17 11:55 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Eric Dumazet, patchwork-bot+netdevbpf, Jakub Kicinski,
	Stephen Hemminger, kuba+netdrv, dsahern, Florian Westphal,
	Pablo Neira Ayuso

Hi Eric,
Is it possible that this patch causes the bug? https://patchwork.kernel.org/project/netdevbpf/cover/20230911170531.828100-1-edumazet@google.com/


m.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-17 11:55           ` Martin Zaharinov
@ 2023-09-17 12:04             ` Holger Hoffstätte
  2023-09-18  8:09             ` Eric Dumazet
  2023-09-19 20:09             ` Martin Zaharinov
  2 siblings, 0 replies; 35+ messages in thread
From: Holger Hoffstätte @ 2023-09-17 12:04 UTC (permalink / raw)
  To: netdev

On Sun, 17 Sep 2023 14:55:25 +0300, Martin Zaharinov wrote:

> Hi Eric
> Is it possible that this patch causes the bug? https://patchwork.kernel.org/project/netdevbpf/cover/20230911170531.828100-1-edumazet@google.com/

No, because

1) those patches are not in any released kernel
2) they work fine

Holger


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-17 11:55           ` Martin Zaharinov
  2023-09-17 12:04             ` Holger Hoffstätte
@ 2023-09-18  8:09             ` Eric Dumazet
  2023-09-19 20:09             ` Martin Zaharinov
  2 siblings, 0 replies; 35+ messages in thread
From: Eric Dumazet @ 2023-09-18  8:09 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: Paolo Abeni, netdev, patchwork-bot+netdevbpf, Jakub Kicinski,
	Stephen Hemminger, kuba+netdrv, dsahern, Florian Westphal,
	Pablo Neira Ayuso

On Sun, Sep 17, 2023 at 1:55 PM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi Eric
> Is it possible that this patch causes the bug? https://patchwork.kernel.org/project/netdevbpf/cover/20230911170531.828100-1-edumazet@google.com/
>
>

Everything is possible, but this is not in 6.5 kernels.

I would suggest you start a bisection.
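
To illustrate what a bisection does: it is a binary search over an ordered history between a known-good and a known-bad point, which is what the git bisect command automates (git bisect start, then git bisect good / git bisect bad after each test boot). The Python sketch below only shows the search idea. The names first_bad and is_bad and the commit labels c0..c15 are hypothetical, and it assumes that every commit after the first bad one is also bad.

```python
def first_bad(versions, is_bad):
    """Return the first entry of `versions` for which is_bad() is true.

    Assumptions of this sketch: versions[0] is known good, versions[-1] is
    known bad, and once a version is bad every later one is bad too.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return versions[hi]


if __name__ == "__main__":
    # Hypothetical history of 16 commits where the bug appears at c11.
    commits = [f"c{i}" for i in range(16)]
    print(first_bad(commits, lambda c: int(c[1:]) >= 11))
```

Each step halves the remaining range, so even the many thousands of commits between two releases need only a dozen or so test builds. For a bug that takes days to trigger, each "good" verdict is the expensive part, since it means running long enough to be reasonably sure the warning does not appear.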


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-17 11:55           ` Martin Zaharinov
  2023-09-17 12:04             ` Holger Hoffstätte
  2023-09-18  8:09             ` Eric Dumazet
@ 2023-09-19 20:09             ` Martin Zaharinov
  2023-09-20  3:59               ` Eric Dumazet
  2 siblings, 1 reply; 35+ messages in thread
From: Martin Zaharinov @ 2023-09-19 20:09 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Eric Dumazet, patchwork-bot+netdevbpf, Jakub Kicinski,
	Stephen Hemminger, kuba+netdrv, dsahern, Florian Westphal,
	Pablo Neira Ayuso

Hi Eric

Yes, this patch is not in the 6.5 kernel and is queued for 6.6. I tested it, but not ok for now.

One more thing: I found the same error on the older kernel 6.4.8. I updated to kernel 6.5.4 and the same error appears.

It looks like this bug is hard to catch.

See the logs:


[1462610.861373] ------------[ cut here ]------------
[1462610.861480] rcuref - imbalanced put()
[1462610.861491] WARNING: CPU: 22 PID: 0 at lib/rcuref.c:267 rcuref_put_slowpath+0x5f/0x70
[1462610.861718] Modules linked in: nft_limit nf_conntrack_netlink  pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
[1462610.862004] CPU: 22 PID: 0 Comm: swapper/22 Tainted: G           O       6.4.8 #1
[1462610.863244] Hardware name: Supermicro Super Server/X10SRW-F, BIOS 3.4 06/05/2021
[1462610.863368] RIP: 0010:rcuref_put_slowpath+0x5f/0x70
[1462610.863469] Code: 31 c0 eb e2 80 3d 02 cd e6 00 00 74 0a c7 03 00 00 00 e0 31 c0 eb cf 48 c7 c7 7f 68 e5 a4 c6 05 e8 cc e6 00 01 e8 e1 ab c7 ff <0f> 0b eb df cc cc cc cc cc cc cc cc cc cc cc cc cc 48 89 fa 83 e2
[1462610.863637] RSP: 0018:ffffaee60070cc38 EFLAGS: 00010292
[1462610.863736] RAX: 0000000000000019 RBX: ffffa1cdc35e5780 RCX: 00000000fff7ffff
[1462610.863857] RDX: 00000000fff7ffff RSI: 0000000000000001 RDI: 00000000ffffffea
[1462610.864129] RBP: ffffa1cf6aeb8de8 R08: 0000000000000000 R09: 00000000fff7ffff
[1462610.864250] R10: ffffa1d51b000000 R11: 0000000000000003 R12: ffffa1cdc35e5740
[1462610.864370] R13: ffffa1cdc35e57a8 R14: ffffa1d51fda9008 R15: 00000000ade2eb6e
[1462610.864489] FS:  0000000000000000(0000) GS:ffffa1d51fd80000(0000) knlGS:0000000000000000
[1462610.864615] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1462610.864713] CR2: 00007f057b8ad000 CR3: 0000000141881003 CR4: 00000000001706e0
[1462610.864833] Call Trace:
[1462610.864928]  <IRQ>
[1462610.865021]  ? __warn+0x6c/0x130
[1462610.865124]  ? report_bug+0x1e4/0x260
[1462610.865223]  ? handle_bug+0x36/0x70
[1462610.865318]  ? exc_invalid_op+0x17/0x1a0
[1462610.865414]  ? asm_exc_invalid_op+0x16/0x20
[1462610.865517]  ? rcuref_put_slowpath+0x5f/0x70
[1462610.865618]  ? rcuref_put_slowpath+0x5f/0x70
[1462610.865719]  dst_release+0x2c/0x60
[1462610.865817]  rt_cache_route+0xbd/0xf0
[1462610.865913]  rt_set_nexthop.isra.0+0x1b6/0x440
[1462610.866008]  ip_route_input_slow+0x90e/0xc60
[1462610.866116]  ? nf_conntrack_udp_packet+0x16c/0x230 [nf_conntrack]
[1462610.866229]  ip_route_input_noref+0xed/0x100
[1462610.866328]  ip_rcv_finish_core.isra.0+0xb1/0x410
[1462610.866425]  ip_rcv+0xed/0x130
[1462610.866522]  ? ip_rcv_core.constprop.0+0x350/0x350
[1462610.866621]  process_backlog+0x10c/0x230
[1462610.866719]  __napi_poll+0x20/0x180
[1462610.866818]  net_rx_action+0x2a4/0x390
[1462610.866921]  __do_softirq+0xd0/0x202
[1462610.867020]  do_softirq+0x58/0x80
[1462610.867116]  </IRQ>
[1462610.867206]  <TASK>
[1462610.867298]  flush_smp_call_function_queue+0x3f/0x60
[1462610.867403]  do_idle+0x14d/0x210
[1462610.867500]  cpu_startup_entry+0x14/0x20
[1462610.867602]  start_secondary+0xec/0xf0
[1462610.867701]  secondary_startup_64_no_verify+0xf9/0xfb
[1462610.867799]  </TASK>
[1462610.867891] ---[ end trace 0000000000000000 ]---


And this is 6.5.4:

[39651.441371] ------------[ cut here ]------------
[39651.441455] rcuref - imbalanced put()
[39651.441470] WARNING: CPU: 12 PID: 0 at lib/rcuref.c:267 rcuref_put_slowpath+0x5f/0x70
[39651.441633] Modules linked in: nft_limit pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp igb i2c_algo_bit i40e ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
[39651.441805] CPU: 12 PID: 0 Comm: swapper/12 Tainted: G           O       6.5.3 #1
[39651.441911] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EP2C612D8, BIOS P2.30 04/30/2018
[39651.442035] RIP: 0010:rcuref_put_slowpath+0x5f/0x70
[39651.442131] Code: 31 c0 eb e2 80 3d 86 ae e6 00 00 74 0a c7 03 00 00 00 e0 31 c0 eb cf 48 c7 c7 68 f6 e2 9a c6 05 6c ae e6 00 01 e8 11 71 c7 ff <0f> 0b eb df cc cc cc cc cc cc cc cc cc cc cc cc cc 48 89 fa 83 e2
[39651.442294] RSP: 0018:ffffbb9a404b4de8 EFLAGS: 00010296
[39651.442390] RAX: 0000000000000019 RBX: ffffa13ac9a32640 RCX: 00000000fff7ffff
[39651.442513] RDX: 00000000fff7ffff RSI: 0000000000000001 RDI: 00000000ffffffea
[39651.442630] RBP: ffffa13a44a04000 R08: 0000000000000000 R09: 00000000fff7ffff
[39651.442748] R10: ffffa1419ae00000 R11: 0000000000000003 R12: ffffa13ab640bec0
[39651.442866] R13: 0000000000000000 R14: 0000000000000010 R15: ffffbb9a404b4f60
[39651.442985] FS:  0000000000000000(0000) GS:ffffa1419f900000(0000) knlGS:0000000000000000
[39651.443106] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39651.443201] CR2: 0000564f9e23f6e0 CR3: 000000010bcea002 CR4: 00000000003706e0
[39651.443319] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[39651.443438] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[39651.443558] Call Trace:
[39651.443647]  <IRQ>
[39651.443736]  ? __warn+0x6c/0x130
[39651.443829]  ? report_bug+0x1e4/0x260
[39651.443924]  ? handle_bug+0x36/0x70
[39651.444016]  ? exc_invalid_op+0x17/0x1a0
[39651.444109]  ? asm_exc_invalid_op+0x16/0x20
[39651.444202]  ? rcuref_put_slowpath+0x5f/0x70
[39651.444297]  ? rcuref_put_slowpath+0x5f/0x70
[39651.444391]  dst_release+0x2c/0x60
[39651.444487]  __dev_queue_xmit+0x56c/0xbd0
[39651.444582]  ? nf_hook_slow+0x36/0xa0
[39651.444675]  ip_finish_output2+0x27b/0x520
[39651.444770]  process_backlog+0x10c/0x230
[39651.444866]  __napi_poll+0x20/0x180
[39651.444961]  net_rx_action+0x2a4/0x390
[39651.445055]  __do_softirq+0xd0/0x202
[39651.445148]  do_softirq+0x3a/0x50
[39651.445241]  </IRQ>
[39651.445329]  <TASK>
[39651.445416]  flush_smp_call_function_queue+0x3f/0x50
[39651.445516]  do_idle+0x14d/0x210
[39651.445609]  cpu_startup_entry+0x14/0x20
[39651.445702]  start_secondary+0xe1/0xf0
[39651.445797]  secondary_startup_64_no_verify+0x167/0x16b
[39651.445893]  </TASK>
[39651.445982] ---[ end trace 0000000000000000 ]---


best regards,
Martin



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-19 20:09             ` Martin Zaharinov
@ 2023-09-20  3:59               ` Eric Dumazet
  2023-09-20  6:05                 ` Martin Zaharinov
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Dumazet @ 2023-09-20  3:59 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: Paolo Abeni, netdev, patchwork-bot+netdevbpf, Jakub Kicinski,
	Stephen Hemminger, kuba+netdrv, dsahern, Florian Westphal,
	Pablo Neira Ayuso

On Tue, Sep 19, 2023 at 10:09 PM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi Eric
>
> Yes, this patch is not in the 6.5 kernel and is queued for 6.6. I tested it, but not ok for now.

"not ok for now" ? What does this mean?
Pointing out patches that are not related to your issue is a waste of time.
If this was to bring my attention, this is a bad strategy, because I
will probably not read your future emails.

>
> One more thing: I found the same error on the older kernel 6.4.8. I updated to kernel 6.5.4 and the same error appears.
>
> It looks like this bug is hard to catch.
>
>
> best regards,
> Martin

You keep sending traces without symbols, nobody here will even look at them.

Again, your best route is a bisection.
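
As a small illustration of what a raw frame carries: each line has the form symbol+0xoffset/0xsize, optionally followed by a [module] tag, and a leading ? marks an entry the unwinder is not sure about. Turning that into file:line needs the matching vmlinux with debug info, which is what scripts/decode_stacktrace.sh in the kernel tree is for. The Python sketch below does not resolve anything; it only splits a frame into its parts. The names parse_frame and FRAME_RE are hypothetical, and the pattern assumes the bare [seconds.micros] timestamp prefix, not the syslog-style prefix seen in some of the reports.

```python
import re

# Matches e.g. "[1462610.865719]  dst_release+0x2c/0x60" and
# "[...]  ? nf_conntrack_udp_packet+0x16c/0x230 [nf_conntrack]".
FRAME_RE = re.compile(
    r"^(?:\[\s*\d+\.\d+\])?\s*"              # optional dmesg timestamp
    r"(?P<unreliable>\?\s+)?"                # '?' marks an unreliable frame
    r"(?P<symbol>[A-Za-z_][\w.]*)"           # e.g. rt_set_nexthop.isra.0
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?\s*$"    # optional [module]
)


def parse_frame(line):
    """Split one call-trace line into its fields, or return None if it is not a frame."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),
        "reliable": m.group("unreliable") is None,
    }


if __name__ == "__main__":
    print(parse_frame("[1462610.865719]  dst_release+0x2c/0x60"))
```

Lines such as <IRQ> or the register dump do not match and yield None, so the function can be mapped over a whole report to pull out just the frames.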


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-20  3:59               ` Eric Dumazet
@ 2023-09-20  6:05                 ` Martin Zaharinov
  2023-09-20  6:16                   ` Bagas Sanjaya
  0 siblings, 1 reply; 35+ messages in thread
From: Martin Zaharinov @ 2023-09-20  6:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paolo Abeni, netdev, patchwork-bot+netdevbpf, Jakub Kicinski,
	Stephen Hemminger, kuba+netdrv, dsahern, Florian Westphal,
	Pablo Neira Ayuso

Hi Eric

> On 20 Sep 2023, at 6:59, Eric Dumazet <edumazet@google.com> wrote:
> 
> On Tue, Sep 19, 2023 at 10:09 PM Martin Zaharinov <micron10@gmail.com> wrote:
>> 
>> Hi Eric
>> 
>> Yes, this patch is not in the 6.5 kernel and is queued for 6.6. I tested it, but not ok for now.
> 
> "not ok for now" ? What does this mean?
> Pointing out patches that are not related to your issue is a waste of time.
> If this was to bring my attention, this is a bad strategy, because I
> will probably not read your future emails.
> 

I'm sorry, I didn't express myself correctly.
The patch is very good, but it is for kernel 6.6.
I enjoy your kernel improvements.
And thanks for that!


>> 
>> One more i find same error have in old kernel 6.4.8  , update to kernel 6.5.4 and same error is come .
>> 
>> Like this is hard to catch bug
>> 
>> see logs :
>> 
>> 
>> [1462610.861373] ------------[ cut here ]------------
>> [1462610.861480] rcuref - imbalanced put()
>> [1462610.861491] WARNING: CPU: 22 PID: 0 at lib/rcuref.c:267 rcuref_put_slowpath+0x5f/0x70
>> [1462610.861718] Modules linked in: nft_limit nf_conntrack_netlink  pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
>> [1462610.862004] CPU: 22 PID: 0 Comm: swapper/22 Tainted: G           O       6.4.8 #1
>> [1462610.863244] Hardware name: Supermicro Super Server/X10SRW-F, BIOS 3.4 06/05/2021
>> [1462610.863368] RIP: 0010:rcuref_put_slowpath+0x5f/0x70
>> [1462610.863469] Code: 31 c0 eb e2 80 3d 02 cd e6 00 00 74 0a c7 03 00 00 00 e0 31 c0 eb cf 48 c7 c7 7f 68 e5 a4 c6 05 e8 cc e6 00 01 e8 e1 ab c7 ff <0f> 0b eb df cc cc cc cc cc cc cc cc cc cc cc cc cc 48 89 fa 83 e2
>> [1462610.863637] RSP: 0018:ffffaee60070cc38 EFLAGS: 00010292
>> [1462610.863736] RAX: 0000000000000019 RBX: ffffa1cdc35e5780 RCX: 00000000fff7ffff
>> [1462610.863857] RDX: 00000000fff7ffff RSI: 0000000000000001 RDI: 00000000ffffffea
>> [1462610.864129] RBP: ffffa1cf6aeb8de8 R08: 0000000000000000 R09: 00000000fff7ffff
>> [1462610.864250] R10: ffffa1d51b000000 R11: 0000000000000003 R12: ffffa1cdc35e5740
>> [1462610.864370] R13: ffffa1cdc35e57a8 R14: ffffa1d51fda9008 R15: 00000000ade2eb6e
>> [1462610.864489] FS:  0000000000000000(0000) GS:ffffa1d51fd80000(0000) knlGS:0000000000000000
>> [1462610.864615] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [1462610.864713] CR2: 00007f057b8ad000 CR3: 0000000141881003 CR4: 00000000001706e0
>> [1462610.864833] Call Trace:
>> [1462610.864928]  <IRQ>
>> [1462610.865021]  ? __warn+0x6c/0x130
>> [1462610.865124]  ? report_bug+0x1e4/0x260
>> [1462610.865223]  ? handle_bug+0x36/0x70
>> [1462610.865318]  ? exc_invalid_op+0x17/0x1a0
>> [1462610.865414]  ? asm_exc_invalid_op+0x16/0x20
>> [1462610.865517]  ? rcuref_put_slowpath+0x5f/0x70
>> [1462610.865618]  ? rcuref_put_slowpath+0x5f/0x70
>> [1462610.865719]  dst_release+0x2c/0x60
>> [1462610.865817]  rt_cache_route+0xbd/0xf0
>> [1462610.865913]  rt_set_nexthop.isra.0+0x1b6/0x440
>> [1462610.866008]  ip_route_input_slow+0x90e/0xc60
>> [1462610.866116]  ? nf_conntrack_udp_packet+0x16c/0x230 [nf_conntrack]
>> [1462610.866229]  ip_route_input_noref+0xed/0x100
>> [1462610.866328]  ip_rcv_finish_core.isra.0+0xb1/0x410
>> [1462610.866425]  ip_rcv+0xed/0x130
>> [1462610.866522]  ? ip_rcv_core.constprop.0+0x350/0x350
>> [1462610.866621]  process_backlog+0x10c/0x230
>> [1462610.866719]  __napi_poll+0x20/0x180
>> [1462610.866818]  net_rx_action+0x2a4/0x390
>> [1462610.866921]  __do_softirq+0xd0/0x202
>> [1462610.867020]  do_softirq+0x58/0x80
>> [1462610.867116]  </IRQ>
>> [1462610.867206]  <TASK>
>> [1462610.867298]  flush_smp_call_function_queue+0x3f/0x60
>> [1462610.867403]  do_idle+0x14d/0x210
>> [1462610.867500]  cpu_startup_entry+0x14/0x20
>> [1462610.867602]  start_secondary+0xec/0xf0
>> [1462610.867701]  secondary_startup_64_no_verify+0xf9/0xfb
>> [1462610.867799]  </TASK>
>> [1462610.867891] ---[ end trace 0000000000000000 ]—
>> 
>> 
>> And this si 6.5.4 :
>> 
>> [39651.441371] ------------[ cut here ]------------
>> [39651.441455] rcuref - imbalanced put()
>> [39651.441470] WARNING: CPU: 12 PID: 0 at lib/rcuref.c:267 rcuref_put_slowpath+0x5f/0x70
>> [39651.441633] Modules linked in: nft_limit pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp igb i2c_algo_bit i40e ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
>> [39651.441805] CPU: 12 PID: 0 Comm: swapper/12 Tainted: G           O       6.5.3 #1
>> [39651.441911] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EP2C612D8, BIOS P2.30 04/30/2018
>> [39651.442035] RIP: 0010:rcuref_put_slowpath+0x5f/0x70
>> [39651.442131] Code: 31 c0 eb e2 80 3d 86 ae e6 00 00 74 0a c7 03 00 00 00 e0 31 c0 eb cf 48 c7 c7 68 f6 e2 9a c6 05 6c ae e6 00 01 e8 11 71 c7 ff <0f> 0b eb df cc cc cc cc cc cc cc cc cc cc cc cc cc 48 89 fa 83 e2
>> [39651.442294] RSP: 0018:ffffbb9a404b4de8 EFLAGS: 00010296
>> [39651.442390] RAX: 0000000000000019 RBX: ffffa13ac9a32640 RCX: 00000000fff7ffff
>> [39651.442513] RDX: 00000000fff7ffff RSI: 0000000000000001 RDI: 00000000ffffffea
>> [39651.442630] RBP: ffffa13a44a04000 R08: 0000000000000000 R09: 00000000fff7ffff
>> [39651.442748] R10: ffffa1419ae00000 R11: 0000000000000003 R12: ffffa13ab640bec0
>> [39651.442866] R13: 0000000000000000 R14: 0000000000000010 R15: ffffbb9a404b4f60
>> [39651.442985] FS:  0000000000000000(0000) GS:ffffa1419f900000(0000) knlGS:0000000000000000
>> [39651.443106] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [39651.443201] CR2: 0000564f9e23f6e0 CR3: 000000010bcea002 CR4: 00000000003706e0
>> [39651.443319] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [39651.443438] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [39651.443558] Call Trace:
>> [39651.443647]  <IRQ>
>> [39651.443736]  ? __warn+0x6c/0x130
>> [39651.443829]  ? report_bug+0x1e4/0x260
>> [39651.443924]  ? handle_bug+0x36/0x70
>> [39651.444016]  ? exc_invalid_op+0x17/0x1a0
>> [39651.444109]  ? asm_exc_invalid_op+0x16/0x20
>> [39651.444202]  ? rcuref_put_slowpath+0x5f/0x70
>> [39651.444297]  ? rcuref_put_slowpath+0x5f/0x70
>> [39651.444391]  dst_release+0x2c/0x60
>> [39651.444487]  __dev_queue_xmit+0x56c/0xbd0
>> [39651.444582]  ? nf_hook_slow+0x36/0xa0
>> [39651.444675]  ip_finish_output2+0x27b/0x520
>> [39651.444770]  process_backlog+0x10c/0x230
>> [39651.444866]  __napi_poll+0x20/0x180
>> [39651.444961]  net_rx_action+0x2a4/0x390
>> [39651.445055]  __do_softirq+0xd0/0x202
>> [39651.445148]  do_softirq+0x3a/0x50
>> [39651.445241]  </IRQ>
>> [39651.445329]  <TASK>
>> [39651.445416]  flush_smp_call_function_queue+0x3f/0x50
>> [39651.445516]  do_idle+0x14d/0x210
>> [39651.445609]  cpu_startup_entry+0x14/0x20
>> [39651.445702]  start_secondary+0xe1/0xf0
>> [39651.445797]  secondary_startup_64_no_verify+0x167/0x16b
>> [39651.445893]  </TASK>
>> [39651.445982] ---[ end trace 0000000000000000 ]---
>> 
>> 
>> best regards,
>> Martin
> 
> You keep sending traces without symbols, nobody here will even look at them.
> 


Here is the trace with symbols:

[39651.441371] ------------[ cut here ]------------
[39651.441455] rcuref - imbalanced put()
[39651.441470] WARNING: CPU: 12 PID: 0 at lib/rcuref.c:267 rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
[39651.441633] Modules linked in: nft_limit pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp igb i2c_algo_bit i40e ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
[39651.441805] CPU: 12 PID: 0 Comm: swapper/12 Tainted: G           O       6.5.3 #1
[39651.441911] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EP2C612D8, BIOS P2.30 04/30/2018
[39651.442035] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
[39651.442131] Code: 31 c0 eb e2 80 3d 86 ae e6 00 00 74 0a c7 03 00 00 00 e0 31 c0 eb cf 48 c7 c7 68 f6 e2 9a c6 05 6c ae e6 00 01 e8 11 71 c7 ff <0f> 0b eb df cc cc cc cc cc cc cc cc cc cc cc cc cc 48 89 fa 83 e2
All code
========
   0:	31 c0                	xor    %eax,%eax
   2:	eb e2                	jmp    0xffffffffffffffe6
   4:	80 3d 86 ae e6 00 00 	cmpb   $0x0,0xe6ae86(%rip)        # 0xe6ae91
   b:	74 0a                	je     0x17
   d:	c7 03 00 00 00 e0    	movl   $0xe0000000,(%rbx)
  13:	31 c0                	xor    %eax,%eax
  15:	eb cf                	jmp    0xffffffffffffffe6
  17:	48 c7 c7 68 f6 e2 9a 	mov    $0xffffffff9ae2f668,%rdi
  1e:	c6 05 6c ae e6 00 01 	movb   $0x1,0xe6ae6c(%rip)        # 0xe6ae91
  25:	e8 11 71 c7 ff       	call   0xffffffffffc7713b
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	eb df                	jmp    0xd
  2e:	cc                   	int3
  2f:	cc                   	int3
  30:	cc                   	int3
  31:	cc                   	int3
  32:	cc                   	int3
  33:	cc                   	int3
  34:	cc                   	int3
  35:	cc                   	int3
  36:	cc                   	int3
  37:	cc                   	int3
  38:	cc                   	int3
  39:	cc                   	int3
  3a:	cc                   	int3
  3b:	48 89 fa             	mov    %rdi,%rdx
  3e:	83                   	.byte 0x83
  3f:	e2                   	.byte 0xe2

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	eb df                	jmp    0xffffffffffffffe3
   4:	cc                   	int3
   5:	cc                   	int3
   6:	cc                   	int3
   7:	cc                   	int3
   8:	cc                   	int3
   9:	cc                   	int3
   a:	cc                   	int3
   b:	cc                   	int3
   c:	cc                   	int3
   d:	cc                   	int3
   e:	cc                   	int3
   f:	cc                   	int3
  10:	cc                   	int3
  11:	48 89 fa             	mov    %rdi,%rdx
  14:	83                   	.byte 0x83
  15:	e2                   	.byte 0xe2
[39651.442294] RSP: 0018:ffffbb9a404b4de8 EFLAGS: 00010296
[39651.442390] RAX: 0000000000000019 RBX: ffffa13ac9a32640 RCX: 00000000fff7ffff
[39651.442513] RDX: 00000000fff7ffff RSI: 0000000000000001 RDI: 00000000ffffffea
[39651.442630] RBP: ffffa13a44a04000 R08: 0000000000000000 R09: 00000000fff7ffff
[39651.442748] R10: ffffa1419ae00000 R11: 0000000000000003 R12: ffffa13ab640bec0
[39651.442866] R13: 0000000000000000 R14: 0000000000000010 R15: ffffbb9a404b4f60
[39651.442985] FS:  0000000000000000(0000) GS:ffffa1419f900000(0000) knlGS:0000000000000000
[39651.443106] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39651.443201] CR2: 0000564f9e23f6e0 CR3: 000000010bcea002 CR4: 00000000003706e0
[39651.443319] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[39651.443438] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[39651.443558] Call Trace:
[39651.443647]  <IRQ>
[39651.443736] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
[39651.443829] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[39651.443924] ? handle_bug (arch/x86/kernel/traps.c:324)
[39651.444016] ? exc_invalid_op (arch/x86/kernel/traps.c:345 (discriminator 1))
[39651.444109] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
[39651.444202] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
[39651.444297] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
[39651.444391] dst_release (./arch/x86/include/asm/preempt.h:95 ./include/linux/rcuref.h:151 net/core/dst.c:166)
[39651.444487] __dev_queue_xmit (./include/net/dst.h:283 net/core/dev.c:4158)
[39651.444582] ? nf_hook_slow (./include/linux/netfilter.h:143 net/netfilter/core.c:626)
[39651.444675] ip_finish_output2 (./include/linux/netdevice.h:3088 ./include/net/neighbour.h:528 ./include/net/neighbour.h:542 net/ipv4/ip_output.c:230)
[39651.444770] process_backlog (./include/linux/rcupdate.h:781 net/core/dev.c:5896)
[39651.444866] __napi_poll (net/core/dev.c:6461)
[39651.444961] net_rx_action (net/core/dev.c:6530 net/core/dev.c:6661)
[39651.445055] __do_softirq (./arch/x86/include/asm/preempt.h:27 kernel/softirq.c:564)
[39651.445148] do_softirq (kernel/softirq.c:463 (discriminator 32) kernel/softirq.c:450 (discriminator 32))
[39651.445241]  </IRQ>
[39651.445329]  <TASK>
[39651.445416] flush_smp_call_function_queue (./arch/x86/include/asm/irqflags.h:134 (discriminator 1) kernel/smp.c:570 (discriminator 1))
[39651.445516] do_idle (kernel/sched/idle.c:314)
[39651.445609] cpu_startup_entry (kernel/sched/idle.c:378)
[39651.445702] start_secondary (arch/x86/kernel/smpboot.c:326)
[39651.445797] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:441)
[39651.445893]  </TASK>
[39651.445982] ---[ end trace 0000000000000000 ]---



> Again, your best route is a bisection.

For now it is not possible to do a bisection; it is hard to change the kernel on a running machine.

Is there another way to find out where this bug message comes from?

Best regards,
Martin 





^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-20  6:05                 ` Martin Zaharinov
@ 2023-09-20  6:16                   ` Bagas Sanjaya
  2023-09-20  7:03                     ` Martin Zaharinov
  0 siblings, 1 reply; 35+ messages in thread
From: Bagas Sanjaya @ 2023-09-20  6:16 UTC (permalink / raw)
  To: Martin Zaharinov, Eric Dumazet
  Cc: Paolo Abeni, netdev, patchwork-bot+netdevbpf, Jakub Kicinski,
	Stephen Hemminger, kuba+netdrv, dsahern, Florian Westphal,
	Pablo Neira Ayuso

[-- Attachment #1: Type: text/plain, Size: 501 bytes --]

On Wed, Sep 20, 2023 at 09:05:10AM +0300, Martin Zaharinov wrote:
> > On 20 Sep 2023, at 6:59, Eric Dumazet <edumazet@google.com> wrote:
> > Again, your best route is a bisection.
> 
> For now its not possible to make bisection , its hard to change kernel on running machine …
> 

You have to do bisection, unfortunately. There are many guides on the
Internet. Or you can read Documentation/admin-guide/bug-bisect.rst.

Bye!

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-20  6:16                   ` Bagas Sanjaya
@ 2023-09-20  7:03                     ` Martin Zaharinov
  2023-09-20  7:25                       ` Eric Dumazet
  0 siblings, 1 reply; 35+ messages in thread
From: Martin Zaharinov @ 2023-09-20  7:03 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: Eric Dumazet, Paolo Abeni, netdev, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern,
	Florian Westphal, Pablo Neira Ayuso

Hi

OK, at first glance it looks like this all started after kernel 6.4 added "atomics: Provide rcuref - scalable reference counting" ( https://www.spinics.net/lists/linux-tip-commits/msg62042.html ).

I checked all running machines: kernel 6.4.2 is the oldest version among them that shows the same bug report.

I have a few machines with kernel 6.3.9 and do not see problems there.

The problem may be located in this part:

[39651.444202] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
[39651.444297] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
[39651.444391] dst_release (./arch/x86/include/asm/preempt.h:95 ./include/linux/rcuref.h:151 net/core/dst.c:166)
[39651.444487] __dev_queue_xmit (./include/net/dst.h:283 net/core/dev.c:4158)
[39651.444582] ? nf_hook_slow (./include/linux/netfilter.h:143 net/netfilter/core.c:626)

Maybe the changes in dst.c cause the problem; I'm guessing at the moment.

But in practice, with kernel 6.3 all is fine for now.

dst.c changes from 6.3.9 to 6.5.4:

--- linux-6.3.9/net/core/dst.c	2023-06-21 14:02:19.000000000 +0000
+++ linux-6.5.4/net/core/dst.c	2023-09-19 10:30:30.000000000 +0000
@@ -66,7 +66,8 @@ void dst_init(struct dst_entry *dst, str
 	dst->tclassid = 0;
 #endif
 	dst->lwtstate = NULL;
-	atomic_set(&dst->__refcnt, initial_ref);
+	rcuref_init(&dst->__rcuref, initial_ref);
+	INIT_LIST_HEAD(&dst->rt_uncached);
 	dst->__use = 0;
 	dst->lastuse = jiffies;
 	dst->flags = flags;
@@ -162,31 +163,15 @@ EXPORT_SYMBOL(dst_dev_put);

 void dst_release(struct dst_entry *dst)
 {
-	if (dst) {
-		int newrefcnt;
-
-		newrefcnt = atomic_dec_return(&dst->__refcnt);
-		if (WARN_ONCE(newrefcnt < 0, "dst_release underflow"))
-			net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
-					     __func__, dst, newrefcnt);
-		if (!newrefcnt)
-			call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
-	}
+	if (dst && rcuref_put(&dst->__rcuref))
+		call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
 }
 EXPORT_SYMBOL(dst_release);

 void dst_release_immediate(struct dst_entry *dst)
 {
-	if (dst) {
-		int newrefcnt;
-
-		newrefcnt = atomic_dec_return(&dst->__refcnt);
-		if (WARN_ONCE(newrefcnt < 0, "dst_release_immediate underflow"))
-			net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
-					     __func__, dst, newrefcnt);
-		if (!newrefcnt)
-			dst_destroy(dst);
-	}
+	if (dst && rcuref_put(&dst->__rcuref))
+		dst_destroy(dst);
 }
 EXPORT_SYMBOL(dst_release_immediate);
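
For readers following the diff, here is a minimal userspace sketch (not kernel code) of the check that the removed lines performed: decrement the counter, warn if it drops below zero, and destroy the object when it reaches zero. As far as the message text suggests, the "rcuref - imbalanced put()" warning in the trace reports the same kind of condition, a put without a matching get. The names demo_dst, demo_dst_init, demo_dst_hold and demo_dst_release are hypothetical and exist only for this illustration; C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct dst_entry; only the counter is modelled. */
struct demo_dst {
	atomic_int refcnt;
	bool destroyed;
};

/* Set the initial reference count, like the atomic_set() call in dst_init(). */
static void demo_dst_init(struct demo_dst *dst, int initial_ref)
{
	assert(dst != NULL);
	atomic_init(&dst->refcnt, initial_ref);
	dst->destroyed = false;
}

/* Take one extra reference. */
static void demo_dst_hold(struct demo_dst *dst)
{
	atomic_fetch_add(&dst->refcnt, 1);
}

/*
 * Same shape as the removed dst_release() body: decrement, warn on
 * underflow, mark the object destroyed when the count reaches zero.
 * Returns true when this call was an imbalanced put.
 */
static bool demo_dst_release(struct demo_dst *dst)
{
	/* atomic_fetch_sub() returns the old value, so subtract 1 for the new one. */
	int newrefcnt = atomic_fetch_sub(&dst->refcnt, 1) - 1;

	if (newrefcnt < 0) {
		fprintf(stderr, "demo_dst_release underflow: refcnt:%d\n", newrefcnt);
		return true;
	}
	if (newrefcnt == 0)
		dst->destroyed = true;
	return false;
}
```

With one initial reference and one demo_dst_hold(), the first two demo_dst_release() calls are balanced and the second one destroys the object; a third call is the imbalanced put. In this model the warning is raised by whichever caller drops a reference it does not own, so the place where the message is printed is not necessarily the place where the bug lives.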



> On 20 Sep 2023, at 9:16, Bagas Sanjaya <bagasdotme@gmail.com> wrote:
> 
> On Wed, Sep 20, 2023 at 09:05:10AM +0300, Martin Zaharinov wrote:
>>> On 20 Sep 2023, at 6:59, Eric Dumazet <edumazet@google.com> wrote:
>>> Again, your best route is a bisection.
>> 
>> For now its not possible to make bisection , its hard to change kernel on running machine …
>> 
> 
> You have to do bisection, unfortunately. There is many guides there on
> Internet. Or you can read Documentation/admin-guide/bug-bisect.rst.
> 
> Bye!
> 
> -- 
> An old man doll... just what I always wanted! - Clara


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-20  7:03                     ` Martin Zaharinov
@ 2023-09-20  7:25                       ` Eric Dumazet
  2023-09-20  7:29                         ` Eric Dumazet
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Dumazet @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: Bagas Sanjaya, Paolo Abeni, netdev, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern,
	Florian Westphal, Pablo Neira Ayuso

On Wed, Sep 20, 2023 at 9:04 AM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi
>
> Ok on first see all is look come after in kernel 6.4 add : atomics: Provide rcuref - scalable reference counting  ( https://www.spinics.net/lists/linux-tip-commits/msg62042.html )
>
> I check all running machine with kernel 6.4.2 is minimal and have same bug report.
>
> i have fell machine with kernel 6.3.9 and not see problems there .
>
> and the problem may be is allocate in this part :
>
> [39651.444202] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
> [39651.444297] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
> [39651.444391] dst_release (./arch/x86/include/asm/preempt.h:95 ./include/linux/rcuref.h:151 net/core/dst.c:166)
> [39651.444487] __dev_queue_xmit (./include/net/dst.h:283 net/core/dev.c:4158)
> [39651.444582] ? nf_hook_slow (./include/linux/netfilter.h:143 net/netfilter/core.c:626)
>
> may be changes in dst.c make problem , I'm guessing at the moment.
>
> but in real with kernel 6.3 all is fine for now.
>
> dst.c changes 6.3.9 > 6.5.4 :

Then start a real bisection. This is going to be the last time I say it.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-20  7:25                       ` Eric Dumazet
@ 2023-09-20  7:29                         ` Eric Dumazet
  2023-09-20  7:32                           ` Martin Zaharinov
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Dumazet @ 2023-09-20  7:29 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: Bagas Sanjaya, Paolo Abeni, netdev, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern,
	Florian Westphal, Pablo Neira Ayuso

On Wed, Sep 20, 2023 at 9:25 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Sep 20, 2023 at 9:04 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >
> > Hi
> >
> > Ok on first see all is look come after in kernel 6.4 add : atomics: Provide rcuref - scalable reference counting  ( https://www.spinics.net/lists/linux-tip-commits/msg62042.html )
> >
> > I check all running machine with kernel 6.4.2 is minimal and have same bug report.
> >
> > i have fell machine with kernel 6.3.9 and not see problems there .
> >
> > and the problem may be is allocate in this part :
> >
> > [39651.444202] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
> > [39651.444297] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
> > [39651.444391] dst_release (./arch/x86/include/asm/preempt.h:95 ./include/linux/rcuref.h:151 net/core/dst.c:166)
> > [39651.444487] __dev_queue_xmit (./include/net/dst.h:283 net/core/dev.c:4158)
> > [39651.444582] ? nf_hook_slow (./include/linux/netfilter.h:143 net/netfilter/core.c:626)
> >
> > may be changes in dst.c make problem , I'm guessing at the moment.
> >
> > but in real with kernel 6.3 all is fine for now.
> >
> > dst.c changes 6.3.9 > 6.5.4 :
>
> Then start a real bisection. This is going to be the last time I say it.

Or stick to an older kernel for your production, and wait for others
to find the issue.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-20  7:29                         ` Eric Dumazet
@ 2023-09-20  7:32                           ` Martin Zaharinov
  2023-09-21  7:50                             ` Bagas Sanjaya
  0 siblings, 1 reply; 35+ messages in thread
From: Martin Zaharinov @ 2023-09-20  7:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Bagas Sanjaya, Paolo Abeni, netdev, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern,
	Florian Westphal, Pablo Neira Ayuso

Yes, I will do this.

And I will wait to see if anyone finds a fix in a future release.

Thanks for your time, Eric.

m.



> On 20 Sep 2023, at 10:29, Eric Dumazet <edumazet@google.com> wrote:
> 
> On Wed, Sep 20, 2023 at 9:25 AM Eric Dumazet <edumazet@google.com> wrote:
>> 
>> On Wed, Sep 20, 2023 at 9:04 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>> 
>>> Hi
>>> 
>>> Ok on first see all is look come after in kernel 6.4 add : atomics: Provide rcuref - scalable reference counting  ( https://www.spinics.net/lists/linux-tip-commits/msg62042.html )
>>> 
>>> I check all running machine with kernel 6.4.2 is minimal and have same bug report.
>>> 
>>> i have fell machine with kernel 6.3.9 and not see problems there .
>>> 
>>> and the problem may be is allocate in this part :
>>> 
>>> [39651.444202] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
>>> [39651.444297] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
>>> [39651.444391] dst_release (./arch/x86/include/asm/preempt.h:95 ./include/linux/rcuref.h:151 net/core/dst.c:166)
>>> [39651.444487] __dev_queue_xmit (./include/net/dst.h:283 net/core/dev.c:4158)
>>> [39651.444582] ? nf_hook_slow (./include/linux/netfilter.h:143 net/netfilter/core.c:626)
>>> 
>>> may be changes in dst.c make problem , I'm guessing at the moment.
>>> 
>>> but in real with kernel 6.3 all is fine for now.
>>> 
>>> dst.c changes 6.3.9 > 6.5.4 :
>> 
>> Then start a real bisection. This is going to be the last time I say it.
> 
> Or stick to an older kernel for your production, and wait for others
> to find the issue.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-20  7:32                           ` Martin Zaharinov
@ 2023-09-21  7:50                             ` Bagas Sanjaya
  2023-09-21  8:13                               ` Martin Zaharinov
  0 siblings, 1 reply; 35+ messages in thread
From: Bagas Sanjaya @ 2023-09-21  7:50 UTC (permalink / raw)
  To: Martin Zaharinov, Eric Dumazet
  Cc: Paolo Abeni, netdev, patchwork-bot+netdevbpf, Jakub Kicinski,
	Stephen Hemminger, kuba+netdrv, dsahern, Florian Westphal,
	Pablo Neira Ayuso

On 20/09/2023 14:32, Martin Zaharinov wrote:
> I will make this yes .
> 
> And will wait if any find fix in future release.
> 

Please don't top-post; reply inline with appropriate context instead.

Martin, what prevents you from doing the bisection that Eric has requested again?
If you only have production systems, why can't you afford to have
testing ones? Why not turn one of your prod machines into a testing one
and bisect from there?

Sorry for the inconvenience.

-- 
An old man doll... just what I always wanted! - Clara


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-21  7:50                             ` Bagas Sanjaya
@ 2023-09-21  8:13                               ` Martin Zaharinov
  2023-09-22  3:06                                 ` Bagas Sanjaya
  0 siblings, 1 reply; 35+ messages in thread
From: Martin Zaharinov @ 2023-09-21  8:13 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: Eric Dumazet, Paolo Abeni, netdev, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern,
	Florian Westphal, Pablo Neira Ayuso

Hi Bagas,


It is not easy to do this on production; there are too many users on it.

I ran checks and found that with kernel 6.3.12-6.5.13 all is fine.
The first machine that I moved to kernel 6.4, which is still in service, runs kernel 6.4.2 and has the problem.

In my investigation, the problem started after the migration to kernel 6.4.x.

Kernel 6.4 added rcuref:

https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.4 

commit bc9d3a9f2afca189a6ae40225b6985e3c775375e
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Thu Mar 23 21:55:32 2023 +0100

net: dst: Switch to rcuref_t reference counting

Under high contention dst_entry::__refcnt becomes a significant bottleneck.

atomic_inc_not_zero() is implemented with a cmpxchg() loop, which goes into
high retry rates on contention.

Switch the reference count to rcuref_t which results in a significant
performance gain. Rename the reference count member to __rcuref to reflect
the change.

The gain depends on the micro-architecture and the number of concurrent
operations and has been measured in the range of +25% to +130% with a
localhost memtier/memcached benchmark which amplifies the problem
massively.

Running the memtier/memcached benchmark over a real (1Gb) network
connection the conversion on top of the false sharing fix for struct
dst_entry::__refcnt results in a total gain in the 2%-5% range over the
upstream baseline.

Reported-by: Wangyang Guo <wangyang.guo@intel.com>
Reported-by: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230307125538.989175656@linutronix.de
Link: https://lore.kernel.org/r/20230323102800.215027837@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


and I think the problem is here:

--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -66,7 +66,7 @@ void dst_init(struct dst_entry *dst, str
 	dst->tclassid = 0;
 #endif
 	dst->lwtstate = NULL;
-	atomic_set(&dst->__refcnt, initial_ref);
+	rcuref_init(&dst->__refcnt, initial_ref);
 	dst->__use = 0;
 	dst->lastuse = jiffies;
 	dst->flags = flags;
@@ -162,31 +162,15 @@ EXPORT_SYMBOL(dst_dev_put);

 void dst_release(struct dst_entry *dst)
 {
-	if (dst) {
-		int newrefcnt;
-
-		newrefcnt = atomic_dec_return(&dst->__refcnt);
-		if (WARN_ONCE(newrefcnt < 0, "dst_release underflow"))
-			net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
-					     __func__, dst, newrefcnt);
-		if (!newrefcnt)
-			call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
-	}
+	if (dst && rcuref_put(&dst->__refcnt))
+		call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
 }
 EXPORT_SYMBOL(dst_release);

 void dst_release_immediate(struct dst_entry *dst)
 {
-	if (dst) {
-		int newrefcnt;
-
-		newrefcnt = atomic_dec_return(&dst->__refcnt);
-		if (WARN_ONCE(newrefcnt < 0, "dst_release_immediate underflow"))
-			net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
-					     __func__, dst, newrefcnt);
-		if (!newrefcnt)
-			dst_destroy(dst);
-	}
+	if (dst && rcuref_put(&dst->__refcnt))
+		dst_destroy(dst);
 }
 EXPORT_SYMBOL(dst_release_immediate);


But this is only my guess.


Martin


> On 21 Sep 2023, at 10:50, Bagas Sanjaya <bagasdotme@gmail.com> wrote:
> 
> On 20/09/2023 14:32, Martin Zaharinov wrote:
>> I will make this yes .
>> 
>> And will wait if any find fix in future release.
>> 
> 
> Please don't top-post; reply inline with appropriate context instead.
> 
> Martin, what prevents you from doing bisection as Eric requested again?
> If you only have production systems, why can't you afford to have
> testing ones? Why not turning one of your prod machines to be testing
> and bisect from there?
> 
> Sorry for inconvenience.
> 
> -- 
> An old man doll... just what I always wanted! - Clara
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-21  8:13                               ` Martin Zaharinov
@ 2023-09-22  3:06                                 ` Bagas Sanjaya
  2023-09-22  9:50                                   ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 35+ messages in thread
From: Bagas Sanjaya @ 2023-09-22  3:06 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: Eric Dumazet, Paolo Abeni, netdev, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern,
	Florian Westphal, Pablo Neira Ayuso, Thorsten Leemhuis,
	Wangyang Guo, Arjan Van De Ven, Thomas Gleixner,
	Linux Regressions

[-- Attachment #1: Type: text/plain, Size: 3969 bytes --]

On Thu, Sep 21, 2023 at 11:13:55AM +0300, Martin Zaharinov wrote:
> Hi Bagas,
> 
> 
> Its not easy to make this on production, have too many users on it.
> 
> i make checks and find with kernel 6.3.12-6.5.13 all is fine.
> on first machine that i have with kernel 6.4 and still work run kernel 6.4.2 and have problem.
> 
> in my investigation problem is start after migration to kernel 6.4.x 
> 
> in 6.4 kernel is add rcuref : 
> 
> https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.4 
> 
> commit bc9d3a9f2afca189a6ae40225b6985e3c775375e
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date: Thu Mar 23 21:55:32 2023 +0100
> 
> net: dst: Switch to rcuref_t reference counting

Is it the culprit you are looking for? Have you done the bisection, and does it
point to that commit as the culprit?

> 
> Under high contention dst_entry::__refcnt becomes a significant bottleneck.
> 
> atomic_inc_not_zero() is implemented with a cmpxchg() loop, which goes into
> high retry rates on contention.
> 
> Switch the reference count to rcuref_t which results in a significant
> performance gain. Rename the reference count member to __rcuref to reflect
> the change.
> 
> The gain depends on the micro-architecture and the number of concurrent
> operations and has been measured in the range of +25% to +130% with a
> localhost memtier/memcached benchmark which amplifies the problem
> massively.
> 
> Running the memtier/memcached benchmark over a real (1Gb) network
> connection the conversion on top of the false sharing fix for struct
> dst_entry::__refcnt results in a total gain in the 2%-5% range over the
> upstream baseline.
> 
> Reported-by: Wangyang Guo <wangyang.guo@intel.com>
> Reported-by: Arjan Van De Ven <arjan.van.de.ven@intel.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Link: https://lore.kernel.org/r/20230307125538.989175656@linutronix.de
> Link: https://lore.kernel.org/r/20230323102800.215027837@linutronix.de
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> 
> 
> and i think problem is here : 
> 
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -66,7 +66,7 @@ void dst_init(struct dst_entry *dst, str
> dst->tclassid = 0;
> #endif
> dst->lwtstate = NULL;
> - atomic_set(&dst->__refcnt, initial_ref);
> + rcuref_init(&dst->__refcnt, initial_ref);
> dst->__use = 0;
> dst->lastuse = jiffies;
> dst->flags = flags;
> @@ -162,31 +162,15 @@ EXPORT_SYMBOL(dst_dev_put);
> 
> void dst_release(struct dst_entry *dst)
> {
> - if (dst) {
> - int newrefcnt;
> -
> - newrefcnt = atomic_dec_return(&dst->__refcnt);
> - if (WARN_ONCE(newrefcnt < 0, "dst_release underflow"))
> - net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
> - __func__, dst, newrefcnt);
> - if (!newrefcnt)
> - call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
> - }
> + if (dst && rcuref_put(&dst->__refcnt))
> + call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
> }
> EXPORT_SYMBOL(dst_release);
> 
> void dst_release_immediate(struct dst_entry *dst)
> {
> - if (dst) {
> - int newrefcnt;
> -
> - newrefcnt = atomic_dec_return(&dst->__refcnt);
> - if (WARN_ONCE(newrefcnt < 0, "dst_release_immediate underflow"))
> - net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
> - __func__, dst, newrefcnt);
> - if (!newrefcnt)
> - dst_destroy(dst);
> - }
> + if (dst && rcuref_put(&dst->__refcnt))
> + dst_destroy(dst);
> }
> EXPORT_SYMBOL(dst_release_immediate);
> 
> 
> but this is my thinking
> 

Why do you think that the above causes your regression?

Confused...

[To Thorsten: I'm unsure whether the reporter did the bisection or just suddenly
found the culprit commit. Should I add it to regzbot? I had dealt with this reporter
before when he reported an nginx regression and he didn't respond with a bisection,
to the point that I had to mark it as inconclusive (see regzbot dashboard).
What advice can you provide to him?]

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-22  3:06                                 ` Bagas Sanjaya
@ 2023-09-22  9:50                                   ` Linux regression tracking (Thorsten Leemhuis)
  2023-09-22 11:09                                     ` Bagas Sanjaya
  0 siblings, 1 reply; 35+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-09-22  9:50 UTC (permalink / raw)
  To: Bagas Sanjaya, Martin Zaharinov
  Cc: Eric Dumazet, Paolo Abeni, netdev, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern,
	Florian Westphal, Pablo Neira Ayuso, Wangyang Guo,
	Arjan Van De Ven, Thomas Gleixner, Linux Regressions

On 22.09.23 05:06, Bagas Sanjaya wrote:
> On Thu, Sep 21, 2023 at 11:13:55AM +0300, Martin Zaharinov wrote:
>>
>> Its not easy to make this on production, have too many users on it.
>>
>> i make checks and find with kernel 6.3.12-6.5.13 all is fine.
>> on first machine that i have with kernel 6.4 and still work run kernel 6.4.2 and have problem.

This is confusing and hard to follow. You want to describe more
carefully which kernels worked (avoid ranges, as I doubt you have tested
everything between 6.3.12-6.5.13) and try to avoid complexity (you seem
to have two machines? if everything works on one, don't even bring it up
except maybe as a side note)

>> in my investigation problem is start after migration to kernel 6.4.x 
>>
>> in 6.4 kernel is add rcuref : 
>>
>> https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.4 
>>
>> commit bc9d3a9f2afca189a6ae40225b6985e3c775375e
>> Author: Thomas Gleixner <tglx@linutronix.de>
>> Date: Thu Mar 23 21:55:32 2023 +0100
>>
>> net: dst: Switch to rcuref_t reference counting
> 
> Is it the culprit you look for? Had you done the bisection and it points
> the culprit to that commit

Martin, if you suspect this to be the culprit try to revert it on top of
the latest kernel; if the problem then goes away it likely is the cause.

> [...]
>> but this is my thinking
> 
> What do you think that above causes your regression?
> 
> Confused...
> 
> [To Thorsten: I'm unsure if the reporter do the bisection and suddenly he found
> the culprit commit. Should I add it to regzbot?

For now: no, things are too confusing and without knowing the culprit I
guess nobody will look into this unless we are extremely lucky.

Ciao, Thorsten




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-22  9:50                                   ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-09-22 11:09                                     ` Bagas Sanjaya
  0 siblings, 0 replies; 35+ messages in thread
From: Bagas Sanjaya @ 2023-09-22 11:09 UTC (permalink / raw)
  To: Linux regressions mailing list, Martin Zaharinov,
	Linux Kernel Mailing List
  Cc: Eric Dumazet, Paolo Abeni, netdev, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern,
	Florian Westphal, Pablo Neira Ayuso, Wangyang Guo,
	Arjan Van De Ven, Thomas Gleixner

On 22/09/2023 16:50, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 22.09.23 05:06, Bagas Sanjaya wrote:
>> [To Thorsten: I'm unsure if the reporter do the bisection and suddenly he found
>> the culprit commit. Should I add it to regzbot?
> 
> For now: no, things are too confusing and without knowing the culprit I
> guess nobody will look into this unless we are extremely lucky.
> 

OK, thanks!

-- 
An old man doll... just what I always wanted! - Clara


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-09-15  6:45 ` Eric Dumazet
  2023-09-15 22:23   ` Martin Zaharinov
@ 2023-11-16 14:17   ` Martin Zaharinov
  2023-12-06 22:26     ` Martin Zaharinov
  1 sibling, 1 reply; 35+ messages in thread
From: Martin Zaharinov @ 2023-11-16 14:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Paolo Abeni, patchwork-bot+netdevbpf, Jakub Kicinski,
	Stephen Hemminger, kuba+netdrv, dsahern

Hi All

I am reporting the same problem with kernel 6.6.1. I think the problem is in RCU; if possible, please add people from RCU here.

See the report:



[141229.505339] ------------[ cut here ]------------
[141229.505492] rcuref - imbalanced put()
[141229.505504] WARNING: CPU: 8 PID: 0 at lib/rcuref.c:267 rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
[141229.505821] Modules linked in: xsk_diag unix_diag iptable_filter xt_TCPMSS iptable_mangle xt_addrtype xt_nat xt_MASQUERADE iptable_nat ip_tables netconsole coretemp e1000 ixgbe mdio pppoe pppox sha1_ssse3 sha1_generic ppp_mppe libarc4 ppp_generic slhc nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
[141229.506349] CPU: 8 PID: 0 Comm: swapper/8 Tainted: G           O       6.6.1 #1
[141229.506527] Hardware name: Persy Super Server/X11DDW-L, BIOS 4.0 07/11/2023
[141229.506701] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
[141229.506843] Code: 31 c0 eb e2 80 3d ef 4e e6 00 00 74 0a c7 03 00 00 00 e0 31 c0 eb cf 48 c7 c7 07 99 e3 97 c6 05 d5 4e e6 00 01 e8 d1 1f c7 ff <0f> 0b eb df cc cc cc cc cc cc cc cc cc cc cc cc cc 48 89 fa 83 e2
All code
========
   0:	31 c0                	xor    %eax,%eax
   2:	eb e2                	jmp    0xffffffffffffffe6
   4:	80 3d ef 4e e6 00 00 	cmpb   $0x0,0xe64eef(%rip)        # 0xe64efa
   b:	74 0a                	je     0x17
   d:	c7 03 00 00 00 e0    	movl   $0xe0000000,(%rbx)
  13:	31 c0                	xor    %eax,%eax
  15:	eb cf                	jmp    0xffffffffffffffe6
  17:	48 c7 c7 07 99 e3 97 	mov    $0xffffffff97e39907,%rdi
  1e:	c6 05 d5 4e e6 00 01 	movb   $0x1,0xe64ed5(%rip)        # 0xe64efa
  25:	e8 d1 1f c7 ff       	call   0xffffffffffc71ffb
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	eb df                	jmp    0xd
  2e:	cc                   	int3
  2f:	cc                   	int3
  30:	cc                   	int3
  31:	cc                   	int3
  32:	cc                   	int3
  33:	cc                   	int3
  34:	cc                   	int3
  35:	cc                   	int3
  36:	cc                   	int3
  37:	cc                   	int3
  38:	cc                   	int3
  39:	cc                   	int3
  3a:	cc                   	int3
  3b:	48 89 fa             	mov    %rdi,%rdx
  3e:	83                   	.byte 0x83
  3f:	e2                   	.byte 0xe2

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	eb df                	jmp    0xffffffffffffffe3
   4:	cc                   	int3
   5:	cc                   	int3
   6:	cc                   	int3
   7:	cc                   	int3
   8:	cc                   	int3
   9:	cc                   	int3
   a:	cc                   	int3
   b:	cc                   	int3
   c:	cc                   	int3
   d:	cc                   	int3
   e:	cc                   	int3
   f:	cc                   	int3
  10:	cc                   	int3
  11:	48 89 fa             	mov    %rdi,%rdx
  14:	83                   	.byte 0x83
  15:	e2                   	.byte 0xe2
[141229.507086] RSP: 0018:ffffa444449e0978 EFLAGS: 00010296
[141229.507229] RAX: 0000000000000019 RBX: ffff9b54866a4100 RCX: 00000000fff7ffff
[141229.507404] RDX: 00000000fff7ffff RSI: 0000000000000001 RDI: 00000000ffffffea
[141229.507577] RBP: ffff9b53e57b1ec0 R08: 0000000000000000 R09: 00000000fff7ffff
[141229.507751] R10: ffff9b62db200000 R11: 0000000000000003 R12: ffff9b5b0595c000
[141229.507929] R13: ffff9b5b09c32200 R14: ffff9b5b09e29a00 R15: ffff9b5b0557e080
[141229.508101] FS:  0000000000000000(0000) GS:ffff9b62dfa00000(0000) knlGS:0000000000000000
[141229.508279] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[141229.508425] CR2: 00007fbadced6a80 CR3: 000000096f014002 CR4: 00000000003706e0
[141229.508599] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[141229.508773] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[141229.508947] Call Trace:
[141229.509079]  <IRQ>
[141229.509206] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
[141229.509342] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[141229.509482] ? handle_bug (arch/x86/kernel/traps.c:237)
[141229.509617] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[141229.509751] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
[141229.509892] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
[141229.510028] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
[141229.510164] dst_release (./arch/x86/include/asm/preempt.h:95 ./include/linux/rcuref.h:151 net/core/dst.c:166)
[141229.510302] __dev_queue_xmit (./include/net/dst.h:283 net/core/dev.c:4324)
[141229.510441] vlan_dev_hard_start_xmit (net/8021q/vlan_dev.c:130)
[141229.510584] dev_hard_start_xmit (./include/linux/netdevice.h:4904 net/core/dev.c:3573 net/core/dev.c:3589)
[141229.510722] __dev_queue_xmit (./include/linux/netdevice.h:3278 (discriminator 25) net/core/dev.c:4370 (discriminator 25))
[141229.510862] ? eth_header (net/ethernet/eth.c:85)
[141229.510998] ip_finish_output2 (./include/net/neighbour.h:542 (discriminator 2) net/ipv4/ip_output.c:233 (discriminator 2))
[141229.511135] ip_sabotage_in (net/bridge/br_netfilter_hooks.c:881 net/bridge/br_netfilter_hooks.c:866)
[141229.511269] nf_hook_slow (./include/linux/netfilter.h:144 net/netfilter/core.c:626)
[141229.511406] ip_rcv (./include/linux/netfilter.h:259 ./include/linux/netfilter.h:302 net/ipv4/ip_input.c:569)
[141229.511540] ? ip_rcv_core.constprop.0 (net/ipv4/ip_input.c:436)
[141229.511678] netif_receive_skb (net/core/dev.c:5552 net/core/dev.c:5666 net/core/dev.c:5752 net/core/dev.c:5811)
[141229.511814] br_handle_frame_finish (net/bridge/br_input.c:216)
[141229.511954] ? br_pass_frame_up (net/bridge/br_input.c:75)
[141229.512092] br_nf_hook_thresh (net/bridge/br_netfilter_hooks.c:1051)
[141229.512227] ? br_pass_frame_up (net/bridge/br_input.c:75)
[141229.512363] br_nf_pre_routing_finish (net/bridge/br_netfilter_hooks.c:427)
[141229.512501] ? br_pass_frame_up (net/bridge/br_input.c:75)
[141229.512644] ? nf_nat_ipv4_pre_routing (net/netfilter/nf_nat_proto.c:656) nf_nat
[141229.512792] br_nf_pre_routing (net/bridge/br_netfilter_hooks.c:538)
[141229.512928] ? br_nf_hook_thresh (net/bridge/br_netfilter_hooks.c:354)
[141229.513061] br_handle_frame (./include/linux/netfilter.h:144 net/bridge/br_input.c:272 net/bridge/br_input.c:417)
[141229.513196] ? br_pass_frame_up (net/bridge/br_input.c:75)
[141229.513333] __netif_receive_skb_core.constprop.0 (net/core/dev.c:5446 (discriminator 1))
[141229.513475] ? ip_finish_output2 (net/ipv4/ip_output.c:243)
[141229.513613] process_backlog (net/core/dev.c:5551 net/core/dev.c:5666 net/core/dev.c:5994)
[141229.513749] __napi_poll (net/core/dev.c:6556)
[141229.513887] net_rx_action (net/core/dev.c:6625 net/core/dev.c:6756)
[141229.514023] __do_softirq (./arch/x86/include/asm/preempt.h:27 kernel/softirq.c:564)
[141229.514158] do_softirq (kernel/softirq.c:463 (discriminator 32) kernel/softirq.c:450 (discriminator 32))
[141229.514292]  </IRQ>
[141229.514420]  <TASK>
[141229.514548] flush_smp_call_function_queue (./arch/x86/include/asm/irqflags.h:134 (discriminator 1) kernel/smp.c:579 (discriminator 1))
[141229.514688] do_idle (kernel/sched/idle.c:314)
[141229.514822] cpu_startup_entry (kernel/sched/idle.c:379)
[141229.516148] start_secondary (arch/x86/kernel/smpboot.c:326)
[141229.516291] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
[141229.516435]  </TASK>
[141229.516562] ---[ end trace 0000000000000000 ]---


Best regards,
Martin



> On 15 Sep 2023, at 9:45, Eric Dumazet <edumazet@google.com> wrote:
> 
> scripts/decode_stacktrace.sh




* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-11-16 14:17   ` Martin Zaharinov
@ 2023-12-06 22:26     ` Martin Zaharinov
       [not found]       ` <5E63894D-913B-416C-B901-F628BB6C00E0@gmail.com>
  0 siblings, 1 reply; 35+ messages in thread
From: Martin Zaharinov @ 2023-12-06 22:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Paolo Abeni, patchwork-bot+netdevbpf, Jakub Kicinski,
	Stephen Hemminger, kuba+netdrv, dsahern

Hi all


It is strange: the same problem occurs on 6.6.4, with the same debug log,
on different hardware, with a different number of users, and so on.

The debug log points to the same place, lib/rcuref.c.

The code at that line is:


        /*
         * If the reference count was already in the dead zone, then this
         * put() operation is imbalanced. Warn, put the reference count back to
         * DEAD and tell the caller to not deconstruct the object.
         */
        if (WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")) {
                atomic_set(&ref->refcnt, RCUREF_DEAD);
                return false;
        }


[529520.875413] CPU: 13 PID: 0 Comm: swapper/13 Tainted: G           O       6.6.3 #1
[529520.875533] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.3 10/28/2020
[529520.875653] RIP: 0010:rcuref_put_slowpath+0x5f/0x70
[529520.875748] Code: 31 c0 eb e2 80 3d 9e d1 e6 00 00 74 0a c7 03 00 00 00 e0 31 c0 eb cf 48 c7 c7 d9 96 e3 8f c6 05 84 d1 e6 00 01 e8 41 9d c7 ff <0f> 0b eb df cc cc cc cc cc cc cc cc cc cc cc cc cc 48 89 fa 83 e2
[529520.875908] RSP: 0018:ffffa823c052cde8 EFLAGS: 00010296
[529520.876003] RAX: 0000000000000019 RBX: ffffa0f049053180 RCX: 00000000fff7ffff
[529520.876122] RDX: 00000000fff7ffff RSI: 0000000000000001 RDI: 00000000ffffffea
[529520.876244] RBP: ffffa0f0a8fffec0 R08: 0000000000000000 R09: 00000000fff7ffff
[529520.876364] R10: ffffa0f79ae00000 R11: 0000000000000003 R12: ffffa0f04655f000
[529520.876482] R13: 0000000000000258 R14: ffffa0f16ade1000 R15: ffffa0f79f964bd0
[529520.876601] FS:  0000000000000000(0000) GS:ffffa0f79f940000(0000) knlGS:0000000000000000
[529520.876723] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[529520.876822] CR2: 00007fa9bd56b3c8 CR3: 000000016e43e002 CR4: 00000000003706e0
[529520.877043] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[529520.877164] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[529520.877287] Call Trace:
[529520.877382]  <IRQ>
[529520.877472]  ? __warn+0x6c/0x130
[529520.877566]  ? report_bug+0x1b8/0x200
[529520.877661]  ? handle_bug+0x36/0x70
[529520.877753]  ? exc_invalid_op+0x17/0x1a0
[529520.877849]  ? asm_exc_invalid_op+0x16/0x20
[529520.877947]  ? rcuref_put_slowpath+0x5f/0x70
[529520.878043]  ? rcuref_put_slowpath+0x5f/0x70
[529520.878136]  dst_release+0x1c/0x40
[529520.878229]  __dev_queue_xmit+0x594/0xcd0
[529520.878324]  ? eth_header+0x25/0xc0
[529520.878417]  ip_finish_output2+0x1a0/0x530
[529520.878514]  process_backlog+0x107/0x210
[529520.878610]  __napi_poll+0x20/0x180
[529520.878702]  net_rx_action+0x29f/0x380
[529520.878935]  __do_softirq+0xd0/0x202
[529520.879033]  do_softirq+0x3a/0x50
[529520.879127]  </IRQ>
[529520.879217]  <TASK>
[529520.879306]  flush_smp_call_function_queue+0x3f/0x50
[529520.879407]  do_idle+0x14d/0x210
[529520.879500]  cpu_startup_entry+0x21/0x30
[529520.879597]  start_secondary+0xe1/0xf0
[529520.879693]  secondary_startup_64_no_verify+0x166/0x16b
[529520.879793]  </TASK>
[529520.879884] ---[ end trace 0000000000000000 ]---


m.




* Re: Urgent Bug Report Kernel crash 6.5.2
       [not found]       ` <5E63894D-913B-416C-B901-F628BB6C00E0@gmail.com>
@ 2023-12-08 22:20         ` Thomas Gleixner
  2023-12-08 23:01           ` Martin Zaharinov
  0 siblings, 1 reply; 35+ messages in thread
From: Thomas Gleixner @ 2023-12-08 22:20 UTC (permalink / raw)
  To: Martin Zaharinov, peterz
  Cc: netdev, Paolo Abeni, patchwork-bot+netdevbpf, Jakub Kicinski,
	Stephen Hemminger, kuba+netdrv, dsahern, Eric Dumazet

On Thu, Dec 07 2023 at 00:38, Martin Zaharinov wrote:
>> On 7 Dec 2023, at 0:26, Martin Zaharinov <micron10@gmail.com> wrote:
>> 
>> in this line is : 
>> 
>> 
>>        /*
>>         * If the reference count was already in the dead zone, then this
>>         * put() operation is imbalanced. Warn, put the reference count back to
>>         * DEAD and tell the caller to not deconstruct the object.
>>         */
>>        if (WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")) {
>>                atomic_set(&ref->refcnt, RCUREF_DEAD);
>>                return false;
>>        }

So a rcuref_put() operation triggers the warning because the reference
count is already dead, which means the rcuref_put() operation is
imbalanced.

>> [529520.875413] CPU: 13 PID: 0 Comm: swapper/13 Tainted: G           O       6.6.3 #1

Can you reproduce this without the Out of Tree module?

>> [529520.875653] RIP: 0010:rcuref_put_slowpath+0x5f/0x70
>> [529520.878136]  dst_release+0x1c/0x40
>> [529520.878229]  __dev_queue_xmit+0x594/0xcd0
>> [529520.878324]  ? eth_header+0x25/0xc0
>> [529520.878417]  ip_finish_output2+0x1a0/0x530
>> [529520.878514]  process_backlog+0x107/0x210
>> [529520.878610]  __napi_poll+0x20/0x180
>> [529520.878702]  net_rx_action+0x29f/0x380
>> [529520.878935]  __do_softirq+0xd0/0x202
>> [529520.879033]  do_softirq+0x3a/0x50

So this is one call chain triggering the issue...

>>> report same problem with kernel 6.6.1 - i think problem is in rcu
>>> but … if have options to add people from RCU here.

That's definitely not a RCU problem. It's a simple refcount fail.

Thanks,

        tglx



* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-12-08 22:20         ` Thomas Gleixner
@ 2023-12-08 23:01           ` Martin Zaharinov
  2023-12-12 18:16             ` Thomas Gleixner
  0 siblings, 1 reply; 35+ messages in thread
From: Martin Zaharinov @ 2023-12-08 23:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: peterz, netdev, Paolo Abeni, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern,
	Eric Dumazet

Hi Thomas,



> On 9 Dec 2023, at 0:20, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Thu, Dec 07 2023 at 00:38, Martin Zaharinov wrote:
>>> On 7 Dec 2023, at 0:26, Martin Zaharinov <micron10@gmail.com> wrote:
>>> 
>>> in this line is : 
>>> 
>>> 
>>>       /*
>>>        * If the reference count was already in the dead zone, then this
>>>        * put() operation is imbalanced. Warn, put the reference count back to
>>>        * DEAD and tell the caller to not deconstruct the object.
>>>        */
>>>       if (WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")) {
>>>               atomic_set(&ref->refcnt, RCUREF_DEAD);
>>>               return false;
>>>       }
> 
> So a rcuref_put() operation triggers the warning because the reference
> count is already dead, which means the rcuref_put() operation is
> imbalanced.
> 
>>> [529520.875413] CPU: 13 PID: 0 Comm: swapper/13 Tainted: G           O       6.6.3 #1
> 
> Can you reproduce this without the Out of Tree module?
The same error occurs without out-of-tree modules. I have tried many times, from kernel 6.5.x up to now.

> 
>>> [529520.875653] RIP: 0010:rcuref_put_slowpath+0x5f/0x70
>>> [529520.878136]  dst_release+0x1c/0x40
>>> [529520.878229]  __dev_queue_xmit+0x594/0xcd0
>>> [529520.878324]  ? eth_header+0x25/0xc0
>>> [529520.878417]  ip_finish_output2+0x1a0/0x530
>>> [529520.878514]  process_backlog+0x107/0x210
>>> [529520.878610]  __napi_poll+0x20/0x180
>>> [529520.878702]  net_rx_action+0x29f/0x380
>>> [529520.878935]  __do_softirq+0xd0/0x202
>>> [529520.879033]  do_softirq+0x3a/0x50
> 
> So this is one call chain triggering the issue...
> 
>>>> report same problem with kernel 6.6.1 - i think problem is in rcu
>>>> but … if have options to add people from RCU here.
> 
> That's definitely not a RCU problem. It's a simple refcount fail.
> 
> Thanks,
> 
>        tglx
> 

Is this a real problem or only a simple failure? And is it possible to catch what the problem is and fix it?

m.


* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-12-08 23:01           ` Martin Zaharinov
@ 2023-12-12 18:16             ` Thomas Gleixner
  2023-12-19  9:25               ` Martin Zaharinov
  0 siblings, 1 reply; 35+ messages in thread
From: Thomas Gleixner @ 2023-12-12 18:16 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: peterz, netdev, Paolo Abeni, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern,
	Eric Dumazet

Martin!

On Sat, Dec 09 2023 at 01:01, Martin Zaharinov wrote:
>> On 9 Dec 2023, at 0:20, Thomas Gleixner <tglx@linutronix.de> wrote:
>> That's definitely not a RCU problem. It's a simple refcount fail.
>> 
> Is this a problem or only simple fail , and is it possible to catch
> what is a problem and fix this fail.

Underaccounting a reference count is potentially Use After Free.

    if (rcuref_put(ref))
       call_rcu(ref....);

So after the grace period is over @ref will be freed. Depending on the
timing the context which does the extra put() might already operate on a
freed object.
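
To make the failure mode concrete, here is a minimal single-threaded
userspace model of the counter zones. It is only a sketch and not the
kernel code: the names model_ref, model_init(), model_get() and
model_put() are made up for illustration, there are no atomics, and the
zone constants are the values I recall from the rcuref header (the DEAD
value 0xE0000000 matches the movl in the disassembly above). As in
rcuref, the stored count is "references minus one".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Zone boundaries, assumed to mirror the kernel's RCUREF_* values. */
#define MODEL_MAXREF    0x7FFFFFFFU
#define MODEL_SATURATED 0xA0000000U
#define MODEL_RELEASED  0xC0000000U
#define MODEL_DEAD      0xE0000000U
#define MODEL_NOREF     0xFFFFFFFFU

/* Hypothetical stand-in for rcuref_t; not thread safe. */
struct model_ref {
	uint32_t cnt;              /* number of references minus one */
	unsigned imbalanced_puts;  /* how often the "imbalanced put()" case hit */
};

static void model_init(struct model_ref *r, uint32_t refs)
{
	assert(r);
	r->cnt = refs - 1;
	r->imbalanced_puts = 0;
}

/* Returns true if a reference was acquired. */
static bool model_get(struct model_ref *r)
{
	r->cnt += 1;
	if (r->cnt <= MODEL_MAXREF)
		return true;                  /* fast path */
	if (r->cnt >= MODEL_RELEASED) {
		r->cnt = MODEL_DEAD;          /* object is already gone */
		return false;
	}
	r->cnt = MODEL_SATURATED;             /* overflow: leak instead of wrap */
	return true;
}

/* Returns true only if the caller dropped the last reference. */
static bool model_put(struct model_ref *r)
{
	r->cnt -= 1;
	if (r->cnt <= MODEL_MAXREF)
		return false;                 /* fast path: references remain */
	if (r->cnt == MODEL_NOREF) {
		r->cnt = MODEL_DEAD;          /* last reference: caller may free */
		return true;
	}
	if (r->cnt >= MODEL_RELEASED) {
		r->imbalanced_puts++;         /* where the kernel would WARN */
		r->cnt = MODEL_DEAD;
		return false;
	}
	r->cnt = MODEL_SATURATED;
	return false;
}
```

With two holders A and B (model_init(&r, 2)), a buggy A that calls
model_put() twice gets true from its second call, so the object would be
queued for freeing while B still uses it. B's own, perfectly legitimate
model_put() is then the one that lands in the dead zone. In other words,
the context that prints the warning is not necessarily the context that
did the extra put().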

How to catch that, that's a good question. There is no instrumentation
so far for this. Below is a straightforward trace_printk() based
tracking of rcurefs, which should help to narrow down the context.
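
Once a put(WASDEAD) event shows up in the trace buffer, the history of
that one rcuref is what matters. The following userspace helper is a
rough sketch for picking it out of saved trace text. It assumes each
event line contains the text produced by the trace_printk() format
strings in the patch ("get(STATE): <hex>" or "put(STATE): <hex>",
optionally followed by a symbol). The names rcuref_event,
parse_rcuref_event() and tally_ref() are hypothetical and exist only in
this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One decoded trace_printk() event (hypothetical helper type). */
struct rcuref_event {
	bool is_put;             /* true for put(...), false for get(...) */
	char state[16];          /* e.g. FASTPATH, NOWDEAD, WASDEAD */
	unsigned long long ref;  /* address printed by %px */
};

/* Decode one line; returns false if it holds no get()/put() event. */
static bool parse_rcuref_event(const char *line, struct rcuref_event *ev)
{
	const char *get, *put, *p;

	assert(line && ev);
	get = strstr(line, "get(");
	put = strstr(line, "put(");

	if (put && (!get || put < get)) {
		ev->is_put = true;
		p = put;
	} else if (get) {
		ev->is_put = false;
		p = get;
	} else {
		return false;
	}

	/* Skip "get(" or "put(", then read "STATE): <hex address>". */
	return sscanf(p + 4, "%15[A-Z]): %llx", ev->state, &ev->ref) == 2;
}

/* Count the traced get and put events that refer to one rcuref. */
static void tally_ref(const char *const *lines, size_t n,
		      unsigned long long ref,
		      unsigned *n_gets, unsigned *n_puts)
{
	struct rcuref_event ev;
	size_t i;

	*n_gets = 0;
	*n_puts = 0;
	for (i = 0; i < n; i++) {
		if (!parse_rcuref_event(lines[i], &ev) || ev.ref != ref)
			continue;
		if (ev.is_put)
			(*n_puts)++;
		else
			(*n_gets)++;
	}
}
```

The counts are only a hint. The trace buffer is a ring, and the initial
reference set up at init time is not traced, so a surplus of puts for
one address is a starting point for reading the surrounding events
rather than proof of where the bug is.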

Btw, how easy is this to reproduce?

Thanks,

        tglx
---
--- a/include/linux/rcuref.h
+++ b/include/linux/rcuref.h
@@ -64,8 +64,10 @@ static inline __must_check bool rcuref_g
 	 * Unconditionally increase the reference count. The saturation and
 	 * dead zones provide enough tolerance for this.
 	 */
-	if (likely(!atomic_add_negative_relaxed(1, &ref->refcnt)))
+	if (likely(!atomic_add_negative_relaxed(1, &ref->refcnt))) {
+		trace_printk("get(FASTPATH): %px\n", ref);
 		return true;
+	}
 
 	/* Handle the cases inside the saturation and dead zones */
 	return rcuref_get_slowpath(ref);
@@ -84,8 +86,10 @@ static __always_inline __must_check bool
 	 * Unconditionally decrease the reference count. The saturation and
 	 * dead zones provide enough tolerance for this.
 	 */
-	if (likely(!atomic_add_negative_release(-1, &ref->refcnt)))
+	if (likely(!atomic_add_negative_release(-1, &ref->refcnt))) {
+		trace_printk("put(FASTPATH): %px\n", ref);
 		return false;
+	}
 
 	/*
 	 * Handle the last reference drop and cases inside the saturation
--- a/lib/rcuref.c
+++ b/lib/rcuref.c
@@ -200,6 +200,7 @@ bool rcuref_get_slowpath(rcuref_t *ref)
 	 */
 	if (cnt >= RCUREF_RELEASED) {
 		atomic_set(&ref->refcnt, RCUREF_DEAD);
+		trace_printk("get(DEAD): %px %pS\n", ref, __builtin_return_address(0));
 		return false;
 	}
 
@@ -211,8 +212,15 @@ bool rcuref_get_slowpath(rcuref_t *ref)
 	 * object memory, but prevents the obvious reference count overflow
 	 * damage.
 	 */
-	if (WARN_ONCE(cnt > RCUREF_MAXREF, "rcuref saturated - leaking memory"))
+	if (cnt > RCUREF_MAXREF) {
+		trace_printk("get(SATURATED): %px %pS\n", ref, __builtin_return_address(0));
+		WARN_ONCE(1, "rcuref saturated - leaking memory");
 		atomic_set(&ref->refcnt, RCUREF_SATURATED);
+	} else {
+		trace_printk("get(UNDEFINED): %px %pS\n", ref, __builtin_return_address(0));
+		WARN_ON_ONCE(1);
+	}
+
 	return true;
 }
 EXPORT_SYMBOL_GPL(rcuref_get_slowpath);
@@ -248,9 +256,12 @@ bool rcuref_put_slowpath(rcuref_t *ref)
 		 * require a retry. If this fails the caller is not
 		 * allowed to deconstruct the object.
 		 */
-		if (!atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD))
+		if (!atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD)) {
+			trace_printk("put(NOTDEAD): %px %pS\n", ref, __builtin_return_address(0));
 			return false;
+		}
 
+		trace_printk("put(NOWDEAD): %px %pS\n", ref, __builtin_return_address(0));
 		/*
 		 * The caller can safely schedule the object for
 		 * deconstruction. Provide acquire ordering.
@@ -264,7 +275,9 @@ bool rcuref_put_slowpath(rcuref_t *ref)
 	 * put() operation is imbalanced. Warn, put the reference count back to
 	 * DEAD and tell the caller to not deconstruct the object.
 	 */
-	if (WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")) {
+	if (cnt >= RCUREF_RELEASED) {
+		trace_printk("put(WASDEAD): %px %pS\n", ref, __builtin_return_address(0));
+		WARN_ONCE(1, "rcuref - imbalanced put()");
 		atomic_set(&ref->refcnt, RCUREF_DEAD);
 		return false;
 	}
@@ -274,8 +287,13 @@ bool rcuref_put_slowpath(rcuref_t *ref)
 	 * mean saturation value and tell the caller to not deconstruct the
 	 * object.
 	 */
-	if (cnt > RCUREF_MAXREF)
+	if (cnt > RCUREF_MAXREF) {
+		trace_printk("put(SATURATED): %px %pS\n", ref, __builtin_return_address(0));
 		atomic_set(&ref->refcnt, RCUREF_SATURATED);
+	} else {
+		trace_printk("put(UNDEFINED): %px %pS\n", ref, __builtin_return_address(0));
+		WARN_ON_ONCE(1);
+	}
 	return false;
 }
 EXPORT_SYMBOL_GPL(rcuref_put_slowpath);



* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-12-12 18:16             ` Thomas Gleixner
@ 2023-12-19  9:25               ` Martin Zaharinov
  2023-12-19 14:26                 ` Thomas Gleixner
  0 siblings, 1 reply; 35+ messages in thread
From: Martin Zaharinov @ 2023-12-19  9:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: peterz, netdev, Paolo Abeni, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern,
	Eric Dumazet

Hi Thomas,
Thanks for your response!



> On 12 Dec 2023, at 20:16, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> Martin!
> 
> On Sat, Dec 09 2023 at 01:01, Martin Zaharinov wrote:
>>> On 9 Dec 2023, at 0:20, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> That's definitely not a RCU problem. It's a simple refcount fail.
>>> 
>> Is this a problem or only simple fail , and is it possible to catch
>> what is a problem and fix this fail.
> 
> Underaccounting a reference count is potentially Use After Free.
> 
>    if (rcuref_put(ref))
>       call_rcu(ref....);
> 
> So after the grace period is over @ref will be freed. Depending on the
> timing the context which does the extra put() might already operate on a
> freed object.
> 
> How to catch that, that's a good question. There is no instrumentation
> so far for this. Below is a straight forward trace_printk() based
> tracking of rcurefs, which should help to narrow down the context.
> 
> Btw, how easy is this to reproduce?

It is not easy. This report was generated on a machine with 5-6k users and live traffic; one time it showed up after 1 day, another time after 4-5 days.





I have applied this patch and will deploy the image on one machine as soon as possible; when I get any reports I will send them to you.

Best regards,
Martin
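
For readers following the thread, the imbalanced put() that Thomas describes can be illustrated with a minimal user-space sketch. This is not the kernel implementation: it models the counter with C11 atomics, assumes the zone values as I understand them from include/linux/rcuref.h (treat them as assumptions), and reuses the label names from the trace_printk() patch above. All model_* and PUT_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed zone boundaries; the stored value is "references - 1". */
#define MODEL_MAXREF    0x7FFFFFFFu
#define MODEL_SATURATED 0xA0000000u
#define MODEL_RELEASED  0xC0000000u
#define MODEL_DEAD      0xE0000000u
#define MODEL_NOREF     0xFFFFFFFFu

struct model_ref { _Atomic uint32_t cnt; };

/* One value per trace_printk() label on the put side of the patch. */
enum put_result {
    PUT_FASTPATH, PUT_NOWDEAD, PUT_NOTDEAD,
    PUT_WASDEAD, PUT_SATURATED, PUT_UNDEFINED
};

static void model_init(struct model_ref *r, uint32_t refs)
{
    atomic_store(&r->cnt, refs - 1);
}

static bool model_get(struct model_ref *r)
{
    /* Fast path: unconditional increment, result must not be "negative". */
    uint32_t now = atomic_fetch_add(&r->cnt, 1) + 1;
    if (!(now & 0x80000000u))
        return true;

    /* Slow path: re-read and decide by zone. */
    uint32_t cnt = atomic_load(&r->cnt);
    if (cnt >= MODEL_RELEASED) {
        atomic_store(&r->cnt, MODEL_DEAD);      /* get(DEAD) */
        return false;
    }
    if (cnt > MODEL_MAXREF)
        atomic_store(&r->cnt, MODEL_SATURATED); /* get(SATURATED) */
    return true;
}

static enum put_result model_put(struct model_ref *r)
{
    /* Fast path: unconditional decrement, result must not be "negative". */
    uint32_t now = atomic_fetch_sub(&r->cnt, 1) - 1;
    if (!(now & 0x80000000u))
        return PUT_FASTPATH;

    /* Slow path: re-read and decide by zone. */
    uint32_t cnt = atomic_load(&r->cnt);
    if (cnt == MODEL_NOREF) {
        /* Last reference dropped: try to mark the object dead. */
        if (!atomic_compare_exchange_strong(&r->cnt, &cnt, MODEL_DEAD))
            return PUT_NOTDEAD;
        return PUT_NOWDEAD;
    }
    if (cnt >= MODEL_RELEASED) {
        /* Imbalanced put() on an already released object. */
        atomic_store(&r->cnt, MODEL_DEAD);
        return PUT_WASDEAD;
    }
    if (cnt > MODEL_MAXREF) {
        atomic_store(&r->cnt, MODEL_SATURATED);
        return PUT_SATURATED;
    }
    return PUT_UNDEFINED;
}
```

In this model, initialising with one reference, taking a second one and then calling model_put() three times yields PUT_FASTPATH, PUT_NOWDEAD and finally PUT_WASDEAD for the extra put. In the kernel that third caller may already be touching freed memory, which is why the patch records the caller address with %pS.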



* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-12-19  9:25               ` Martin Zaharinov
@ 2023-12-19 14:26                 ` Thomas Gleixner
  2023-12-22 17:26                   ` Martin Zaharinov
  0 siblings, 1 reply; 35+ messages in thread
From: Thomas Gleixner @ 2023-12-19 14:26 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: peterz, netdev, Paolo Abeni, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern,
	Eric Dumazet

On Tue, Dec 19 2023 at 11:25, Martin Zaharinov wrote:
>> On 12 Dec 2023, at 20:16, Thomas Gleixner <tglx@linutronix.de> wrote:
>> Btw, how easy is this to reproduce?
>
> It's not easy. This report was generated on a machine with 5-6k users and
> live traffic; one time it showed up after 1 day, another time after 4-5 days.

I love those bugs ...

> I have applied this patch and will deploy the image on one machine as
> soon as possible; when I get any reports I will send them to you.

Let's see how that goes!

Thanks,

        tglx


* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-12-19 14:26                 ` Thomas Gleixner
@ 2023-12-22 17:26                   ` Martin Zaharinov
  2023-12-29 12:00                     ` Martin Zaharinov
  0 siblings, 1 reply; 35+ messages in thread
From: Martin Zaharinov @ 2023-12-22 17:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: peterz, netdev, Paolo Abeni, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern,
	Eric Dumazet

Hi Thomas,

This is with your patch applied.
See the logs below.


[43040.198064] ------------[ cut here ]------------
[43040.198407] WARNING: CPU: 47 PID: 0 at lib/rcuref.c:294 rcuref_put_slowpath+0x2f/0x70
[43040.198685] Modules linked in: pppoe pppox ppp_generic slhc nft_limit nft_ct nft_nat nft_chain_nat nf_tables netconsole tg3 igb i2c_algo_bit e1000e bnxt_en mlx5_core mlxfw mlx4_en mlx4_core i40e ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_devintf ipmi_msghandler rtc_cmos
[43040.199478] CPU: 47 PID: 0 Comm: swapper/47 Tainted: G           O       6.6.8 #1
[43040.199660] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[43040.199886] RIP: 0010:rcuref_put_slowpath+0x2f/0x70
[43040.200028] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4e e3 00
[43040.200387] RSP: 0018:ffffa39d83e88c30 EFLAGS: 00010246
[43040.200528] RAX: 0000000000000000 RBX: ffff9c58e966b840 RCX: ffff9c5bc4e35680
[43040.200700] RDX: ffff9c5fafde4f08 RSI: 00000000fffffe01 RDI: ffff9c58e966b840
[43040.200871] RBP: ffff9c5fafde4f08 R08: ffff9c5fafde4f08 R09: 0000000000000001
[43040.201044] R10: 00000000000286e0 R11: 0000000000000001 R12: ffff9c58e966b800
[43040.201255] R13: ffff9c58e966b868 R14: ffff9c5fafde4f08 R15: 000000008f5de42b
[43040.201439] FS:  0000000000000000(0000) GS:ffff9c5fafdc0000(0000) knlGS:0000000000000000
[43040.201642] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43040.201799] CR2: 00007f1401217714 CR3: 0000000464b94003 CR4: 00000000001706e0
[43040.201994] Call Trace:
[43040.202095]  <IRQ>
[43040.202187]  ? __warn+0x6c/0x130
[43040.202301]  ? report_bug+0x1b8/0x200
[43040.202418]  ? handle_bug+0x36/0x70
[43040.202534]  ? exc_invalid_op+0x17/0x1a0
[43040.202652]  ? asm_exc_invalid_op+0x16/0x20
[43040.202781]  ? rcuref_put_slowpath+0x2f/0x70
[43040.202909]  dst_release+0x1c/0x40
[43040.203026]  rt_cache_route+0xbd/0xf0
[43040.203143]  rt_set_nexthop.isra.0+0x1b6/0x450
[43040.203272]  ip_route_input_slow+0x5d9/0xcc0
[43040.203401]  ? nf_conntrack_udp_packet+0x17c/0x240 [nf_conntrack]
[43040.203581]  ip_route_input_noref+0xe0/0xf0
[43040.203704]  ip_rcv_finish_core.isra.0+0xbb/0x440
[43040.203855]  ip_rcv+0xd5/0x110
[43040.203962]  ? ip_rcv_core+0x360/0x360
[43040.204079]  process_backlog+0x107/0x210
[43040.204201]  __napi_poll+0x20/0x180
[43040.204315]  net_rx_action+0x29f/0x380
[43040.204432]  __do_softirq+0xd0/0x202
[43040.204549]  irq_exit_rcu+0x82/0xa0
[43040.204667]  common_interrupt+0x7a/0xa0
[43040.204786]  </IRQ>
[43040.204876]  <TASK>
[43040.204965]  asm_common_interrupt+0x22/0x40
[43040.205090] RIP: 0010:acpi_safe_halt+0x1b/0x20
[43040.205220] Code: ed c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 65 48 8b 04 25 40 32 02 00 48 8b 00 a8 08 75 0c eb 07 0f 00 2d 57 0f 2c 00 fb f4 <fa> c3 0f 1f 00 0f b6 47 08 3c 01 74 0b 3c 02 74 05 8b 7f 04 eb 9f
[43040.205578] RSP: 0018:ffffa39d8234fe80 EFLAGS: 00000246
[43040.205718] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
[43040.205890] RDX: ffff9c5fafdc0000 RSI: ffff9c5882e95800 RDI: ffff9c5882e95864
[43040.206063] RBP: ffffffffa9216ea0 R08: ffffffffa9216ea0 R09: 0000000000000003
[43040.206246] R10: 0000000000000002 R11: 0000000000000008 R12: 0000000000000001
[43040.206419] R13: ffffffffa9216f08 R14: ffffffffa9216f20 R15: 0000000000000000
[43040.206593]  acpi_idle_enter+0x77/0xc0
[43040.206711]  cpuidle_enter_state+0x69/0x6a0
[43040.206835]  cpuidle_enter+0x24/0x40
[43040.206954]  do_idle+0x1a7/0x210
[43040.207066]  cpu_startup_entry+0x21/0x30
[43040.207188]  start_secondary+0xe1/0xf0
[43040.207310]  secondary_startup_64_no_verify+0x166/0x16b
[43040.207451]  </TASK>
[43040.207542] ---[ end trace 0000000000000000 ]---



[43040.198064] ------------[ cut here ]------------
[43040.198407] WARNING: CPU: 47 PID: 0 at lib/rcuref.c:294 rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
[43040.198685] Modules linked in: pppoe pppox ppp_generic slhc nft_limit nft_ct nft_nat nft_chain_nat nf_tables netconsole tg3 igb i2c_algo_bit e1000e bnxt_en mlx5_core mlxfw mlx4_en mlx4_core i40e ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_devintf ipmi_msghandler rtc_cmos
[43040.199478] CPU: 47 PID: 0 Comm: swapper/47 Tainted: G           O       6.6.8 #1
[43040.199660] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[43040.199886] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
[43040.200028] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4e e3 00
All code
========
   0:	07                   	(bad)
   1:	83 f8 ff             	cmp    $0xffffffff,%eax
   4:	75 19                	jne    0x1f
   6:	ba 00 00 00 e0       	mov    $0xe0000000,%edx
   b:	f0 0f b1 17          	lock cmpxchg %edx,(%rdi)
   f:	83 f8 ff             	cmp    $0xffffffff,%eax
  12:	74 04                	je     0x18
  14:	31 c0                	xor    %eax,%eax
  16:	5b                   	pop    %rbx
  17:	c3                   	ret
  18:	b8 01 00 00 00       	mov    $0x1,%eax
  1d:	5b                   	pop    %rbx
  1e:	c3                   	ret
  1f:	3d ff ff ff bf       	cmp    $0xbfffffff,%eax
  24:	77 14                	ja     0x3a
  26:	85 c0                	test   %eax,%eax
  28:	78 06                	js     0x30
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	31 c0                	xor    %eax,%eax
  2e:	eb e6                	jmp    0x16
  30:	c7 07 00 00 00 a0    	movl   $0xa0000000,(%rdi)
  36:	31 c0                	xor    %eax,%eax
  38:	eb dc                	jmp    0x16
  3a:	80                   	.byte 0x80
  3b:	3d e2 4e e3 00       	cmp    $0xe34ee2,%eax

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	31 c0                	xor    %eax,%eax
   4:	eb e6                	jmp    0xffffffffffffffec
   6:	c7 07 00 00 00 a0    	movl   $0xa0000000,(%rdi)
   c:	31 c0                	xor    %eax,%eax
   e:	eb dc                	jmp    0xffffffffffffffec
  10:	80                   	.byte 0x80
  11:	3d e2 4e e3 00       	cmp    $0xe34ee2,%eax
[43040.200387] RSP: 0018:ffffa39d83e88c30 EFLAGS: 00010246
[43040.200528] RAX: 0000000000000000 RBX: ffff9c58e966b840 RCX: ffff9c5bc4e35680
[43040.200700] RDX: ffff9c5fafde4f08 RSI: 00000000fffffe01 RDI: ffff9c58e966b840
[43040.200871] RBP: ffff9c5fafde4f08 R08: ffff9c5fafde4f08 R09: 0000000000000001
[43040.201044] R10: 00000000000286e0 R11: 0000000000000001 R12: ffff9c58e966b800
[43040.201255] R13: ffff9c58e966b868 R14: ffff9c5fafde4f08 R15: 000000008f5de42b
[43040.201439] FS:  0000000000000000(0000) GS:ffff9c5fafdc0000(0000) knlGS:0000000000000000
[43040.201642] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43040.201799] CR2: 00007f1401217714 CR3: 0000000464b94003 CR4: 00000000001706e0
[43040.201994] Call Trace:
[43040.202095]  <IRQ>
[43040.202187] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
[43040.202301] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[43040.202418] ? handle_bug (arch/x86/kernel/traps.c:237)
[43040.202534] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[43040.202652] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
[43040.202781] ? rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
[43040.202909] dst_release (net/core/dst.c:166 (discriminator 1))
[43040.203026] rt_cache_route (net/ipv4/route.c:1499)
[43040.203143] rt_set_nexthop.isra.0 (net/ipv4/route.c:1606 (discriminator 1))
[43040.203272] ip_route_input_slow (./include/net/lwtunnel.h:140 net/ipv4/route.c:1875 net/ipv4/route.c:2154 net/ipv4/route.c:2337)
[43040.203401] ? nf_conntrack_udp_packet (net/netfilter/nf_conntrack_proto_udp.c:124) nf_conntrack
[43040.203581] ip_route_input_noref (net/ipv4/route.c:2499)
[43040.203704] ip_rcv_finish_core.isra.0 (net/ipv4/ip_input.c:367 (discriminator 1))
[43040.203855] ip_rcv (net/ipv4/ip_input.c:448 ./include/linux/netfilter.h:304 ./include/linux/netfilter.h:298 net/ipv4/ip_input.c:569)
[43040.203962] ? ip_rcv_core (net/ipv4/ip_input.c:436)
[43040.204079] process_backlog (net/core/dev.c:5997)
[43040.204201] __napi_poll (net/core/dev.c:6556)
[43040.204315] net_rx_action (net/core/dev.c:6625 net/core/dev.c:6756)
[43040.204432] __do_softirq (./arch/x86/include/asm/preempt.h:27 kernel/softirq.c:564)
[43040.204549] irq_exit_rcu (kernel/softirq.c:436 kernel/softirq.c:641 kernel/softirq.c:653)
[43040.204667] common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 47))
[43040.204786]  </IRQ>
[43040.204876]  <TASK>
[43040.204965] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:640)
[43040.205090] RIP: 0010:acpi_safe_halt (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:72 drivers/acpi/processor_idle.c:113)
[43040.205220] Code: ed c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 65 48 8b 04 25 40 32 02 00 48 8b 00 a8 08 75 0c eb 07 0f 00 2d 57 0f 2c 00 fb f4 <fa> c3 0f 1f 00 0f b6 47 08 3c 01 74 0b 3c 02 74 05 8b 7f 04 eb 9f
All code
========
   0:	ed                   	in     (%dx),%eax
   1:	c3                   	ret
   2:	66 66 2e 0f 1f 84 00 	data16 cs nopw 0x0(%rax,%rax,1)
   9:	00 00 00 00
   d:	66 90                	xchg   %ax,%ax
   f:	65 48 8b 04 25 40 32 	mov    %gs:0x23240,%rax
  16:	02 00
  18:	48 8b 00             	mov    (%rax),%rax
  1b:	a8 08                	test   $0x8,%al
  1d:	75 0c                	jne    0x2b
  1f:	eb 07                	jmp    0x28
  21:	0f 00 2d 57 0f 2c 00 	verw   0x2c0f57(%rip)        # 0x2c0f7f
  28:	fb                   	sti
  29:	f4                   	hlt
  2a:*	fa                   	cli    		<-- trapping instruction
  2b:	c3                   	ret
  2c:	0f 1f 00             	nopl   (%rax)
  2f:	0f b6 47 08          	movzbl 0x8(%rdi),%eax
  33:	3c 01                	cmp    $0x1,%al
  35:	74 0b                	je     0x42
  37:	3c 02                	cmp    $0x2,%al
  39:	74 05                	je     0x40
  3b:	8b 7f 04             	mov    0x4(%rdi),%edi
  3e:	eb 9f                	jmp    0xffffffffffffffdf

Code starting with the faulting instruction
===========================================
   0:	fa                   	cli
   1:	c3                   	ret
   2:	0f 1f 00             	nopl   (%rax)
   5:	0f b6 47 08          	movzbl 0x8(%rdi),%eax
   9:	3c 01                	cmp    $0x1,%al
   b:	74 0b                	je     0x18
   d:	3c 02                	cmp    $0x2,%al
   f:	74 05                	je     0x16
  11:	8b 7f 04             	mov    0x4(%rdi),%edi
  14:	eb 9f                	jmp    0xffffffffffffffb5
[43040.205578] RSP: 0018:ffffa39d8234fe80 EFLAGS: 00000246
[43040.205718] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
[43040.205890] RDX: ffff9c5fafdc0000 RSI: ffff9c5882e95800 RDI: ffff9c5882e95864
[43040.206063] RBP: ffffffffa9216ea0 R08: ffffffffa9216ea0 R09: 0000000000000003
[43040.206246] R10: 0000000000000002 R11: 0000000000000008 R12: 0000000000000001
[43040.206419] R13: ffffffffa9216f08 R14: ffffffffa9216f20 R15: 0000000000000000
[43040.206593] acpi_idle_enter (drivers/acpi/processor_idle.c:709)
[43040.206711] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:267)
[43040.206835] cpuidle_enter (drivers/cpuidle/cpuidle.c:390 (discriminator 2))
[43040.206954] do_idle (kernel/sched/idle.c:134 kernel/sched/idle.c:215 kernel/sched/idle.c:282)
[43040.207066] cpu_startup_entry (kernel/sched/idle.c:379)
[43040.207188] start_secondary (arch/x86/kernel/smpboot.c:326)
[43040.207310] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
[43040.207451]  </TASK>
[43040.207542] ---[ end trace 0000000000000000 ]---
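
As a reading aid for the decoded dump above, the branch sequence before the trapping ud2 can be written out as a small, hedged C sketch. It only mirrors the compare instructions visible in the disassembly (cmp $0xffffffff, cmp $0xbfffffff / ja, test / js) and maps them to the label names used in the trace_printk() patch; the function name put_slowpath_branch is hypothetical and nothing here is taken from the kernel sources directly.

```c
#include <stdint.h>

/*
 * Mirror of the comparisons seen in the disassembly of
 * rcuref_put_slowpath(); cnt is the value held in %eax.
 */
static const char *put_slowpath_branch(uint32_t cnt)
{
    if (cnt == 0xFFFFFFFFu)   /* cmp $0xffffffff,%eax: last reference dropped */
        return "NOREF";
    if (cnt > 0xBFFFFFFFu)    /* cmp $0xbfffffff,%eax; ja: released or dead */
        return "WASDEAD";
    if (cnt & 0x80000000u)    /* test %eax,%eax; js: saturation zone */
        return "SATURATED";
    return "UNDEFINED";       /* falls through to ud2 (WARN_ON_ONCE) */
}
```

%eax is not modified between the first compare and the ud2, so the RAX value in the register dump (0000000000000000) appears to be the counter value the slow path read. Under that assumption put_slowpath_branch(0) selects the UNDEFINED arm, that is, the slow path was entered but then observed a counter value in the valid range.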

> On 19 Dec 2023, at 16:26, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Tue, Dec 19 2023 at 11:25, Martin Zaharinov wrote:
>>> On 12 Dec 2023, at 20:16, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> Btw, how easy is this to reproduce?
>> 
>> It's not easy. This report was generated on a machine with 5-6k users and
>> live traffic; one time it showed up after 1 day, another time after 4-5 days.
> 
> I love those bugs ...
> 
>> I have applied this patch and will deploy the image on one machine as
>> soon as possible; when I get any reports I will send them to you.
> 
> Let's see how that goes!
> 
> Thanks,
> 
>        tglx



* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-12-22 17:26                   ` Martin Zaharinov
@ 2023-12-29 12:00                     ` Martin Zaharinov
  2024-01-04 20:51                       ` Martin Zaharinov
  0 siblings, 1 reply; 35+ messages in thread
From: Martin Zaharinov @ 2023-12-29 12:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: peterz, netdev, Paolo Abeni, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern,
	Eric Dumazet

Hi Thomas,

One more report, this time from a second machine:

[21299.954952] ------------[ cut here ]------------
[21299.955047] WARNING: CPU: 15 PID: 0 at lib/rcuref.c:294 rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
[21299.955153] Modules linked in: nft_limit nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp virtio_net net_failover failover virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring e1000e e1000 vmxnet3 i40e ixgbe mdio bnxt_en nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rtc_cmos
[21299.955378] CPU: 15 PID: 0 Comm: swapper/15 Tainted: G           O       6.6.8 #1
[21299.955475] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 02/09/2023
[21299.955575] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
[21299.955662] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4e e3 00
All code
========
   0:	07                   	(bad)
   1:	83 f8 ff             	cmp    $0xffffffff,%eax
   4:	75 19                	jne    0x1f
   6:	ba 00 00 00 e0       	mov    $0xe0000000,%edx
   b:	f0 0f b1 17          	lock cmpxchg %edx,(%rdi)
   f:	83 f8 ff             	cmp    $0xffffffff,%eax
  12:	74 04                	je     0x18
  14:	31 c0                	xor    %eax,%eax
  16:	5b                   	pop    %rbx
  17:	c3                   	ret
  18:	b8 01 00 00 00       	mov    $0x1,%eax
  1d:	5b                   	pop    %rbx
  1e:	c3                   	ret
  1f:	3d ff ff ff bf       	cmp    $0xbfffffff,%eax
  24:	77 14                	ja     0x3a
  26:	85 c0                	test   %eax,%eax
  28:	78 06                	js     0x30
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	31 c0                	xor    %eax,%eax
  2e:	eb e6                	jmp    0x16
  30:	c7 07 00 00 00 a0    	movl   $0xa0000000,(%rdi)
  36:	31 c0                	xor    %eax,%eax
  38:	eb dc                	jmp    0x16
  3a:	80                   	.byte 0x80
  3b:	3d e2 4e e3 00       	cmp    $0xe34ee2,%eax

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	31 c0                	xor    %eax,%eax
   4:	eb e6                	jmp    0xffffffffffffffec
   6:	c7 07 00 00 00 a0    	movl   $0xa0000000,(%rdi)
   c:	31 c0                	xor    %eax,%eax
   e:	eb dc                	jmp    0xffffffffffffffec
  10:	80                   	.byte 0x80
  11:	3d e2 4e e3 00       	cmp    $0xe34ee2,%eax
[21299.955793] RSP: 0018:ffff96a7c0578c30 EFLAGS: 00010246
[21299.955879] RAX: 0000000000000000 RBX: ffff8b75d1e49a80 RCX: ffff8b75c6667c80
[21299.955974] RDX: ffff8b84bfbe4f08 RSI: 00000000fffffe01 RDI: ffff8b75d1e49a80
[21299.956070] RBP: ffff8b84bfbe4f08 R08: ffff8b84bfbe4f08 R09: 0000000000000001
[21299.956167] R10: 0000000000028530 R11: 0000000000000001 R12: ffff8b75d1e49a40
[21299.956261] R13: ffff8b75d1e49aa8 R14: ffff8b84bfbe4f08 R15: 00000000c26ab667
[21299.956358] FS:  0000000000000000(0000) GS:ffff8b84bfbc0000(0000) knlGS:0000000000000000
[21299.956457] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21299.956540] CR2: 00007f2e185c73c8 CR3: 0000000950014003 CR4: 00000000003706e0
[21299.956635] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[21299.956730] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[21299.956826] Call Trace:
[21299.956905]  <IRQ>
[21299.956983] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
[21299.957065] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[21299.957147] ? handle_bug (arch/x86/kernel/traps.c:237)
[21299.957228] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[21299.957308] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
[21299.957393] ? rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
[21299.957476] dst_release (net/core/dst.c:166 (discriminator 1))
[21299.957559] rt_cache_route (net/ipv4/route.c:1499)
[21299.957641] rt_set_nexthop.isra.0 (net/ipv4/route.c:1606 (discriminator 1))
[21299.957722] ip_route_input_slow (./include/net/lwtunnel.h:140 net/ipv4/route.c:1875 net/ipv4/route.c:2154 net/ipv4/route.c:2337)
[21299.957804] ? free_unref_page (./include/linux/list.h:150 (discriminator 1) ./include/linux/list.h:169 (discriminator 1) mm/page_alloc.c:2377 (discriminator 1) mm/page_alloc.c:2428 (discriminator 1))
[21299.957889] ip_route_input_noref (net/ipv4/route.c:2499)
[21299.957972] ip_rcv_finish_core.isra.0 (net/ipv4/ip_input.c:367 (discriminator 1))
[21299.958058] ip_rcv (net/ipv4/ip_input.c:448 ./include/linux/netfilter.h:304 ./include/linux/netfilter.h:298 net/ipv4/ip_input.c:569)
[21299.958139] ? ip_rcv_core (net/ipv4/ip_input.c:436)
[21299.958220] process_backlog (net/core/dev.c:5997)
[21299.958302] __napi_poll (net/core/dev.c:6556)
[21299.958384] net_rx_action (net/core/dev.c:6625 net/core/dev.c:6756)
[21299.958466] __do_softirq (./arch/x86/include/asm/preempt.h:27 kernel/softirq.c:564)
[21299.958549] irq_exit_rcu (kernel/softirq.c:436 kernel/softirq.c:641 kernel/softirq.c:653)
[21299.958631] sysvec_call_function_single (arch/x86/kernel/smp.c:262 (discriminator 47))
[21299.958714]  </IRQ>
[21299.958792]  <TASK>
[21299.958869] asm_sysvec_call_function_single (./arch/x86/include/asm/idtentry.h:656)
[21299.958953] RIP: 0010:acpi_safe_halt (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:72 drivers/acpi/processor_idle.c:113)
[21299.959038] Code: ed c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 65 48 8b 04 25 40 32 02 00 48 8b 00 a8 08 75 0c eb 07 0f 00 2d 57 0f 2c 00 fb f4 <fa> c3 0f 1f 00 0f b6 47 08 3c 01 74 0b 3c 02 74 05 8b 7f 04 eb 9f
All code
========
   0:	ed                   	in     (%dx),%eax
   1:	c3                   	ret
   2:	66 66 2e 0f 1f 84 00 	data16 cs nopw 0x0(%rax,%rax,1)
   9:	00 00 00 00
   d:	66 90                	xchg   %ax,%ax
   f:	65 48 8b 04 25 40 32 	mov    %gs:0x23240,%rax
  16:	02 00
  18:	48 8b 00             	mov    (%rax),%rax
  1b:	a8 08                	test   $0x8,%al
  1d:	75 0c                	jne    0x2b
  1f:	eb 07                	jmp    0x28
  21:	0f 00 2d 57 0f 2c 00 	verw   0x2c0f57(%rip)        # 0x2c0f7f
  28:	fb                   	sti
  29:	f4                   	hlt
  2a:*	fa                   	cli    		<-- trapping instruction
  2b:	c3                   	ret
  2c:	0f 1f 00             	nopl   (%rax)
  2f:	0f b6 47 08          	movzbl 0x8(%rdi),%eax
  33:	3c 01                	cmp    $0x1,%al
  35:	74 0b                	je     0x42
  37:	3c 02                	cmp    $0x2,%al
  39:	74 05                	je     0x40
  3b:	8b 7f 04             	mov    0x4(%rdi),%edi
  3e:	eb 9f                	jmp    0xffffffffffffffdf

Code starting with the faulting instruction
===========================================
   0:	fa                   	cli
   1:	c3                   	ret
   2:	0f 1f 00             	nopl   (%rax)
   5:	0f b6 47 08          	movzbl 0x8(%rdi),%eax
   9:	3c 01                	cmp    $0x1,%al
   b:	74 0b                	je     0x18
   d:	3c 02                	cmp    $0x2,%al
   f:	74 05                	je     0x16
  11:	8b 7f 04             	mov    0x4(%rdi),%edi
  14:	eb 9f                	jmp    0xffffffffffffffb5
[21299.959162] RSP: 0018:ffff96a7c015be80 EFLAGS: 00000246
[21299.959247] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
[21299.959343] RDX: ffff8b84bfbc0000 RSI: ffff8b75c76ba000 RDI: ffff8b75c76ba064
[21299.959437] RBP: ffffffffae216ea0 R08: ffffffffae216ea0 R09: 0000000000000003
[21299.959533] R10: 0000000000000002 R11: 0000000000000008 R12: 0000000000000001
[21299.959630] R13: ffffffffae216f08 R14: ffffffffae216f20 R15: 0000000000000000
[21299.959725] acpi_idle_enter (drivers/acpi/processor_idle.c:709)
[21299.959807] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:267)
[21299.959890] cpuidle_enter (drivers/cpuidle/cpuidle.c:390 (discriminator 2))
[21299.959975] do_idle (kernel/sched/idle.c:134 kernel/sched/idle.c:215 kernel/sched/idle.c:282)
[21299.960058] cpu_startup_entry (kernel/sched/idle.c:379)
[21299.960140] start_secondary (arch/x86/kernel/smpboot.c:326)
[21299.960223] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
[21299.960306]  </TASK>
[21299.960384] ---[ end trace 0000000000000000 ]---

> On 22 Dec 2023, at 19:26, Martin Zaharinov <micron10@gmail.com> wrote:
> 
> Hi Thomas,
> 
> This is with your patch applied.
> See the logs below.
> 
> 
> [43040.198064] ------------[ cut here ]------------
> [43040.198407] WARNING: CPU: 47 PID: 0 at lib/rcuref.c:294 rcuref_put_slowpath+0x2f/0x70
> [43040.198685] Modules linked in: pppoe pppox ppp_generic slhc nft_limit nft_ct nft_nat nft_chain_nat nf_tables netconsole tg3 igb i2c_algo_bit e1000e bnxt_en mlx5_core mlxfw mlx4_en mlx4_core i40e ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_devintf ipmi_msghandler rtc_cmos
> [43040.199478] CPU: 47 PID: 0 Comm: swapper/47 Tainted: G           O       6.6.8 #1
> [43040.199660] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> [43040.199886] RIP: 0010:rcuref_put_slowpath+0x2f/0x70
> [43040.200028] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4e e3 00
> [43040.200387] RSP: 0018:ffffa39d83e88c30 EFLAGS: 00010246
> [43040.200528] RAX: 0000000000000000 RBX: ffff9c58e966b840 RCX: ffff9c5bc4e35680
> [43040.200700] RDX: ffff9c5fafde4f08 RSI: 00000000fffffe01 RDI: ffff9c58e966b840
> [43040.200871] RBP: ffff9c5fafde4f08 R08: ffff9c5fafde4f08 R09: 0000000000000001
> [43040.201044] R10: 00000000000286e0 R11: 0000000000000001 R12: ffff9c58e966b800
> [43040.201255] R13: ffff9c58e966b868 R14: ffff9c5fafde4f08 R15: 000000008f5de42b
> [43040.201439] FS:  0000000000000000(0000) GS:ffff9c5fafdc0000(0000) knlGS:0000000000000000
> [43040.201642] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [43040.201799] CR2: 00007f1401217714 CR3: 0000000464b94003 CR4: 00000000001706e0
> [43040.201994] Call Trace:
> [43040.202095]  <IRQ>
> [43040.202187]  ? __warn+0x6c/0x130
> [43040.202301]  ? report_bug+0x1b8/0x200
> [43040.202418]  ? handle_bug+0x36/0x70
> [43040.202534]  ? exc_invalid_op+0x17/0x1a0
> [43040.202652]  ? asm_exc_invalid_op+0x16/0x20
> [43040.202781]  ? rcuref_put_slowpath+0x2f/0x70
> [43040.202909]  dst_release+0x1c/0x40
> [43040.203026]  rt_cache_route+0xbd/0xf0
> [43040.203143]  rt_set_nexthop.isra.0+0x1b6/0x450
> [43040.203272]  ip_route_input_slow+0x5d9/0xcc0
> [43040.203401]  ? nf_conntrack_udp_packet+0x17c/0x240 [nf_conntrack]
> [43040.203581]  ip_route_input_noref+0xe0/0xf0
> [43040.203704]  ip_rcv_finish_core.isra.0+0xbb/0x440
> [43040.203855]  ip_rcv+0xd5/0x110
> [43040.203962]  ? ip_rcv_core+0x360/0x360
> [43040.204079]  process_backlog+0x107/0x210
> [43040.204201]  __napi_poll+0x20/0x180
> [43040.204315]  net_rx_action+0x29f/0x380
> [43040.204432]  __do_softirq+0xd0/0x202
> [43040.204549]  irq_exit_rcu+0x82/0xa0
> [43040.204667]  common_interrupt+0x7a/0xa0
> [43040.204786]  </IRQ>
> [43040.204876]  <TASK>
> [43040.204965]  asm_common_interrupt+0x22/0x40
> [43040.205090] RIP: 0010:acpi_safe_halt+0x1b/0x20
> [43040.205220] Code: ed c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 65 48 8b 04 25 40 32 02 00 48 8b 00 a8 08 75 0c eb 07 0f 00 2d 57 0f 2c 00 fb f4 <fa> c3 0f 1f 00 0f b6 47 08 3c 01 74 0b 3c 02 74 05 8b 7f 04 eb 9f
> [43040.205578] RSP: 0018:ffffa39d8234fe80 EFLAGS: 00000246
> [43040.205718] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
> [43040.205890] RDX: ffff9c5fafdc0000 RSI: ffff9c5882e95800 RDI: ffff9c5882e95864
> [43040.206063] RBP: ffffffffa9216ea0 R08: ffffffffa9216ea0 R09: 0000000000000003
> [43040.206246] R10: 0000000000000002 R11: 0000000000000008 R12: 0000000000000001
> [43040.206419] R13: ffffffffa9216f08 R14: ffffffffa9216f20 R15: 0000000000000000
> [43040.206593]  acpi_idle_enter+0x77/0xc0
> [43040.206711]  cpuidle_enter_state+0x69/0x6a0
> [43040.206835]  cpuidle_enter+0x24/0x40
> [43040.206954]  do_idle+0x1a7/0x210
> [43040.207066]  cpu_startup_entry+0x21/0x30
> [43040.207188]  start_secondary+0xe1/0xf0
> [43040.207310]  secondary_startup_64_no_verify+0x166/0x16b
> [43040.207451]  </TASK>
> [43040.207542] ---[ end trace 0000000000000000 ]---
> 
> 
> 
> [43040.198064] ------------[ cut here ]------------
> [43040.198407] WARNING: CPU: 47 PID: 0 at lib/rcuref.c:294 rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
> [43040.198685] Modules linked in: pppoe pppox ppp_generic slhc nft_limit nft_ct nft_nat nft_chain_nat nf_tables netconsole tg3 igb i2c_algo_bit e1000e bnxt_en mlx5_core mlxfw mlx4_en mlx4_core i40e ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_devintf ipmi_msghandler rtc_cmos
> [43040.199478] CPU: 47 PID: 0 Comm: swapper/47 Tainted: G           O       6.6.8 #1
> [43040.199660] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> [43040.199886] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
> [43040.200028] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4e e3 00
> All code
> ========
>   0: 07                    (bad)
>   1: 83 f8 ff              cmp    $0xffffffff,%eax
>   4: 75 19                 jne    0x1f
>   6: ba 00 00 00 e0        mov    $0xe0000000,%edx
>   b: f0 0f b1 17           lock cmpxchg %edx,(%rdi)
>   f: 83 f8 ff              cmp    $0xffffffff,%eax
>  12: 74 04                 je     0x18
>  14: 31 c0                 xor    %eax,%eax
>  16: 5b                    pop    %rbx
>  17: c3                    ret
>  18: b8 01 00 00 00        mov    $0x1,%eax
>  1d: 5b                    pop    %rbx
>  1e: c3                    ret
>  1f: 3d ff ff ff bf        cmp    $0xbfffffff,%eax
>  24: 77 14                 ja     0x3a
>  26: 85 c0                 test   %eax,%eax
>  28: 78 06                 js     0x30
>  2a:* 0f 0b                 ud2     <-- trapping instruction
>  2c: 31 c0                 xor    %eax,%eax
>  2e: eb e6                 jmp    0x16
>  30: c7 07 00 00 00 a0     movl   $0xa0000000,(%rdi)
>  36: 31 c0                 xor    %eax,%eax
>  38: eb dc                 jmp    0x16
>  3a: 80                    .byte 0x80
>  3b: 3d e2 4e e3 00        cmp    $0xe34ee2,%eax
> 
> Code starting with the faulting instruction
> ===========================================
>   0: 0f 0b                 ud2
>   2: 31 c0                 xor    %eax,%eax
>   4: eb e6                 jmp    0xffffffffffffffec
>   6: c7 07 00 00 00 a0     movl   $0xa0000000,(%rdi)
>   c: 31 c0                 xor    %eax,%eax
>   e: eb dc                 jmp    0xffffffffffffffec
>  10: 80                    .byte 0x80
>  11: 3d e2 4e e3 00        cmp    $0xe34ee2,%eax
> [43040.200387] RSP: 0018:ffffa39d83e88c30 EFLAGS: 00010246
> [43040.200528] RAX: 0000000000000000 RBX: ffff9c58e966b840 RCX: ffff9c5bc4e35680
> [43040.200700] RDX: ffff9c5fafde4f08 RSI: 00000000fffffe01 RDI: ffff9c58e966b840
> [43040.200871] RBP: ffff9c5fafde4f08 R08: ffff9c5fafde4f08 R09: 0000000000000001
> [43040.201044] R10: 00000000000286e0 R11: 0000000000000001 R12: ffff9c58e966b800
> [43040.201255] R13: ffff9c58e966b868 R14: ffff9c5fafde4f08 R15: 000000008f5de42b
> [43040.201439] FS:  0000000000000000(0000) GS:ffff9c5fafdc0000(0000) knlGS:0000000000000000
> [43040.201642] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [43040.201799] CR2: 00007f1401217714 CR3: 0000000464b94003 CR4: 00000000001706e0
> [43040.201994] Call Trace:
> [43040.202095]  <IRQ>
> [43040.202187] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
> [43040.202301] ? report_bug (lib/bug.c:180 lib/bug.c:219)
> [43040.202418] ? handle_bug (arch/x86/kernel/traps.c:237)
> [43040.202534] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
> [43040.202652] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
> [43040.202781] ? rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
> [43040.202909] dst_release (net/core/dst.c:166 (discriminator 1))
> [43040.203026] rt_cache_route (net/ipv4/route.c:1499)
> [43040.203143] rt_set_nexthop.isra.0 (net/ipv4/route.c:1606 (discriminator 1))
> [43040.203272] ip_route_input_slow (./include/net/lwtunnel.h:140 net/ipv4/route.c:1875 net/ipv4/route.c:2154 net/ipv4/route.c:2337)
> [43040.203401] ? nf_conntrack_udp_packet (net/netfilter/nf_conntrack_proto_udp.c:124) nf_conntrack
> [43040.203581] ip_route_input_noref (net/ipv4/route.c:2499)
> [43040.203704] ip_rcv_finish_core.isra.0 (net/ipv4/ip_input.c:367 (discriminator 1))
> [43040.203855] ip_rcv (net/ipv4/ip_input.c:448 ./include/linux/netfilter.h:304 ./include/linux/netfilter.h:298 net/ipv4/ip_input.c:569)
> [43040.203962] ? ip_rcv_core (net/ipv4/ip_input.c:436)
> [43040.204079] process_backlog (net/core/dev.c:5997)
> [43040.204201] __napi_poll (net/core/dev.c:6556)
> [43040.204315] net_rx_action (net/core/dev.c:6625 net/core/dev.c:6756)
> [43040.204432] __do_softirq (./arch/x86/include/asm/preempt.h:27 kernel/softirq.c:564)
> [43040.204549] irq_exit_rcu (kernel/softirq.c:436 kernel/softirq.c:641 kernel/softirq.c:653)
> [43040.204667] common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 47))
> [43040.204786]  </IRQ>
> [43040.204876]  <TASK>
> [43040.204965] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:640)
> [43040.205090] RIP: 0010:acpi_safe_halt (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:72 drivers/acpi/processor_idle.c:113)
> [43040.205220] Code: ed c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 65 48 8b 04 25 40 32 02 00 48 8b 00 a8 08 75 0c eb 07 0f 00 2d 57 0f 2c 00 fb f4 <fa> c3 0f 1f 00 0f b6 47 08 3c 01 74 0b 3c 02 74 05 8b 7f 04 eb 9f
> All code
> ========
>   0: ed                    in     (%dx),%eax
>   1: c3                    ret
>   2: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
>   9: 00 00 00 00
>   d: 66 90                 xchg   %ax,%ax
>   f: 65 48 8b 04 25 40 32 mov    %gs:0x23240,%rax
>  16: 02 00
>  18: 48 8b 00              mov    (%rax),%rax
>  1b: a8 08                 test   $0x8,%al
>  1d: 75 0c                 jne    0x2b
>  1f: eb 07                 jmp    0x28
>  21: 0f 00 2d 57 0f 2c 00 verw   0x2c0f57(%rip)        # 0x2c0f7f
>  28: fb                    sti
>  29: f4                    hlt
>  2a:* fa                    cli     <-- trapping instruction
>  2b: c3                    ret
>  2c: 0f 1f 00              nopl   (%rax)
>  2f: 0f b6 47 08           movzbl 0x8(%rdi),%eax
>  33: 3c 01                 cmp    $0x1,%al
>  35: 74 0b                 je     0x42
>  37: 3c 02                 cmp    $0x2,%al
>  39: 74 05                 je     0x40
>  3b: 8b 7f 04              mov    0x4(%rdi),%edi
>  3e: eb 9f                 jmp    0xffffffffffffffdf
> 
> Code starting with the faulting instruction
> ===========================================
>   0: fa                    cli
>   1: c3                    ret
>   2: 0f 1f 00              nopl   (%rax)
>   5: 0f b6 47 08           movzbl 0x8(%rdi),%eax
>   9: 3c 01                 cmp    $0x1,%al
>   b: 74 0b                 je     0x18
>   d: 3c 02                 cmp    $0x2,%al
>   f: 74 05                 je     0x16
>  11: 8b 7f 04              mov    0x4(%rdi),%edi
>  14: eb 9f                 jmp    0xffffffffffffffb5
> [43040.205578] RSP: 0018:ffffa39d8234fe80 EFLAGS: 00000246
> [43040.205718] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
> [43040.205890] RDX: ffff9c5fafdc0000 RSI: ffff9c5882e95800 RDI: ffff9c5882e95864
> [43040.206063] RBP: ffffffffa9216ea0 R08: ffffffffa9216ea0 R09: 0000000000000003
> [43040.206246] R10: 0000000000000002 R11: 0000000000000008 R12: 0000000000000001
> [43040.206419] R13: ffffffffa9216f08 R14: ffffffffa9216f20 R15: 0000000000000000
> [43040.206593] acpi_idle_enter (drivers/acpi/processor_idle.c:709)
> [43040.206711] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:267)
> [43040.206835] cpuidle_enter (drivers/cpuidle/cpuidle.c:390 (discriminator 2))
> [43040.206954] do_idle (kernel/sched/idle.c:134 kernel/sched/idle.c:215 kernel/sched/idle.c:282)
> [43040.207066] cpu_startup_entry (kernel/sched/idle.c:379)
> [43040.207188] start_secondary (arch/x86/kernel/smpboot.c:326)
> [43040.207310] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
> [43040.207451]  </TASK>
> [43040.207542] ---[ end trace 0000000000000000 ]---
> 
>> On 19 Dec 2023, at 16:26, Thomas Gleixner <tglx@linutronix.de> wrote:
>> 
>> On Tue, Dec 19 2023 at 11:25, Martin Zaharinov wrote:
>>>> On 12 Dec 2023, at 20:16, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>> Btw, how easy is this to reproduce?
>>> 
>>> Its not easy this report is generate on machine with 5-6k users , with
>>> traffic and one time is show on 1 day , other show after 4-5 days…
>> 
>> I love those bugs ...
>> 
>>> Apply this patch and will upload image on one machine as fast as
>>> possible and when get any reports will send you.
>> 
>> Let's see how that goes!
>> 
>> Thanks,
>> 
>>       tglx
> 



* Re: Urgent Bug Report Kernel crash 6.5.2
  2023-12-29 12:00                     ` Martin Zaharinov
@ 2024-01-04 20:51                       ` Martin Zaharinov
  2024-01-07 11:03                         ` Martin Zaharinov
  0 siblings, 1 reply; 35+ messages in thread
From: Martin Zaharinov @ 2024-01-04 20:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: peterz, netdev, Paolo Abeni, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern,
	Eric Dumazet

Hi Thomas,

Happy New Year!

Here are two debug traces from two newly installed machines running kernel 6.6.9:

dmesg1 (raw trace, followed by the same trace decoded to file and line):

[ 2257.449125] ------------[ cut here ]------------
[ 2257.449245] WARNING: CPU: 1 PID: 40622 at lib/rcuref.c:294 rcuref_put_slowpath+0x2f/0x70
[ 2257.449373] Modules linked in: nft_limit nf_conntrack_netlink pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding ixgbe mdio i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
[ 2257.449642] CPU: 1 PID: 40622 Comm: nc Tainted: G           O       6.6.9 #1
[ 2257.449761] Hardware name: Supermicro PIO-5038MR-H8TRF-NODE/X10SRD-F, BIOS 3.3 10/28/2020
[ 2257.449883] RIP: 0010:rcuref_put_slowpath+0x2f/0x70
[ 2257.449977] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4c e3 00
[ 2257.450135] RSP: 0000:ffffb455cef83b78 EFLAGS: 00010246
[ 2257.450227] RAX: 0000000000000000 RBX: ffff94873bb77dc0 RCX: ffff9486c0d46b80
[ 2257.450341] RDX: ffff948736578428 RSI: 00000000fffffe01 RDI: ffff94873bb77dc0
[ 2257.450456] RBP: ffff948736578428 R08: ffff948e1fa64f08 R09: 0000000000000001
[ 2257.450570] R10: 0000000000028530 R11: 0000000000000001 R12: ffff94873bb77d80
[ 2257.450685] R13: ffff94873bb77de8 R14: ffff948e1fa64f08 R15: 000000000266f59d
[ 2257.450802] FS:  00007f0cdbc73800(0000) GS:ffff948e1fa40000(0000) knlGS:0000000000000000
[ 2257.450918] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2257.451012] CR2: 00007f0cdc3f5c30 CR3: 0000000178ea0002 CR4: 00000000003706e0
[ 2257.451127] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2257.451240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2257.451353] Call Trace:
[ 2257.451441]  <TASK>
[ 2257.451526]  ? __warn+0x6c/0x130
[ 2257.451616]  ? report_bug+0x1b8/0x200
[ 2257.451707]  ? handle_bug+0x36/0x70
[ 2257.451797]  ? exc_invalid_op+0x17/0x1a0
[ 2257.451886]  ? asm_exc_invalid_op+0x16/0x20
[ 2257.452038]  ? rcuref_put_slowpath+0x2f/0x70
[ 2257.452129]  dst_release+0x1c/0x40
[ 2257.452222]  rt_cache_route+0xbd/0xf0
[ 2257.452313]  ? kmem_cache_alloc+0x31/0x390
[ 2257.452404]  rt_set_nexthop.isra.0+0x1b6/0x450
[ 2257.452495]  ip_route_input_slow+0x5d9/0xcc0
[ 2257.452586]  ? nft_nat_do_chain+0x7f/0xd0 [nft_chain_nat]
[ 2257.452681]  ? nf_conntrack_udp_packet+0xcf/0x240 [nf_conntrack]
[ 2257.452784]  ? nf_nat_inet_fn+0x36f/0x3f0 [nf_nat]
[ 2257.452880]  ip_route_input_noref+0xe0/0xf0
[ 2257.452970]  ip_rcv_finish_core.isra.0+0xbb/0x440
[ 2257.453064]  ip_rcv+0xd5/0x110
[ 2257.453151]  ? ip_rcv_core+0x360/0x360
[ 2257.453240]  process_backlog+0x107/0x210
[ 2257.453330]  __napi_poll+0x20/0x180
[ 2257.453420]  net_rx_action+0x29f/0x380
[ 2257.453510]  __do_softirq+0xd0/0x202
[ 2257.453599]  irq_exit_rcu+0x82/0xa0
[ 2257.453689]  sysvec_call_function_single+0x32/0x80
[ 2257.453781]  asm_sysvec_call_function_single+0x16/0x20
[ 2257.453874] RIP: 0033:0x7f0cdc5928b2
[ 2257.453963] Code: 06 00 00 4c 89 65 88 49 83 fd 08 0f 84 f7 06 00 00 49 83 fd 26 0f 84 05 07 00 00 4d 85 ed 0f 84 5f 01 00 00 41 0f b6 44 24 04 <89> c6 40 c0 ee 04 0f 84 72 06 00 00 41 0f b6 54 24 05 83 e2 03 ff
[ 2257.454121] RSP: 002b:00007ffc04d3e890 EFLAGS: 00000206
[ 2257.454215] RAX: 0000000000000012 RBX: 00007f0cdc444db8 RCX: 00007f0cdc4e6e60
[ 2257.454329] RDX: 0000000000000009 RSI: 00007f0cdc57ef30 RDI: 00007f0cdc42c808
[ 2257.454442] RBP: 00007ffc04d3e9b0 R08: 00007f0cdc445028 R09: 00007ffc04d3e940
[ 2257.454555] R10: 00007f0cdbf00be8 R11: 0000000000000000 R12: 00007f0cdc42c898
[ 2257.454670] R13: 0000000000000006 R14: 0000000600000006 R15: 00007f0cdc581000
[ 2257.454784]  </TASK>
[ 2257.454869] ---[ end trace 0000000000000000 ]---


[ 2257.449125] ------------[ cut here ]------------
[ 2257.449245] WARNING: CPU: 1 PID: 40622 at lib/rcuref.c:294 rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
[ 2257.449373] Modules linked in: nft_limit nf_conntrack_netlink pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding ixgbe mdio i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
[ 2257.449642] CPU: 1 PID: 40622 Comm: nc Tainted: G           O       6.6.9 #1
[ 2257.449761] Hardware name: Supermicro PIO-5038MR-H8TRF-NODE/X10SRD-F, BIOS 3.3 10/28/2020
[ 2257.449883] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
[ 2257.449977] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4c e3 00
All code
========
   0:	07                   	(bad)
   1:	83 f8 ff             	cmp    $0xffffffff,%eax
   4:	75 19                	jne    0x1f
   6:	ba 00 00 00 e0       	mov    $0xe0000000,%edx
   b:	f0 0f b1 17          	lock cmpxchg %edx,(%rdi)
   f:	83 f8 ff             	cmp    $0xffffffff,%eax
  12:	74 04                	je     0x18
  14:	31 c0                	xor    %eax,%eax
  16:	5b                   	pop    %rbx
  17:	c3                   	ret
  18:	b8 01 00 00 00       	mov    $0x1,%eax
  1d:	5b                   	pop    %rbx
  1e:	c3                   	ret
  1f:	3d ff ff ff bf       	cmp    $0xbfffffff,%eax
  24:	77 14                	ja     0x3a
  26:	85 c0                	test   %eax,%eax
  28:	78 06                	js     0x30
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	31 c0                	xor    %eax,%eax
  2e:	eb e6                	jmp    0x16
  30:	c7 07 00 00 00 a0    	movl   $0xa0000000,(%rdi)
  36:	31 c0                	xor    %eax,%eax
  38:	eb dc                	jmp    0x16
  3a:	80                   	.byte 0x80
  3b:	3d e2 4c e3 00       	cmp    $0xe34ce2,%eax

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	31 c0                	xor    %eax,%eax
   4:	eb e6                	jmp    0xffffffffffffffec
   6:	c7 07 00 00 00 a0    	movl   $0xa0000000,(%rdi)
   c:	31 c0                	xor    %eax,%eax
   e:	eb dc                	jmp    0xffffffffffffffec
  10:	80                   	.byte 0x80
  11:	3d e2 4c e3 00       	cmp    $0xe34ce2,%eax
[ 2257.450135] RSP: 0000:ffffb455cef83b78 EFLAGS: 00010246
[ 2257.450227] RAX: 0000000000000000 RBX: ffff94873bb77dc0 RCX: ffff9486c0d46b80
[ 2257.450341] RDX: ffff948736578428 RSI: 00000000fffffe01 RDI: ffff94873bb77dc0
[ 2257.450456] RBP: ffff948736578428 R08: ffff948e1fa64f08 R09: 0000000000000001
[ 2257.450570] R10: 0000000000028530 R11: 0000000000000001 R12: ffff94873bb77d80
[ 2257.450685] R13: ffff94873bb77de8 R14: ffff948e1fa64f08 R15: 000000000266f59d
[ 2257.450802] FS:  00007f0cdbc73800(0000) GS:ffff948e1fa40000(0000) knlGS:0000000000000000
[ 2257.450918] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2257.451012] CR2: 00007f0cdc3f5c30 CR3: 0000000178ea0002 CR4: 00000000003706e0
[ 2257.451127] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2257.451240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2257.451353] Call Trace:
[ 2257.451441]  <TASK>
[ 2257.451526] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
[ 2257.451616] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 2257.451707] ? handle_bug (arch/x86/kernel/traps.c:237)
[ 2257.451797] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[ 2257.451886] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
[ 2257.452038] ? rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
[ 2257.452129] dst_release (net/core/dst.c:166 (discriminator 1))
[ 2257.452222] rt_cache_route (net/ipv4/route.c:1499)
[ 2257.452313] ? kmem_cache_alloc (mm/slab.h:711 (discriminator 1) mm/slub.c:3461 (discriminator 1) mm/slub.c:3487 (discriminator 1) mm/slub.c:3494 (discriminator 1) mm/slub.c:3503 (discriminator 1))
[ 2257.452404] rt_set_nexthop.isra.0 (net/ipv4/route.c:1606 (discriminator 1))
[ 2257.452495] ip_route_input_slow (./include/net/lwtunnel.h:140 net/ipv4/route.c:1875 net/ipv4/route.c:2154 net/ipv4/route.c:2337)
[ 2257.452586] ? nft_nat_do_chain (net/netfilter/nft_chain_nat.c:33) nft_chain_nat
[ 2257.452681] ? nf_conntrack_udp_packet (net/netfilter/nf_conntrack_proto_udp.c:130) nf_conntrack
[ 2257.452784] ? nf_nat_inet_fn (net/netfilter/nf_nat_core.c:844) nf_nat
[ 2257.452880] ip_route_input_noref (net/ipv4/route.c:2499)
[ 2257.452970] ip_rcv_finish_core.isra.0 (net/ipv4/ip_input.c:367 (discriminator 1))
[ 2257.453064] ip_rcv (net/ipv4/ip_input.c:448 ./include/linux/netfilter.h:304 ./include/linux/netfilter.h:298 net/ipv4/ip_input.c:569)
[ 2257.453151] ? ip_rcv_core (net/ipv4/ip_input.c:436)
[ 2257.453240] process_backlog (net/core/dev.c:6000)
[ 2257.453330] __napi_poll (net/core/dev.c:6559)
[ 2257.453420] net_rx_action (net/core/dev.c:6628 net/core/dev.c:6759)
[ 2257.453510] __do_softirq (./arch/x86/include/asm/preempt.h:27 kernel/softirq.c:564)
[ 2257.453599] irq_exit_rcu (kernel/softirq.c:436 kernel/softirq.c:641 kernel/softirq.c:653)
[ 2257.453689] sysvec_call_function_single (arch/x86/kernel/smp.c:262 (discriminator 69))
[ 2257.453781] asm_sysvec_call_function_single (./arch/x86/include/asm/idtentry.h:656)
[ 2257.453874] RIP: 0033:0x7f0cdc5928b2
[ 2257.453963] Code: 06 00 00 4c 89 65 88 49 83 fd 08 0f 84 f7 06 00 00 49 83 fd 26 0f 84 05 07 00 00 4d 85 ed 0f 84 5f 01 00 00 41 0f b6 44 24 04 <89> c6 40 c0 ee 04 0f 84 72 06 00 00 41 0f b6 54 24 05 83 e2 03 ff
All code
========
   0:	06                   	(bad)
   1:	00 00                	add    %al,(%rax)
   3:	4c 89 65 88          	mov    %r12,-0x78(%rbp)
   7:	49 83 fd 08          	cmp    $0x8,%r13
   b:	0f 84 f7 06 00 00    	je     0x708
  11:	49 83 fd 26          	cmp    $0x26,%r13
  15:	0f 84 05 07 00 00    	je     0x720
  1b:	4d 85 ed             	test   %r13,%r13
  1e:	0f 84 5f 01 00 00    	je     0x183
  24:	41 0f b6 44 24 04    	movzbl 0x4(%r12),%eax
  2a:*	89 c6                	mov    %eax,%esi		<-- trapping instruction
  2c:	40 c0 ee 04          	shr    $0x4,%sil
  30:	0f 84 72 06 00 00    	je     0x6a8
  36:	41 0f b6 54 24 05    	movzbl 0x5(%r12),%edx
  3c:	83 e2 03             	and    $0x3,%edx
  3f:	ff                   	.byte 0xff

Code starting with the faulting instruction
===========================================
   0:	89 c6                	mov    %eax,%esi
   2:	40 c0 ee 04          	shr    $0x4,%sil
   6:	0f 84 72 06 00 00    	je     0x67e
   c:	41 0f b6 54 24 05    	movzbl 0x5(%r12),%edx
  12:	83 e2 03             	and    $0x3,%edx
  15:	ff                   	.byte 0xff
[ 2257.454121] RSP: 002b:00007ffc04d3e890 EFLAGS: 00000206
[ 2257.454215] RAX: 0000000000000012 RBX: 00007f0cdc444db8 RCX: 00007f0cdc4e6e60
[ 2257.454329] RDX: 0000000000000009 RSI: 00007f0cdc57ef30 RDI: 00007f0cdc42c808
[ 2257.454442] RBP: 00007ffc04d3e9b0 R08: 00007f0cdc445028 R09: 00007ffc04d3e940
[ 2257.454555] R10: 00007f0cdbf00be8 R11: 0000000000000000 R12: 00007f0cdc42c898
[ 2257.454670] R13: 0000000000000006 R14: 0000000600000006 R15: 00007f0cdc581000
[ 2257.454784]  </TASK>
[ 2257.454869] ---[ end trace 0000000000000000 ]---


dmesg2 (raw trace, followed by the same trace decoded to file and line):

[ 2567.167952] ------------[ cut here ]------------
[ 2567.168053] WARNING: CPU: 11 PID: 0 at lib/rcuref.c:294 rcuref_put_slowpath+0x2f/0x70
[ 2567.168175] Modules linked in: nft_limit nf_conntrack_netlink pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding ixgbe mdio i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
[ 2567.168445] CPU: 11 PID: 0 Comm: swapper/11 Tainted: G           O       6.6.9 #1
[ 2567.168561] Hardware name: Supermicro X10SRD-F/X10SRD-F, BIOS 3.4 06/05/2021
[ 2567.168675] RIP: 0010:rcuref_put_slowpath+0x2f/0x70
[ 2567.168767] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4c e3 00
[ 2567.168924] RSP: 0018:ffffaeaf80418d00 EFLAGS: 00010246
[ 2567.169017] RAX: 0000000000000000 RBX: ffff9fef84d6a940 RCX: 0000000000000074
[ 2567.169132] RDX: ffff9fefe2e30000 RSI: 0000000000000000 RDI: ffff9fef84d6a940
[ 2567.169246] RBP: ffff9fefe2e306c0 R08: 0000000000000000 R09: 0000000000029300
[ 2567.169359] R10: 0000000000029300 R11: ffffaeaf80418d90 R12: ffff9fef8aebe000
[ 2567.169473] R13: ffff9fef80896800 R14: ffff9fef85335200 R15: ffff9fef8ae07080
[ 2567.169586] FS:  0000000000000000(0000) GS:ffff9ff6dfcc0000(0000) knlGS:0000000000000000
[ 2567.169702] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2567.169795] CR2: 00007f4eaa7e6650 CR3: 0000000156dcd006 CR4: 00000000003706e0
[ 2567.169908] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2567.170022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2567.170137] Call Trace:
[ 2567.170224]  <IRQ>
[ 2567.170309]  ? __warn+0x6c/0x130
[ 2567.170399]  ? report_bug+0x1b8/0x200
[ 2567.170488]  ? handle_bug+0x36/0x70
[ 2567.170577]  ? exc_invalid_op+0x17/0x1a0
[ 2567.170667]  ? asm_exc_invalid_op+0x16/0x20
[ 2567.170758]  ? rcuref_put_slowpath+0x2f/0x70
[ 2567.170850]  dst_release+0x1c/0x40
[ 2567.170939]  __dev_queue_xmit+0x598/0xce0
[ 2567.171029]  vlan_dev_hard_start_xmit+0x82/0xc0
[ 2567.171122]  dev_hard_start_xmit+0x95/0xe0
[ 2567.171216]  __dev_queue_xmit+0x863/0xce0
[ 2567.171305]  ? eth_header+0x25/0xc0
[ 2567.171394]  ip_finish_output2+0x1a0/0x530
[ 2567.171485]  process_backlog+0x107/0x210
[ 2567.171575]  __napi_poll+0x20/0x180
[ 2567.171663]  net_rx_action+0x29f/0x380
[ 2567.171752]  ? rebalance_domains+0x14c/0x300
[ 2567.171843]  __do_softirq+0xd0/0x202
[ 2567.171932]  irq_exit_rcu+0x82/0xa0
[ 2567.172022]  common_interrupt+0x7a/0xa0
[ 2567.172111]  </IRQ>
[ 2567.172198]  <TASK>
[ 2567.172283]  asm_common_interrupt+0x22/0x40
[ 2567.172374] RIP: 0010:cpuidle_enter_state+0xa3/0x6a0
[ 2567.172467] Code: 46 40 40 0f 84 02 01 00 00 e8 c9 a0 70 ff e8 d4 f6 ff ff 31 ff 49 89 c6 e8 0a b9 6f ff 45 84 ff 0f 85 d9 00 00 00 fb 45 85 ed <0f> 88 b8 00 00 00 49 63 cd 48 8b 04 24 48 6b f1 68 49 29 c6 48 8d
[ 2567.172623] RSP: 0018:ffffaeaf80177e98 EFLAGS: 00000202
[ 2567.172715] RAX: ffff9ff6dfce3a80 RBX: ffff9fef81338000 RCX: 000000000000001f
[ 2567.172828] RDX: 00000255b721ed84 RSI: 00000000238e3b7a RDI: 0000000000000000
[ 2567.172942] RBP: ffffffffba216ea0 R08: 0000000000000004 R09: ffff9ff6dfcdef00
[ 2567.173055] R10: ffff9ff6dfcdef00 R11: 0000000000000007 R12: 0000000000000001
[ 2567.173168] R13: 0000000000000001 R14: 00000255b721ed84 R15: 0000000000000000
[ 2567.173283]  ? cpuidle_enter_state+0x96/0x6a0
[ 2567.173374]  cpuidle_enter+0x24/0x40
[ 2567.173464]  do_idle+0x1a7/0x210
[ 2567.173552]  cpu_startup_entry+0x21/0x30
[ 2567.173642]  start_secondary+0xe1/0xf0
[ 2567.173732]  secondary_startup_64_no_verify+0x178/0x17b
[ 2567.173825]  </TASK>
[ 2567.173910] ---[ end trace 0000000000000000 ]---


[ 2567.167952] ------------[ cut here ]------------
[ 2567.168053] WARNING: CPU: 11 PID: 0 at lib/rcuref.c:294 rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
[ 2567.168175] Modules linked in: nft_limit nf_conntrack_netlink pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding ixgbe mdio i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
[ 2567.168445] CPU: 11 PID: 0 Comm: swapper/11 Tainted: G           O       6.6.9 #1
[ 2567.168561] Hardware name: Supermicro X10SRD-F/X10SRD-F, BIOS 3.4 06/05/2021
[ 2567.168675] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
[ 2567.168767] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4c e3 00
All code
========
   0:	07                   	(bad)
   1:	83 f8 ff             	cmp    $0xffffffff,%eax
   4:	75 19                	jne    0x1f
   6:	ba 00 00 00 e0       	mov    $0xe0000000,%edx
   b:	f0 0f b1 17          	lock cmpxchg %edx,(%rdi)
   f:	83 f8 ff             	cmp    $0xffffffff,%eax
  12:	74 04                	je     0x18
  14:	31 c0                	xor    %eax,%eax
  16:	5b                   	pop    %rbx
  17:	c3                   	ret
  18:	b8 01 00 00 00       	mov    $0x1,%eax
  1d:	5b                   	pop    %rbx
  1e:	c3                   	ret
  1f:	3d ff ff ff bf       	cmp    $0xbfffffff,%eax
  24:	77 14                	ja     0x3a
  26:	85 c0                	test   %eax,%eax
  28:	78 06                	js     0x30
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	31 c0                	xor    %eax,%eax
  2e:	eb e6                	jmp    0x16
  30:	c7 07 00 00 00 a0    	movl   $0xa0000000,(%rdi)
  36:	31 c0                	xor    %eax,%eax
  38:	eb dc                	jmp    0x16
  3a:	80                   	.byte 0x80
  3b:	3d e2 4c e3 00       	cmp    $0xe34ce2,%eax

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	31 c0                	xor    %eax,%eax
   4:	eb e6                	jmp    0xffffffffffffffec
   6:	c7 07 00 00 00 a0    	movl   $0xa0000000,(%rdi)
   c:	31 c0                	xor    %eax,%eax
   e:	eb dc                	jmp    0xffffffffffffffec
  10:	80                   	.byte 0x80
  11:	3d e2 4c e3 00       	cmp    $0xe34ce2,%eax
[ 2567.168924] RSP: 0018:ffffaeaf80418d00 EFLAGS: 00010246
[ 2567.169017] RAX: 0000000000000000 RBX: ffff9fef84d6a940 RCX: 0000000000000074
[ 2567.169132] RDX: ffff9fefe2e30000 RSI: 0000000000000000 RDI: ffff9fef84d6a940
[ 2567.169246] RBP: ffff9fefe2e306c0 R08: 0000000000000000 R09: 0000000000029300
[ 2567.169359] R10: 0000000000029300 R11: ffffaeaf80418d90 R12: ffff9fef8aebe000
[ 2567.169473] R13: ffff9fef80896800 R14: ffff9fef85335200 R15: ffff9fef8ae07080
[ 2567.169586] FS:  0000000000000000(0000) GS:ffff9ff6dfcc0000(0000) knlGS:0000000000000000
[ 2567.169702] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2567.169795] CR2: 00007f4eaa7e6650 CR3: 0000000156dcd006 CR4: 00000000003706e0
[ 2567.169908] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2567.170022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2567.170137] Call Trace:
[ 2567.170224]  <IRQ>
[ 2567.170309] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
[ 2567.170399] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 2567.170488] ? handle_bug (arch/x86/kernel/traps.c:237)
[ 2567.170577] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[ 2567.170667] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
[ 2567.170758] ? rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
[ 2567.170850] dst_release (net/core/dst.c:166 (discriminator 1))
[ 2567.170939] __dev_queue_xmit (./include/net/dst.h:283 net/core/dev.c:4327)
[ 2567.171029] vlan_dev_hard_start_xmit (net/8021q/vlan_dev.c:130)
[ 2567.171122] dev_hard_start_xmit (./include/linux/netdevice.h:4926 net/core/dev.c:3576 net/core/dev.c:3592)
[ 2567.171216] __dev_queue_xmit (./include/linux/netdevice.h:3300 (discriminator 25) net/core/dev.c:4373 (discriminator 25))
[ 2567.171305] ? eth_header (net/ethernet/eth.c:85)
[ 2567.171394] ip_finish_output2 (./include/net/neighbour.h:542 (discriminator 2) net/ipv4/ip_output.c:233 (discriminator 2))
[ 2567.171485] process_backlog (net/core/dev.c:6000)
[ 2567.171575] __napi_poll (net/core/dev.c:6559)
[ 2567.171663] net_rx_action (net/core/dev.c:6628 net/core/dev.c:6759)
[ 2567.171752] ? rebalance_domains (kernel/sched/fair.c:11719 kernel/sched/fair.c:11895)
[ 2567.171843] __do_softirq (./arch/x86/include/asm/preempt.h:27 kernel/softirq.c:564)
[ 2567.171932] irq_exit_rcu (kernel/softirq.c:436 kernel/softirq.c:641 kernel/softirq.c:653)
[ 2567.172022] common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 47))
[ 2567.172111]  </IRQ>
[ 2567.172198]  <TASK>
[ 2567.172283] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:640)
[ 2567.172374] RIP: 0010:cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
[ 2567.172467] Code: 46 40 40 0f 84 02 01 00 00 e8 c9 a0 70 ff e8 d4 f6 ff ff 31 ff 49 89 c6 e8 0a b9 6f ff 45 84 ff 0f 85 d9 00 00 00 fb 45 85 ed <0f> 88 b8 00 00 00 49 63 cd 48 8b 04 24 48 6b f1 68 49 29 c6 48 8d
All code
========
   0:	46                   	rex.RX
   1:	40                   	rex
   2:	40 0f 84 02 01 00 00 	rex je 0x10b
   9:	e8 c9 a0 70 ff       	call   0xffffffffff70a0d7
   e:	e8 d4 f6 ff ff       	call   0xfffffffffffff6e7
  13:	31 ff                	xor    %edi,%edi
  15:	49 89 c6             	mov    %rax,%r14
  18:	e8 0a b9 6f ff       	call   0xffffffffff6fb927
  1d:	45 84 ff             	test   %r15b,%r15b
  20:	0f 85 d9 00 00 00    	jne    0xff
  26:	fb                   	sti
  27:	45 85 ed             	test   %r13d,%r13d
  2a:*	0f 88 b8 00 00 00    	js     0xe8		<-- trapping instruction
  30:	49 63 cd             	movslq %r13d,%rcx
  33:	48 8b 04 24          	mov    (%rsp),%rax
  37:	48 6b f1 68          	imul   $0x68,%rcx,%rsi
  3b:	49 29 c6             	sub    %rax,%r14
  3e:	48                   	rex.W
  3f:	8d                   	.byte 0x8d

Code starting with the faulting instruction
===========================================
   0:	0f 88 b8 00 00 00    	js     0xbe
   6:	49 63 cd             	movslq %r13d,%rcx
   9:	48 8b 04 24          	mov    (%rsp),%rax
   d:	48 6b f1 68          	imul   $0x68,%rcx,%rsi
  11:	49 29 c6             	sub    %rax,%r14
  14:	48                   	rex.W
  15:	8d                   	.byte 0x8d
[ 2567.172623] RSP: 0018:ffffaeaf80177e98 EFLAGS: 00000202
[ 2567.172715] RAX: ffff9ff6dfce3a80 RBX: ffff9fef81338000 RCX: 000000000000001f
[ 2567.172828] RDX: 00000255b721ed84 RSI: 00000000238e3b7a RDI: 0000000000000000
[ 2567.172942] RBP: ffffffffba216ea0 R08: 0000000000000004 R09: ffff9ff6dfcdef00
[ 2567.173055] R10: ffff9ff6dfcdef00 R11: 0000000000000007 R12: 0000000000000001
[ 2567.173168] R13: 0000000000000001 R14: 00000255b721ed84 R15: 0000000000000000
[ 2567.173283] ? cpuidle_enter_state (drivers/cpuidle/cpuidle.c:285)
[ 2567.173374] cpuidle_enter (drivers/cpuidle/cpuidle.c:390 (discriminator 2))
[ 2567.173464] do_idle (kernel/sched/idle.c:134 kernel/sched/idle.c:215 kernel/sched/idle.c:282)
[ 2567.173552] cpu_startup_entry (kernel/sched/idle.c:379)
[ 2567.173642] start_secondary (arch/x86/kernel/smpboot.c:326)
[ 2567.173732] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:449)
[ 2567.173825]  </TASK>
[ 2567.173910] ---[ end trace 0000000000000000 ]---

Best regards,
Martin


> On 29 Dec 2023, at 14:00, Martin Zaharinov <micron10@gmail.com> wrote:
> 
> Hi Thomas,
> 
> One more report from second machine:
> 
> [21299.954952] ------------[ cut here ]------------
> [21299.955047] WARNING: CPU: 15 PID: 0 at lib/rcuref.c:294 rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
> [21299.955153] Modules linked in: nft_limit nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp virtio_net net_failover failover virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring e1000e e1000 vmxnet3 i40e ixgbe mdio bnxt_en nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rtc_cmos
> [21299.955378] CPU: 15 PID: 0 Comm: swapper/15 Tainted: G           O       6.6.8 #1
> [21299.955475] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 02/09/2023
> [21299.955575] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
> [21299.955662] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4e e3 00
> All code
> ========
>   0: 07                    (bad)
>   1: 83 f8 ff              cmp    $0xffffffff,%eax
>   4: 75 19                 jne    0x1f
>   6: ba 00 00 00 e0        mov    $0xe0000000,%edx
>   b: f0 0f b1 17           lock cmpxchg %edx,(%rdi)
>   f: 83 f8 ff              cmp    $0xffffffff,%eax
>  12: 74 04                 je     0x18
>  14: 31 c0                 xor    %eax,%eax
>  16: 5b                    pop    %rbx
>  17: c3                    ret
>  18: b8 01 00 00 00        mov    $0x1,%eax
>  1d: 5b                    pop    %rbx
>  1e: c3                    ret
>  1f: 3d ff ff ff bf        cmp    $0xbfffffff,%eax
>  24: 77 14                 ja     0x3a
>  26: 85 c0                 test   %eax,%eax
>  28: 78 06                 js     0x30
>  2a:* 0f 0b                 ud2     <-- trapping instruction
>  2c: 31 c0                 xor    %eax,%eax
>  2e: eb e6                 jmp    0x16
>  30: c7 07 00 00 00 a0     movl   $0xa0000000,(%rdi)
>  36: 31 c0                 xor    %eax,%eax
>  38: eb dc                 jmp    0x16
>  3a: 80                    .byte 0x80
>  3b: 3d e2 4e e3 00        cmp    $0xe34ee2,%eax
> 
> Code starting with the faulting instruction
> ===========================================
>   0: 0f 0b                 ud2
>   2: 31 c0                 xor    %eax,%eax
>   4: eb e6                 jmp    0xffffffffffffffec
>   6: c7 07 00 00 00 a0     movl   $0xa0000000,(%rdi)
>   c: 31 c0                 xor    %eax,%eax
>   e: eb dc                 jmp    0xffffffffffffffec
>  10: 80                    .byte 0x80
>  11: 3d e2 4e e3 00        cmp    $0xe34ee2,%eax
> [21299.955793] RSP: 0018:ffff96a7c0578c30 EFLAGS: 00010246
> [21299.955879] RAX: 0000000000000000 RBX: ffff8b75d1e49a80 RCX: ffff8b75c6667c80
> [21299.955974] RDX: ffff8b84bfbe4f08 RSI: 00000000fffffe01 RDI: ffff8b75d1e49a80
> [21299.956070] RBP: ffff8b84bfbe4f08 R08: ffff8b84bfbe4f08 R09: 0000000000000001
> [21299.956167] R10: 0000000000028530 R11: 0000000000000001 R12: ffff8b75d1e49a40
> [21299.956261] R13: ffff8b75d1e49aa8 R14: ffff8b84bfbe4f08 R15: 00000000c26ab667
> [21299.956358] FS:  0000000000000000(0000) GS:ffff8b84bfbc0000(0000) knlGS:0000000000000000
> [21299.956457] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [21299.956540] CR2: 00007f2e185c73c8 CR3: 0000000950014003 CR4: 00000000003706e0
> [21299.956635] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [21299.956730] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [21299.956826] Call Trace:
> [21299.956905]  <IRQ>
> [21299.956983] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
> [21299.957065] ? report_bug (lib/bug.c:180 lib/bug.c:219)
> [21299.957147] ? handle_bug (arch/x86/kernel/traps.c:237)
> [21299.957228] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
> [21299.957308] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
> [21299.957393] ? rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
> [21299.957476] dst_release (net/core/dst.c:166 (discriminator 1))
> [21299.957559] rt_cache_route (net/ipv4/route.c:1499)
> [21299.957641] rt_set_nexthop.isra.0 (net/ipv4/route.c:1606 (discriminator 1))
> [21299.957722] ip_route_input_slow (./include/net/lwtunnel.h:140 net/ipv4/route.c:1875 net/ipv4/route.c:2154 net/ipv4/route.c:2337)
> [21299.957804] ? free_unref_page (./include/linux/list.h:150 (discriminator 1) ./include/linux/list.h:169 (discriminator 1) mm/page_alloc.c:2377 (discriminator 1) mm/page_alloc.c:2428 (discriminator 1))
> [21299.957889] ip_route_input_noref (net/ipv4/route.c:2499)
> [21299.957972] ip_rcv_finish_core.isra.0 (net/ipv4/ip_input.c:367 (discriminator 1))
> [21299.958058] ip_rcv (net/ipv4/ip_input.c:448 ./include/linux/netfilter.h:304 ./include/linux/netfilter.h:298 net/ipv4/ip_input.c:569)
> [21299.958139] ? ip_rcv_core (net/ipv4/ip_input.c:436)
> [21299.958220] process_backlog (net/core/dev.c:5997)
> [21299.958302] __napi_poll (net/core/dev.c:6556)
> [21299.958384] net_rx_action (net/core/dev.c:6625 net/core/dev.c:6756)
> [21299.958466] __do_softirq (./arch/x86/include/asm/preempt.h:27 kernel/softirq.c:564)
> [21299.958549] irq_exit_rcu (kernel/softirq.c:436 kernel/softirq.c:641 kernel/softirq.c:653)
> [21299.958631] sysvec_call_function_single (arch/x86/kernel/smp.c:262 (discriminator 47))
> [21299.958714]  </IRQ>
> [21299.958792]  <TASK>
> [21299.958869] asm_sysvec_call_function_single (./arch/x86/include/asm/idtentry.h:656)
> [21299.958953] RIP: 0010:acpi_safe_halt (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:72 drivers/acpi/processor_idle.c:113)
> [21299.959038] Code: ed c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 65 48 8b 04 25 40 32 02 00 48 8b 00 a8 08 75 0c eb 07 0f 00 2d 57 0f 2c 00 fb f4 <fa> c3 0f 1f 00 0f b6 47 08 3c 01 74 0b 3c 02 74 05 8b 7f 04 eb 9f
> All code
> ========
>   0: ed                    in     (%dx),%eax
>   1: c3                    ret
>   2: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
>   9: 00 00 00 00
>   d: 66 90                 xchg   %ax,%ax
>   f: 65 48 8b 04 25 40 32 mov    %gs:0x23240,%rax
>  16: 02 00
>  18: 48 8b 00              mov    (%rax),%rax
>  1b: a8 08                 test   $0x8,%al
>  1d: 75 0c                 jne    0x2b
>  1f: eb 07                 jmp    0x28
>  21: 0f 00 2d 57 0f 2c 00 verw   0x2c0f57(%rip)        # 0x2c0f7f
>  28: fb                    sti
>  29: f4                    hlt
>  2a:* fa                    cli     <-- trapping instruction
>  2b: c3                    ret
>  2c: 0f 1f 00              nopl   (%rax)
>  2f: 0f b6 47 08           movzbl 0x8(%rdi),%eax
>  33: 3c 01                 cmp    $0x1,%al
>  35: 74 0b                 je     0x42
>  37: 3c 02                 cmp    $0x2,%al
>  39: 74 05                 je     0x40
>  3b: 8b 7f 04              mov    0x4(%rdi),%edi
>  3e: eb 9f                 jmp    0xffffffffffffffdf
> 
> Code starting with the faulting instruction
> ===========================================
>   0: fa                    cli
>   1: c3                    ret
>   2: 0f 1f 00              nopl   (%rax)
>   5: 0f b6 47 08           movzbl 0x8(%rdi),%eax
>   9: 3c 01                 cmp    $0x1,%al
>   b: 74 0b                 je     0x18
>   d: 3c 02                 cmp    $0x2,%al
>   f: 74 05                 je     0x16
>  11: 8b 7f 04              mov    0x4(%rdi),%edi
>  14: eb 9f                 jmp    0xffffffffffffffb5
> [21299.959162] RSP: 0018:ffff96a7c015be80 EFLAGS: 00000246
> [21299.959247] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
> [21299.959343] RDX: ffff8b84bfbc0000 RSI: ffff8b75c76ba000 RDI: ffff8b75c76ba064
> [21299.959437] RBP: ffffffffae216ea0 R08: ffffffffae216ea0 R09: 0000000000000003
> [21299.959533] R10: 0000000000000002 R11: 0000000000000008 R12: 0000000000000001
> [21299.959630] R13: ffffffffae216f08 R14: ffffffffae216f20 R15: 0000000000000000
> [21299.959725] acpi_idle_enter (drivers/acpi/processor_idle.c:709)
> [21299.959807] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:267)
> [21299.959890] cpuidle_enter (drivers/cpuidle/cpuidle.c:390 (discriminator 2))
> [21299.959975] do_idle (kernel/sched/idle.c:134 kernel/sched/idle.c:215 kernel/sched/idle.c:282)
> [21299.960058] cpu_startup_entry (kernel/sched/idle.c:379)
> [21299.960140] start_secondary (arch/x86/kernel/smpboot.c:326)
> [21299.960223] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
> [21299.960306]  </TASK>
> [21299.960384] ---[ end trace 0000000000000000 ]---
> 
>> On 22 Dec 2023, at 19:26, Martin Zaharinov <micron10@gmail.com> wrote:
>> 
>> Hi Thomas,
>> 
>> This is with your patch applied.
>> See the logs below (raw first, then decoded):
>> 
>> 
>> [43040.198064] ------------[ cut here ]------------
>> [43040.198407] WARNING: CPU: 47 PID: 0 at lib/rcuref.c:294 rcuref_put_slowpath+0x2f/0x70
>> [43040.198685] Modules linked in: pppoe pppox ppp_generic slhc nft_limit nft_ct nft_nat nft_chain_nat nf_tables netconsole tg3 igb i2c_algo_bit e1000e bnxt_en mlx5_core mlxfw mlx4_en mlx4_core i40e ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_devintf ipmi_msghandler rtc_cmos
>> [43040.199478] CPU: 47 PID: 0 Comm: swapper/47 Tainted: G           O       6.6.8 #1
>> [43040.199660] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
>> [43040.199886] RIP: 0010:rcuref_put_slowpath+0x2f/0x70
>> [43040.200028] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4e e3 00
>> [43040.200387] RSP: 0018:ffffa39d83e88c30 EFLAGS: 00010246
>> [43040.200528] RAX: 0000000000000000 RBX: ffff9c58e966b840 RCX: ffff9c5bc4e35680
>> [43040.200700] RDX: ffff9c5fafde4f08 RSI: 00000000fffffe01 RDI: ffff9c58e966b840
>> [43040.200871] RBP: ffff9c5fafde4f08 R08: ffff9c5fafde4f08 R09: 0000000000000001
>> [43040.201044] R10: 00000000000286e0 R11: 0000000000000001 R12: ffff9c58e966b800
>> [43040.201255] R13: ffff9c58e966b868 R14: ffff9c5fafde4f08 R15: 000000008f5de42b
>> [43040.201439] FS:  0000000000000000(0000) GS:ffff9c5fafdc0000(0000) knlGS:0000000000000000
>> [43040.201642] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [43040.201799] CR2: 00007f1401217714 CR3: 0000000464b94003 CR4: 00000000001706e0
>> [43040.201994] Call Trace:
>> [43040.202095]  <IRQ>
>> [43040.202187]  ? __warn+0x6c/0x130
>> [43040.202301]  ? report_bug+0x1b8/0x200
>> [43040.202418]  ? handle_bug+0x36/0x70
>> [43040.202534]  ? exc_invalid_op+0x17/0x1a0
>> [43040.202652]  ? asm_exc_invalid_op+0x16/0x20
>> [43040.202781]  ? rcuref_put_slowpath+0x2f/0x70
>> [43040.202909]  dst_release+0x1c/0x40
>> [43040.203026]  rt_cache_route+0xbd/0xf0
>> [43040.203143]  rt_set_nexthop.isra.0+0x1b6/0x450
>> [43040.203272]  ip_route_input_slow+0x5d9/0xcc0
>> [43040.203401]  ? nf_conntrack_udp_packet+0x17c/0x240 [nf_conntrack]
>> [43040.203581]  ip_route_input_noref+0xe0/0xf0
>> [43040.203704]  ip_rcv_finish_core.isra.0+0xbb/0x440
>> [43040.203855]  ip_rcv+0xd5/0x110
>> [43040.203962]  ? ip_rcv_core+0x360/0x360
>> [43040.204079]  process_backlog+0x107/0x210
>> [43040.204201]  __napi_poll+0x20/0x180
>> [43040.204315]  net_rx_action+0x29f/0x380
>> [43040.204432]  __do_softirq+0xd0/0x202
>> [43040.204549]  irq_exit_rcu+0x82/0xa0
>> [43040.204667]  common_interrupt+0x7a/0xa0
>> [43040.204786]  </IRQ>
>> [43040.204876]  <TASK>
>> [43040.204965]  asm_common_interrupt+0x22/0x40
>> [43040.205090] RIP: 0010:acpi_safe_halt+0x1b/0x20
>> [43040.205220] Code: ed c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 65 48 8b 04 25 40 32 02 00 48 8b 00 a8 08 75 0c eb 07 0f 00 2d 57 0f 2c 00 fb f4 <fa> c3 0f 1f 00 0f b6 47 08 3c 01 74 0b 3c 02 74 05 8b 7f 04 eb 9f
>> [43040.205578] RSP: 0018:ffffa39d8234fe80 EFLAGS: 00000246
>> [43040.205718] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
>> [43040.205890] RDX: ffff9c5fafdc0000 RSI: ffff9c5882e95800 RDI: ffff9c5882e95864
>> [43040.206063] RBP: ffffffffa9216ea0 R08: ffffffffa9216ea0 R09: 0000000000000003
>> [43040.206246] R10: 0000000000000002 R11: 0000000000000008 R12: 0000000000000001
>> [43040.206419] R13: ffffffffa9216f08 R14: ffffffffa9216f20 R15: 0000000000000000
>> [43040.206593]  acpi_idle_enter+0x77/0xc0
>> [43040.206711]  cpuidle_enter_state+0x69/0x6a0
>> [43040.206835]  cpuidle_enter+0x24/0x40
>> [43040.206954]  do_idle+0x1a7/0x210
>> [43040.207066]  cpu_startup_entry+0x21/0x30
>> [43040.207188]  start_secondary+0xe1/0xf0
>> [43040.207310]  secondary_startup_64_no_verify+0x166/0x16b
>> [43040.207451]  </TASK>
>> [43040.207542] ---[ end trace 0000000000000000 ]---
>> 
>> 
>> 
>> [43040.198064] ------------[ cut here ]------------
>> [43040.198407] WARNING: CPU: 47 PID: 0 at lib/rcuref.c:294 rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
>> [43040.198685] Modules linked in: pppoe pppox ppp_generic slhc nft_limit nft_ct nft_nat nft_chain_nat nf_tables netconsole tg3 igb i2c_algo_bit e1000e bnxt_en mlx5_core mlxfw mlx4_en mlx4_core i40e ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_devintf ipmi_msghandler rtc_cmos
>> [43040.199478] CPU: 47 PID: 0 Comm: swapper/47 Tainted: G           O       6.6.8 #1
>> [43040.199660] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
>> [43040.199886] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
>> [43040.200028] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4e e3 00
>> All code
>> ========
>>  0: 07                    (bad)
>>  1: 83 f8 ff              cmp    $0xffffffff,%eax
>>  4: 75 19                 jne    0x1f
>>  6: ba 00 00 00 e0        mov    $0xe0000000,%edx
>>  b: f0 0f b1 17           lock cmpxchg %edx,(%rdi)
>>  f: 83 f8 ff              cmp    $0xffffffff,%eax
>> 12: 74 04                 je     0x18
>> 14: 31 c0                 xor    %eax,%eax
>> 16: 5b                    pop    %rbx
>> 17: c3                    ret
>> 18: b8 01 00 00 00        mov    $0x1,%eax
>> 1d: 5b                    pop    %rbx
>> 1e: c3                    ret
>> 1f: 3d ff ff ff bf        cmp    $0xbfffffff,%eax
>> 24: 77 14                 ja     0x3a
>> 26: 85 c0                 test   %eax,%eax
>> 28: 78 06                 js     0x30
>> 2a:* 0f 0b                 ud2     <-- trapping instruction
>> 2c: 31 c0                 xor    %eax,%eax
>> 2e: eb e6                 jmp    0x16
>> 30: c7 07 00 00 00 a0     movl   $0xa0000000,(%rdi)
>> 36: 31 c0                 xor    %eax,%eax
>> 38: eb dc                 jmp    0x16
>> 3a: 80                    .byte 0x80
>> 3b: 3d e2 4e e3 00        cmp    $0xe34ee2,%eax
>> 
>> Code starting with the faulting instruction
>> ===========================================
>>  0: 0f 0b                 ud2
>>  2: 31 c0                 xor    %eax,%eax
>>  4: eb e6                 jmp    0xffffffffffffffec
>>  6: c7 07 00 00 00 a0     movl   $0xa0000000,(%rdi)
>>  c: 31 c0                 xor    %eax,%eax
>>  e: eb dc                 jmp    0xffffffffffffffec
>> 10: 80                    .byte 0x80
>> 11: 3d e2 4e e3 00        cmp    $0xe34ee2,%eax
>> [43040.200387] RSP: 0018:ffffa39d83e88c30 EFLAGS: 00010246
>> [43040.200528] RAX: 0000000000000000 RBX: ffff9c58e966b840 RCX: ffff9c5bc4e35680
>> [43040.200700] RDX: ffff9c5fafde4f08 RSI: 00000000fffffe01 RDI: ffff9c58e966b840
>> [43040.200871] RBP: ffff9c5fafde4f08 R08: ffff9c5fafde4f08 R09: 0000000000000001
>> [43040.201044] R10: 00000000000286e0 R11: 0000000000000001 R12: ffff9c58e966b800
>> [43040.201255] R13: ffff9c58e966b868 R14: ffff9c5fafde4f08 R15: 000000008f5de42b
>> [43040.201439] FS:  0000000000000000(0000) GS:ffff9c5fafdc0000(0000) knlGS:0000000000000000
>> [43040.201642] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [43040.201799] CR2: 00007f1401217714 CR3: 0000000464b94003 CR4: 00000000001706e0
>> [43040.201994] Call Trace:
>> [43040.202095]  <IRQ>
>> [43040.202187] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
>> [43040.202301] ? report_bug (lib/bug.c:180 lib/bug.c:219)
>> [43040.202418] ? handle_bug (arch/x86/kernel/traps.c:237)
>> [43040.202534] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
>> [43040.202652] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
>> [43040.202781] ? rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
>> [43040.202909] dst_release (net/core/dst.c:166 (discriminator 1))
>> [43040.203026] rt_cache_route (net/ipv4/route.c:1499)
>> [43040.203143] rt_set_nexthop.isra.0 (net/ipv4/route.c:1606 (discriminator 1))
>> [43040.203272] ip_route_input_slow (./include/net/lwtunnel.h:140 net/ipv4/route.c:1875 net/ipv4/route.c:2154 net/ipv4/route.c:2337)
>> [43040.203401] ? nf_conntrack_udp_packet (net/netfilter/nf_conntrack_proto_udp.c:124) nf_conntrack
>> [43040.203581] ip_route_input_noref (net/ipv4/route.c:2499)
>> [43040.203704] ip_rcv_finish_core.isra.0 (net/ipv4/ip_input.c:367 (discriminator 1))
>> [43040.203855] ip_rcv (net/ipv4/ip_input.c:448 ./include/linux/netfilter.h:304 ./include/linux/netfilter.h:298 net/ipv4/ip_input.c:569)
>> [43040.203962] ? ip_rcv_core (net/ipv4/ip_input.c:436)
>> [43040.204079] process_backlog (net/core/dev.c:5997)
>> [43040.204201] __napi_poll (net/core/dev.c:6556)
>> [43040.204315] net_rx_action (net/core/dev.c:6625 net/core/dev.c:6756)
>> [43040.204432] __do_softirq (./arch/x86/include/asm/preempt.h:27 kernel/softirq.c:564)
>> [43040.204549] irq_exit_rcu (kernel/softirq.c:436 kernel/softirq.c:641 kernel/softirq.c:653)
>> [43040.204667] common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 47))
>> [43040.204786]  </IRQ>
>> [43040.204876]  <TASK>
>> [43040.204965] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:640)
>> [43040.205090] RIP: 0010:acpi_safe_halt (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:72 drivers/acpi/processor_idle.c:113)
>> [43040.205220] Code: ed c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 65 48 8b 04 25 40 32 02 00 48 8b 00 a8 08 75 0c eb 07 0f 00 2d 57 0f 2c 00 fb f4 <fa> c3 0f 1f 00 0f b6 47 08 3c 01 74 0b 3c 02 74 05 8b 7f 04 eb 9f
>> All code
>> ========
>>  0: ed                    in     (%dx),%eax
>>  1: c3                    ret
>>  2: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
>>  9: 00 00 00 00
>>  d: 66 90                 xchg   %ax,%ax
>>  f: 65 48 8b 04 25 40 32 mov    %gs:0x23240,%rax
>> 16: 02 00
>> 18: 48 8b 00              mov    (%rax),%rax
>> 1b: a8 08                 test   $0x8,%al
>> 1d: 75 0c                 jne    0x2b
>> 1f: eb 07                 jmp    0x28
>> 21: 0f 00 2d 57 0f 2c 00 verw   0x2c0f57(%rip)        # 0x2c0f7f
>> 28: fb                    sti
>> 29: f4                    hlt
>> 2a:* fa                    cli     <-- trapping instruction
>> 2b: c3                    ret
>> 2c: 0f 1f 00              nopl   (%rax)
>> 2f: 0f b6 47 08           movzbl 0x8(%rdi),%eax
>> 33: 3c 01                 cmp    $0x1,%al
>> 35: 74 0b                 je     0x42
>> 37: 3c 02                 cmp    $0x2,%al
>> 39: 74 05                 je     0x40
>> 3b: 8b 7f 04              mov    0x4(%rdi),%edi
>> 3e: eb 9f                 jmp    0xffffffffffffffdf
>> 
>> Code starting with the faulting instruction
>> ===========================================
>>  0: fa                    cli
>>  1: c3                    ret
>>  2: 0f 1f 00              nopl   (%rax)
>>  5: 0f b6 47 08           movzbl 0x8(%rdi),%eax
>>  9: 3c 01                 cmp    $0x1,%al
>>  b: 74 0b                 je     0x18
>>  d: 3c 02                 cmp    $0x2,%al
>>  f: 74 05                 je     0x16
>> 11: 8b 7f 04              mov    0x4(%rdi),%edi
>> 14: eb 9f                 jmp    0xffffffffffffffb5
>> [43040.205578] RSP: 0018:ffffa39d8234fe80 EFLAGS: 00000246
>> [43040.205718] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
>> [43040.205890] RDX: ffff9c5fafdc0000 RSI: ffff9c5882e95800 RDI: ffff9c5882e95864
>> [43040.206063] RBP: ffffffffa9216ea0 R08: ffffffffa9216ea0 R09: 0000000000000003
>> [43040.206246] R10: 0000000000000002 R11: 0000000000000008 R12: 0000000000000001
>> [43040.206419] R13: ffffffffa9216f08 R14: ffffffffa9216f20 R15: 0000000000000000
>> [43040.206593] acpi_idle_enter (drivers/acpi/processor_idle.c:709)
>> [43040.206711] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:267)
>> [43040.206835] cpuidle_enter (drivers/cpuidle/cpuidle.c:390 (discriminator 2))
>> [43040.206954] do_idle (kernel/sched/idle.c:134 kernel/sched/idle.c:215 kernel/sched/idle.c:282)
>> [43040.207066] cpu_startup_entry (kernel/sched/idle.c:379)
>> [43040.207188] start_secondary (arch/x86/kernel/smpboot.c:326)
>> [43040.207310] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
>> [43040.207451]  </TASK>
>> [43040.207542] ---[ end trace 0000000000000000 ]---
>> 
>>> On 19 Dec 2023, at 16:26, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> 
>>> On Tue, Dec 19 2023 at 11:25, Martin Zaharinov wrote:
>>>>> On 12 Dec 2023, at 20:16, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>>> Btw, how easy is this to reproduce?
>>>> 
>>>> It's not easy. This report was generated on a machine with 5-6k users
>>>> and live traffic; one time it showed up after 1 day, another time after 4-5 days…
>>> 
>>> I love those bugs ...
>>> 
>>>> I will apply this patch and upload the image to one machine as fast as
>>>> possible, and when I get any reports I will send them to you.
>>> 
>>> Let's see how that goes!
>>> 
>>> Thanks,
>>> 
>>>      tglx
>> 
> 



* Re: Urgent Bug Report Kernel crash 6.5.2
  2024-01-04 20:51                       ` Martin Zaharinov
@ 2024-01-07 11:03                         ` Martin Zaharinov
  0 siblings, 0 replies; 35+ messages in thread
From: Martin Zaharinov @ 2024-01-07 11:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: peterz, netdev, Paolo Abeni, patchwork-bot+netdevbpf,
	Jakub Kicinski, Stephen Hemminger, kuba+netdrv, dsahern,
	Eric Dumazet

Hi Thomas,

This is one more report from one machine.

Here you will see two bug reports from the same day:


[Sat Jan  6 07:37:23 2024] ------------[ cut here ]------------
[Sat Jan 6 07:37:23 2024] WARNING: CPU: 12 PID: 0 at lib/rcuref.c:294 rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
[Sat Jan  6 07:37:23 2024] Modules linked in:  pppoe pppox ppp_generic slhc nft_limit nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding mlx5_core mlxfw mlx4_en mlx4_core i40e ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
[Sat Jan  6 07:37:23 2024] CPU: 12 PID: 0 Comm: swapper/12 Tainted: G           O       6.6.10 #1
[Sat Jan  6 07:37:23 2024] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 7.80 10/28/2020
[Sat Jan 6 07:37:23 2024] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
[Sat Jan 6 07:37:23 2024] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d b2 4a e3 00
All code
========
   0:	07                   	(bad)
   1:	83 f8 ff             	cmp    $0xffffffff,%eax
   4:	75 19                	jne    0x1f
   6:	ba 00 00 00 e0       	mov    $0xe0000000,%edx
   b:	f0 0f b1 17          	lock cmpxchg %edx,(%rdi)
   f:	83 f8 ff             	cmp    $0xffffffff,%eax
  12:	74 04                	je     0x18
  14:	31 c0                	xor    %eax,%eax
  16:	5b                   	pop    %rbx
  17:	c3                   	ret
  18:	b8 01 00 00 00       	mov    $0x1,%eax
  1d:	5b                   	pop    %rbx
  1e:	c3                   	ret
  1f:	3d ff ff ff bf       	cmp    $0xbfffffff,%eax
  24:	77 14                	ja     0x3a
  26:	85 c0                	test   %eax,%eax
  28:	78 06                	js     0x30
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	31 c0                	xor    %eax,%eax
  2e:	eb e6                	jmp    0x16
  30:	c7 07 00 00 00 a0    	movl   $0xa0000000,(%rdi)
  36:	31 c0                	xor    %eax,%eax
  38:	eb dc                	jmp    0x16
  3a:	80                   	.byte 0x80
  3b:	3d b2 4a e3 00       	cmp    $0xe34ab2,%eax

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	31 c0                	xor    %eax,%eax
   4:	eb e6                	jmp    0xffffffffffffffec
   6:	c7 07 00 00 00 a0    	movl   $0xa0000000,(%rdi)
   c:	31 c0                	xor    %eax,%eax
   e:	eb dc                	jmp    0xffffffffffffffec
  10:	80                   	.byte 0x80
  11:	3d b2 4a e3 00       	cmp    $0xe34ab2,%eax
[Sat Jan  6 07:37:23 2024] RSP: 0018:ffffa773091ccdd8 EFLAGS: 00010246
[Sat Jan  6 07:37:23 2024] RAX: 0000000000000000 RBX: ffff8f458c192d00 RCX: 0000000000000042
[Sat Jan  6 07:37:23 2024] RDX: ffff8f455ad71800 RSI: 0000000000000000 RDI: ffff8f458c192d00
[Sat Jan  6 07:37:23 2024] RBP: ffff8f455ad71ec0 R08: 0000000000000000 R09: 0000000000000000
[Sat Jan  6 07:37:23 2024] R10: 0000000000000002 R11: ffffa773091ccd90 R12: ffff8f25c68df800
[Sat Jan  6 07:37:23 2024] R13: 000000000000000e R14: 0000000000000010 R15: ffff8f64bf8a4d10
[Sat Jan  6 07:37:23 2024] FS:  0000000000000000(0000) GS:ffff8f64bf880000(0000) knlGS:0000000000000000
[Sat Jan  6 07:37:23 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Jan  6 07:37:23 2024] CR2: 00007fbd91318650 CR3: 000000177e014005 CR4: 00000000003706e0
[Sat Jan  6 07:37:23 2024] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Sat Jan  6 07:37:23 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Sat Jan  6 07:37:23 2024] Call Trace:
[Sat Jan  6 07:37:23 2024]  <IRQ>
[Sat Jan 6 07:37:23 2024] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
[Sat Jan 6 07:37:23 2024] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[Sat Jan 6 07:37:23 2024] ? handle_bug (arch/x86/kernel/traps.c:237)
[Sat Jan 6 07:37:23 2024] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[Sat Jan 6 07:37:23 2024] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
[Sat Jan 6 07:37:23 2024] ? rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
[Sat Jan 6 07:37:23 2024] dst_release (net/core/dst.c:166 (discriminator 1))
[Sat Jan 6 07:37:23 2024] __dev_queue_xmit (./include/net/dst.h:283 net/core/dev.c:4327)
[Sat Jan 6 07:37:23 2024] ? nf_hook_slow (./include/linux/netfilter.h:144 net/netfilter/core.c:626)
[Sat Jan 6 07:37:23 2024] ip_finish_output2 (./include/net/neighbour.h:526 ./include/net/neighbour.h:540 net/ipv4/ip_output.c:233)
[Sat Jan 6 07:37:23 2024] process_backlog (net/core/dev.c:6000)
[Sat Jan 6 07:37:23 2024] __napi_poll (net/core/dev.c:6559)
[Sat Jan 6 07:37:23 2024] net_rx_action (net/core/dev.c:6628 net/core/dev.c:6759)
[Sat Jan 6 07:37:23 2024] __do_softirq (./arch/x86/include/asm/preempt.h:27 kernel/softirq.c:564)
[Sat Jan 6 07:37:23 2024] irq_exit_rcu (kernel/softirq.c:436 kernel/softirq.c:641 kernel/softirq.c:653)
[Sat Jan 6 07:37:23 2024] sysvec_call_function_single (arch/x86/kernel/smp.c:262 (discriminator 47))
[Sat Jan  6 07:37:23 2024]  </IRQ>
[Sat Jan  6 07:37:23 2024]  <TASK>
[Sat Jan 6 07:37:23 2024] asm_sysvec_call_function_single (./arch/x86/include/asm/idtentry.h:656)
[Sat Jan 6 07:37:23 2024] RIP: 0010:acpi_safe_halt (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:72 drivers/acpi/processor_idle.c:113)
[Sat Jan 6 07:37:23 2024] Code: ed c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 65 48 8b 04 25 40 32 02 00 48 8b 00 a8 08 75 0c eb 07 0f 00 2d c7 0c 2c 00 fb f4 <fa> c3 0f 1f 00 0f b6 47 08 3c 01 74 0b 3c 02 74 05 8b 7f 04 eb 9f
All code
========
   0:	ed                   	in     (%dx),%eax
   1:	c3                   	ret
   2:	66 66 2e 0f 1f 84 00 	data16 cs nopw 0x0(%rax,%rax,1)
   9:	00 00 00 00
   d:	66 90                	xchg   %ax,%ax
   f:	65 48 8b 04 25 40 32 	mov    %gs:0x23240,%rax
  16:	02 00
  18:	48 8b 00             	mov    (%rax),%rax
  1b:	a8 08                	test   $0x8,%al
  1d:	75 0c                	jne    0x2b
  1f:	eb 07                	jmp    0x28
  21:	0f 00 2d c7 0c 2c 00 	verw   0x2c0cc7(%rip)        # 0x2c0cef
  28:	fb                   	sti
  29:	f4                   	hlt
  2a:*	fa                   	cli    		<-- trapping instruction
  2b:	c3                   	ret
  2c:	0f 1f 00             	nopl   (%rax)
  2f:	0f b6 47 08          	movzbl 0x8(%rdi),%eax
  33:	3c 01                	cmp    $0x1,%al
  35:	74 0b                	je     0x42
  37:	3c 02                	cmp    $0x2,%al
  39:	74 05                	je     0x40
  3b:	8b 7f 04             	mov    0x4(%rdi),%edi
  3e:	eb 9f                	jmp    0xffffffffffffffdf

Code starting with the faulting instruction
===========================================
   0:	fa                   	cli
   1:	c3                   	ret
   2:	0f 1f 00             	nopl   (%rax)
   5:	0f b6 47 08          	movzbl 0x8(%rdi),%eax
   9:	3c 01                	cmp    $0x1,%al
   b:	74 0b                	je     0x18
   d:	3c 02                	cmp    $0x2,%al
   f:	74 05                	je     0x16
  11:	8b 7f 04             	mov    0x4(%rdi),%edi
  14:	eb 9f                	jmp    0xffffffffffffffb5
[Sat Jan  6 07:37:23 2024] RSP: 0018:ffffa773007fbe80 EFLAGS: 00000246
[Sat Jan  6 07:37:23 2024] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
[Sat Jan  6 07:37:23 2024] RDX: ffff8f64bf880000 RSI: ffff8f454c6f6800 RDI: ffff8f454c6f6864
[Sat Jan  6 07:37:23 2024] RBP: ffffffffaa216ea0 R08: ffffffffaa216ea0 R09: 0000000000000003
[Sat Jan  6 07:37:23 2024] R10: 0000000000000002 R11: 0000000000000007 R12: 0000000000000001
[Sat Jan  6 07:37:23 2024] R13: ffffffffaa216f08 R14: ffffffffaa216f20 R15: 0000000000000000
[Sat Jan 6 07:37:23 2024] acpi_idle_enter (drivers/acpi/processor_idle.c:709)
[Sat Jan 6 07:37:23 2024] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:267)
[Sat Jan 6 07:37:23 2024] cpuidle_enter (drivers/cpuidle/cpuidle.c:390 (discriminator 2))
[Sat Jan 6 07:37:23 2024] do_idle (kernel/sched/idle.c:134 kernel/sched/idle.c:215 kernel/sched/idle.c:282)
[Sat Jan 6 07:37:23 2024] cpu_startup_entry (kernel/sched/idle.c:379)
[Sat Jan 6 07:37:23 2024] start_secondary (arch/x86/kernel/smpboot.c:326)
[Sat Jan 6 07:37:23 2024] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:449)
[Sat Jan  6 07:37:23 2024]  </TASK>
[Sat Jan  6 07:37:23 2024] ---[ end trace 0000000000000000 ]---
[Sat Jan  6 21:33:28 2024] ------------[ cut here ]------------
[Sat Jan  6 21:33:28 2024] rcuref - imbalanced put()
[Sat Jan 6 21:33:28 2024] WARNING: CPU: 26 PID: 0 at lib/rcuref.c:279 rcuref_put_slowpath (lib/rcuref.c:279 (discriminator 1))
[Sat Jan  6 21:33:28 2024] Modules linked in: pppoe pppox ppp_generic slhc nft_limit nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding mlx5_core mlxfw mlx4_en mlx4_core i40e ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
[Sat Jan  6 21:33:28 2024] CPU: 26 PID: 0 Comm: swapper/26 Tainted: G        W  O       6.6.10 #1
[Sat Jan  6 21:33:28 2024] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 7.80 10/28/2020
[Sat Jan 6 21:33:28 2024] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:279 (discriminator 1))
[Sat Jan 6 21:33:28 2024] Code: 31 c0 eb dc 80 3d b2 4a e3 00 00 74 0a c7 03 00 00 00 e0 31 c0 eb c9 48 c7 c7 54 29 e4 a9 c6 05 98 4a e3 00 01 e8 db 7c c3 ff <0f> 0b eb df cc cc cc cc cc cc cc 48 89 fa 83 e2 07 48 85 f6 74 7f
All code
========
   0:	31 c0                	xor    %eax,%eax
   2:	eb dc                	jmp    0xffffffffffffffe0
   4:	80 3d b2 4a e3 00 00 	cmpb   $0x0,0xe34ab2(%rip)        # 0xe34abd
   b:	74 0a                	je     0x17
   d:	c7 03 00 00 00 e0    	movl   $0xe0000000,(%rbx)
  13:	31 c0                	xor    %eax,%eax
  15:	eb c9                	jmp    0xffffffffffffffe0
  17:	48 c7 c7 54 29 e4 a9 	mov    $0xffffffffa9e42954,%rdi
  1e:	c6 05 98 4a e3 00 01 	movb   $0x1,0xe34a98(%rip)        # 0xe34abd
  25:	e8 db 7c c3 ff       	call   0xffffffffffc37d05
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	eb df                	jmp    0xd
  2e:	cc                   	int3
  2f:	cc                   	int3
  30:	cc                   	int3
  31:	cc                   	int3
  32:	cc                   	int3
  33:	cc                   	int3
  34:	cc                   	int3
  35:	48 89 fa             	mov    %rdi,%rdx
  38:	83 e2 07             	and    $0x7,%edx
  3b:	48 85 f6             	test   %rsi,%rsi
  3e:	74 7f                	je     0xbf

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	eb df                	jmp    0xffffffffffffffe3
   4:	cc                   	int3
   5:	cc                   	int3
   6:	cc                   	int3
   7:	cc                   	int3
   8:	cc                   	int3
   9:	cc                   	int3
   a:	cc                   	int3
   b:	48 89 fa             	mov    %rdi,%rdx
   e:	83 e2 07             	and    $0x7,%edx
  11:	48 85 f6             	test   %rsi,%rsi
  14:	74 7f                	je     0x95
[Sat Jan  6 21:33:28 2024] RSP: 0018:ffffa7730d528dd8 EFLAGS: 00010292
[Sat Jan  6 21:33:28 2024] RAX: 0000000000000019 RBX: ffff8f4573f7d000 RCX: 00000000ffefffff
[Sat Jan  6 21:33:28 2024] RDX: 00000000ffefffff RSI: 0000000000000001 RDI: 00000000ffffffea
[Sat Jan  6 21:33:28 2024] RBP: ffff8f265f02d6c0 R08: 0000000000000000 R09: 00000000ffefffff
[Sat Jan  6 21:33:28 2024] R10: ffff8f64b6800000 R11: 0000000000000003 R12: ffff8f25c68df800
[Sat Jan  6 21:33:28 2024] R13: 000000000000000e R14: 0000000000000010 R15: ffff8f44c0024d10
[Sat Jan  6 21:33:28 2024] FS:  0000000000000000(0000) GS:ffff8f44c0000000(0000) knlGS:0000000000000000
[Sat Jan  6 21:33:28 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Jan  6 21:33:28 2024] CR2: 00007fd79aca5000 CR3: 000000015d226005 CR4: 00000000003706e0
[Sat Jan  6 21:33:28 2024] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Sat Jan  6 21:33:28 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Sat Jan  6 21:33:28 2024] Call Trace:
[Sat Jan  6 21:33:28 2024]  <IRQ>
[Sat Jan 6 21:33:28 2024] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
[Sat Jan 6 21:33:28 2024] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[Sat Jan 6 21:33:28 2024] ? handle_bug (arch/x86/kernel/traps.c:237)
[Sat Jan 6 21:33:28 2024] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[Sat Jan 6 21:33:28 2024] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
[Sat Jan 6 21:33:28 2024] ? rcuref_put_slowpath (lib/rcuref.c:279 (discriminator 1))
[Sat Jan 6 21:33:28 2024] ? rcuref_put_slowpath (lib/rcuref.c:279 (discriminator 1))
[Sat Jan 6 21:33:28 2024] dst_release (net/core/dst.c:166 (discriminator 1))
[Sat Jan 6 21:33:28 2024] __dev_queue_xmit (./include/net/dst.h:283 net/core/dev.c:4327)
[Sat Jan 6 21:33:28 2024] ? nf_hook_slow (./include/linux/netfilter.h:144 net/netfilter/core.c:626)
[Sat Jan 6 21:33:28 2024] ip_finish_output2 (./include/net/neighbour.h:526 ./include/net/neighbour.h:540 net/ipv4/ip_output.c:233)
[Sat Jan 6 21:33:28 2024] process_backlog (net/core/dev.c:6000)
[Sat Jan 6 21:33:28 2024] __napi_poll (net/core/dev.c:6559)
[Sat Jan 6 21:33:28 2024] net_rx_action (net/core/dev.c:6628 net/core/dev.c:6759)
[Sat Jan 6 21:33:28 2024] __do_softirq (./arch/x86/include/asm/preempt.h:27 kernel/softirq.c:564)
[Sat Jan 6 21:33:28 2024] irq_exit_rcu (kernel/softirq.c:436 kernel/softirq.c:641 kernel/softirq.c:653)
[Sat Jan 6 21:33:28 2024] sysvec_call_function_single (arch/x86/kernel/smp.c:262 (discriminator 47))
[Sat Jan  6 21:33:28 2024]  </IRQ>
[Sat Jan  6 21:33:28 2024]  <TASK>
[Sat Jan 6 21:33:28 2024] asm_sysvec_call_function_single (./arch/x86/include/asm/idtentry.h:656)
[Sat Jan 6 21:33:28 2024] RIP: 0010:acpi_safe_halt (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:72 drivers/acpi/processor_idle.c:113)
[Sat Jan 6 21:33:28 2024] Code: ed c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 65 48 8b 04 25 40 32 02 00 48 8b 00 a8 08 75 0c eb 07 0f 00 2d c7 0c 2c 00 fb f4 <fa> c3 0f 1f 00 0f b6 47 08 3c 01 74 0b 3c 02 74 05 8b 7f 04 eb 9f
All code
========
   0:	ed                   	in     (%dx),%eax
   1:	c3                   	ret
   2:	66 66 2e 0f 1f 84 00 	data16 cs nopw 0x0(%rax,%rax,1)
   9:	00 00 00 00
   d:	66 90                	xchg   %ax,%ax
   f:	65 48 8b 04 25 40 32 	mov    %gs:0x23240,%rax
  16:	02 00
  18:	48 8b 00             	mov    (%rax),%rax
  1b:	a8 08                	test   $0x8,%al
  1d:	75 0c                	jne    0x2b
  1f:	eb 07                	jmp    0x28
  21:	0f 00 2d c7 0c 2c 00 	verw   0x2c0cc7(%rip)        # 0x2c0cef
  28:	fb                   	sti
  29:	f4                   	hlt
  2a:*	fa                   	cli    		<-- trapping instruction
  2b:	c3                   	ret
  2c:	0f 1f 00             	nopl   (%rax)
  2f:	0f b6 47 08          	movzbl 0x8(%rdi),%eax
  33:	3c 01                	cmp    $0x1,%al
  35:	74 0b                	je     0x42
  37:	3c 02                	cmp    $0x2,%al
  39:	74 05                	je     0x40
  3b:	8b 7f 04             	mov    0x4(%rdi),%edi
  3e:	eb 9f                	jmp    0xffffffffffffffdf

Code starting with the faulting instruction
===========================================
   0:	fa                   	cli
   1:	c3                   	ret
   2:	0f 1f 00             	nopl   (%rax)
   5:	0f b6 47 08          	movzbl 0x8(%rdi),%eax
   9:	3c 01                	cmp    $0x1,%al
   b:	74 0b                	je     0x18
   d:	3c 02                	cmp    $0x2,%al
   f:	74 05                	je     0x16
  11:	8b 7f 04             	mov    0x4(%rdi),%edi
  14:	eb 9f                	jmp    0xffffffffffffffb5
[Sat Jan  6 21:33:28 2024] RSP: 0018:ffffa77300e7be80 EFLAGS: 00000246
[Sat Jan  6 21:33:28 2024] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
[Sat Jan  6 21:33:28 2024] RDX: ffff8f44c0000000 RSI: ffff8f454c6fc800 RDI: ffff8f454c6fc864
[Sat Jan  6 21:33:28 2024] RBP: ffffffffaa216ea0 R08: ffffffffaa216ea0 R09: 00003fd7b44cb0a0
[Sat Jan  6 21:33:28 2024] R10: 0000000000000002 R11: 0000000000000007 R12: 0000000000000001
[Sat Jan  6 21:33:28 2024] R13: ffffffffaa216f08 R14: ffffffffaa216f20 R15: 0000000000000000
[Sat Jan 6 21:33:28 2024] acpi_idle_enter (drivers/acpi/processor_idle.c:709)
[Sat Jan 6 21:33:28 2024] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:267)
[Sat Jan 6 21:33:28 2024] cpuidle_enter (drivers/cpuidle/cpuidle.c:390 (discriminator 2))
[Sat Jan 6 21:33:28 2024] do_idle (kernel/sched/idle.c:134 kernel/sched/idle.c:215 kernel/sched/idle.c:282)
[Sat Jan 6 21:33:28 2024] cpu_startup_entry (kernel/sched/idle.c:379)
[Sat Jan 6 21:33:28 2024] start_secondary (arch/x86/kernel/smpboot.c:326)
[Sat Jan 6 21:33:28 2024] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:449)
[Sat Jan  6 21:33:28 2024]  </TASK>
[Sat Jan  6 21:33:28 2024] ---[ end trace 0000000000000000 ]---

> On 4 Jan 2024, at 22:51, Martin Zaharinov <micron10@gmail.com> wrote:
> 
> Hi Thomas,
> 
> Happy New Year!
> 
> Here are two debug traces from two newly installed machines running kernel 6.6.9:
> 
> dmesg1:
> 
> [ 2257.449125] ------------[ cut here ]------------
> [ 2257.449245] WARNING: CPU: 1 PID: 40622 at lib/rcuref.c:294 rcuref_put_slowpath+0x2f/0x70
> [ 2257.449373] Modules linked in: nft_limit nf_conntrack_netlink pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding ixgbe mdio i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
> [ 2257.449642] CPU: 1 PID: 40622 Comm: nc Tainted: G           O       6.6.9 #1
> [ 2257.449761] Hardware name: Supermicro PIO-5038MR-H8TRF-NODE/X10SRD-F, BIOS 3.3 10/28/2020
> [ 2257.449883] RIP: 0010:rcuref_put_slowpath+0x2f/0x70
> [ 2257.449977] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4c e3 00
> [ 2257.450135] RSP: 0000:ffffb455cef83b78 EFLAGS: 00010246
> [ 2257.450227] RAX: 0000000000000000 RBX: ffff94873bb77dc0 RCX: ffff9486c0d46b80
> [ 2257.450341] RDX: ffff948736578428 RSI: 00000000fffffe01 RDI: ffff94873bb77dc0
> [ 2257.450456] RBP: ffff948736578428 R08: ffff948e1fa64f08 R09: 0000000000000001
> [ 2257.450570] R10: 0000000000028530 R11: 0000000000000001 R12: ffff94873bb77d80
> [ 2257.450685] R13: ffff94873bb77de8 R14: ffff948e1fa64f08 R15: 000000000266f59d
> [ 2257.450802] FS:  00007f0cdbc73800(0000) GS:ffff948e1fa40000(0000) knlGS:0000000000000000
> [ 2257.450918] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2257.451012] CR2: 00007f0cdc3f5c30 CR3: 0000000178ea0002 CR4: 00000000003706e0
> [ 2257.451127] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2257.451240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2257.451353] Call Trace:
> [ 2257.451441]  <TASK>
> [ 2257.451526]  ? __warn+0x6c/0x130
> [ 2257.451616]  ? report_bug+0x1b8/0x200
> [ 2257.451707]  ? handle_bug+0x36/0x70
> [ 2257.451797]  ? exc_invalid_op+0x17/0x1a0
> [ 2257.451886]  ? asm_exc_invalid_op+0x16/0x20
> [ 2257.452038]  ? rcuref_put_slowpath+0x2f/0x70
> [ 2257.452129]  dst_release+0x1c/0x40
> [ 2257.452222]  rt_cache_route+0xbd/0xf0
> [ 2257.452313]  ? kmem_cache_alloc+0x31/0x390
> [ 2257.452404]  rt_set_nexthop.isra.0+0x1b6/0x450
> [ 2257.452495]  ip_route_input_slow+0x5d9/0xcc0
> [ 2257.452586]  ? nft_nat_do_chain+0x7f/0xd0 [nft_chain_nat]
> [ 2257.452681]  ? nf_conntrack_udp_packet+0xcf/0x240 [nf_conntrack]
> [ 2257.452784]  ? nf_nat_inet_fn+0x36f/0x3f0 [nf_nat]
> [ 2257.452880]  ip_route_input_noref+0xe0/0xf0
> [ 2257.452970]  ip_rcv_finish_core.isra.0+0xbb/0x440
> [ 2257.453064]  ip_rcv+0xd5/0x110
> [ 2257.453151]  ? ip_rcv_core+0x360/0x360
> [ 2257.453240]  process_backlog+0x107/0x210
> [ 2257.453330]  __napi_poll+0x20/0x180
> [ 2257.453420]  net_rx_action+0x29f/0x380
> [ 2257.453510]  __do_softirq+0xd0/0x202
> [ 2257.453599]  irq_exit_rcu+0x82/0xa0
> [ 2257.453689]  sysvec_call_function_single+0x32/0x80
> [ 2257.453781]  asm_sysvec_call_function_single+0x16/0x20
> [ 2257.453874] RIP: 0033:0x7f0cdc5928b2
> [ 2257.453963] Code: 06 00 00 4c 89 65 88 49 83 fd 08 0f 84 f7 06 00 00 49 83 fd 26 0f 84 05 07 00 00 4d 85 ed 0f 84 5f 01 00 00 41 0f b6 44 24 04 <89> c6 40 c0 ee 04 0f 84 72 06 00 00 41 0f b6 54 24 05 83 e2 03 ff
> [ 2257.454121] RSP: 002b:00007ffc04d3e890 EFLAGS: 00000206
> [ 2257.454215] RAX: 0000000000000012 RBX: 00007f0cdc444db8 RCX: 00007f0cdc4e6e60
> [ 2257.454329] RDX: 0000000000000009 RSI: 00007f0cdc57ef30 RDI: 00007f0cdc42c808
> [ 2257.454442] RBP: 00007ffc04d3e9b0 R08: 00007f0cdc445028 R09: 00007ffc04d3e940
> [ 2257.454555] R10: 00007f0cdbf00be8 R11: 0000000000000000 R12: 00007f0cdc42c898
> [ 2257.454670] R13: 0000000000000006 R14: 0000000600000006 R15: 00007f0cdc581000
> [ 2257.454784]  </TASK>
> [ 2257.454869] ---[ end trace 0000000000000000 ]---
> 
> 
> [ 2257.449125] ------------[ cut here ]------------
> [ 2257.449245] WARNING: CPU: 1 PID: 40622 at lib/rcuref.c:294 rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
> [ 2257.449373] Modules linked in: nft_limit nf_conntrack_netlink pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding ixgbe mdio i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
> [ 2257.449642] CPU: 1 PID: 40622 Comm: nc Tainted: G           O       6.6.9 #1
> [ 2257.449761] Hardware name: Supermicro PIO-5038MR-H8TRF-NODE/X10SRD-F, BIOS 3.3 10/28/2020
> [ 2257.449883] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
> [ 2257.449977] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4c e3 00
> All code
> ========
>   0: 07                    (bad)
>   1: 83 f8 ff              cmp    $0xffffffff,%eax
>   4: 75 19                 jne    0x1f
>   6: ba 00 00 00 e0        mov    $0xe0000000,%edx
>   b: f0 0f b1 17           lock cmpxchg %edx,(%rdi)
>   f: 83 f8 ff              cmp    $0xffffffff,%eax
>  12: 74 04                 je     0x18
>  14: 31 c0                 xor    %eax,%eax
>  16: 5b                    pop    %rbx
>  17: c3                    ret
>  18: b8 01 00 00 00        mov    $0x1,%eax
>  1d: 5b                    pop    %rbx
>  1e: c3                    ret
>  1f: 3d ff ff ff bf        cmp    $0xbfffffff,%eax
>  24: 77 14                 ja     0x3a
>  26: 85 c0                 test   %eax,%eax
>  28: 78 06                 js     0x30
>  2a:* 0f 0b                 ud2     <-- trapping instruction
>  2c: 31 c0                 xor    %eax,%eax
>  2e: eb e6                 jmp    0x16
>  30: c7 07 00 00 00 a0     movl   $0xa0000000,(%rdi)
>  36: 31 c0                 xor    %eax,%eax
>  38: eb dc                 jmp    0x16
>  3a: 80                    .byte 0x80
>  3b: 3d e2 4c e3 00        cmp    $0xe34ce2,%eax
> 
> Code starting with the faulting instruction
> ===========================================
>   0: 0f 0b                 ud2
>   2: 31 c0                 xor    %eax,%eax
>   4: eb e6                 jmp    0xffffffffffffffec
>   6: c7 07 00 00 00 a0     movl   $0xa0000000,(%rdi)
>   c: 31 c0                 xor    %eax,%eax
>   e: eb dc                 jmp    0xffffffffffffffec
>  10: 80                    .byte 0x80
>  11: 3d e2 4c e3 00        cmp    $0xe34ce2,%eax
> [ 2257.450135] RSP: 0000:ffffb455cef83b78 EFLAGS: 00010246
> [ 2257.450227] RAX: 0000000000000000 RBX: ffff94873bb77dc0 RCX: ffff9486c0d46b80
> [ 2257.450341] RDX: ffff948736578428 RSI: 00000000fffffe01 RDI: ffff94873bb77dc0
> [ 2257.450456] RBP: ffff948736578428 R08: ffff948e1fa64f08 R09: 0000000000000001
> [ 2257.450570] R10: 0000000000028530 R11: 0000000000000001 R12: ffff94873bb77d80
> [ 2257.450685] R13: ffff94873bb77de8 R14: ffff948e1fa64f08 R15: 000000000266f59d
> [ 2257.450802] FS:  00007f0cdbc73800(0000) GS:ffff948e1fa40000(0000) knlGS:0000000000000000
> [ 2257.450918] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2257.451012] CR2: 00007f0cdc3f5c30 CR3: 0000000178ea0002 CR4: 00000000003706e0
> [ 2257.451127] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2257.451240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2257.451353] Call Trace:
> [ 2257.451441]  <TASK>
> [ 2257.451526] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
> [ 2257.451616] ? report_bug (lib/bug.c:180 lib/bug.c:219)
> [ 2257.451707] ? handle_bug (arch/x86/kernel/traps.c:237)
> [ 2257.451797] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
> [ 2257.451886] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
> [ 2257.452038] ? rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
> [ 2257.452129] dst_release (net/core/dst.c:166 (discriminator 1))
> [ 2257.452222] rt_cache_route (net/ipv4/route.c:1499)
> [ 2257.452313] ? kmem_cache_alloc (mm/slab.h:711 (discriminator 1) mm/slub.c:3461 (discriminator 1) mm/slub.c:3487 (discriminator 1) mm/slub.c:3494 (discriminator 1) mm/slub.c:3503 (discriminator 1))
> [ 2257.452404] rt_set_nexthop.isra.0 (net/ipv4/route.c:1606 (discriminator 1))
> [ 2257.452495] ip_route_input_slow (./include/net/lwtunnel.h:140 net/ipv4/route.c:1875 net/ipv4/route.c:2154 net/ipv4/route.c:2337)
> [ 2257.452586] ? nft_nat_do_chain (net/netfilter/nft_chain_nat.c:33) nft_chain_nat
> [ 2257.452681] ? nf_conntrack_udp_packet (net/netfilter/nf_conntrack_proto_udp.c:130) nf_conntrack
> [ 2257.452784] ? nf_nat_inet_fn (net/netfilter/nf_nat_core.c:844) nf_nat
> [ 2257.452880] ip_route_input_noref (net/ipv4/route.c:2499)
> [ 2257.452970] ip_rcv_finish_core.isra.0 (net/ipv4/ip_input.c:367 (discriminator 1))
> [ 2257.453064] ip_rcv (net/ipv4/ip_input.c:448 ./include/linux/netfilter.h:304 ./include/linux/netfilter.h:298 net/ipv4/ip_input.c:569)
> [ 2257.453151] ? ip_rcv_core (net/ipv4/ip_input.c:436)
> [ 2257.453240] process_backlog (net/core/dev.c:6000)
> [ 2257.453330] __napi_poll (net/core/dev.c:6559)
> [ 2257.453420] net_rx_action (net/core/dev.c:6628 net/core/dev.c:6759)
> [ 2257.453510] __do_softirq (./arch/x86/include/asm/preempt.h:27 kernel/softirq.c:564)
> [ 2257.453599] irq_exit_rcu (kernel/softirq.c:436 kernel/softirq.c:641 kernel/softirq.c:653)
> [ 2257.453689] sysvec_call_function_single (arch/x86/kernel/smp.c:262 (discriminator 69))
> [ 2257.453781] asm_sysvec_call_function_single (./arch/x86/include/asm/idtentry.h:656)
> [ 2257.453874] RIP: 0033:0x7f0cdc5928b2
> [ 2257.453963] Code: 06 00 00 4c 89 65 88 49 83 fd 08 0f 84 f7 06 00 00 49 83 fd 26 0f 84 05 07 00 00 4d 85 ed 0f 84 5f 01 00 00 41 0f b6 44 24 04 <89> c6 40 c0 ee 04 0f 84 72 06 00 00 41 0f b6 54 24 05 83 e2 03 ff
> All code
> ========
>   0: 06                    (bad)
>   1: 00 00                 add    %al,(%rax)
>   3: 4c 89 65 88           mov    %r12,-0x78(%rbp)
>   7: 49 83 fd 08           cmp    $0x8,%r13
>   b: 0f 84 f7 06 00 00     je     0x708
>  11: 49 83 fd 26           cmp    $0x26,%r13
>  15: 0f 84 05 07 00 00     je     0x720
>  1b: 4d 85 ed              test   %r13,%r13
>  1e: 0f 84 5f 01 00 00     je     0x183
>  24: 41 0f b6 44 24 04     movzbl 0x4(%r12),%eax
>  2a:* 89 c6                 mov    %eax,%esi <-- trapping instruction
>  2c: 40 c0 ee 04           shr    $0x4,%sil
>  30: 0f 84 72 06 00 00     je     0x6a8
>  36: 41 0f b6 54 24 05     movzbl 0x5(%r12),%edx
>  3c: 83 e2 03              and    $0x3,%edx
>  3f: ff                    .byte 0xff
> 
> Code starting with the faulting instruction
> ===========================================
>   0: 89 c6                 mov    %eax,%esi
>   2: 40 c0 ee 04           shr    $0x4,%sil
>   6: 0f 84 72 06 00 00     je     0x67e
>   c: 41 0f b6 54 24 05     movzbl 0x5(%r12),%edx
>  12: 83 e2 03              and    $0x3,%edx
>  15: ff                    .byte 0xff
> [ 2257.454121] RSP: 002b:00007ffc04d3e890 EFLAGS: 00000206
> [ 2257.454215] RAX: 0000000000000012 RBX: 00007f0cdc444db8 RCX: 00007f0cdc4e6e60
> [ 2257.454329] RDX: 0000000000000009 RSI: 00007f0cdc57ef30 RDI: 00007f0cdc42c808
> [ 2257.454442] RBP: 00007ffc04d3e9b0 R08: 00007f0cdc445028 R09: 00007ffc04d3e940
> [ 2257.454555] R10: 00007f0cdbf00be8 R11: 0000000000000000 R12: 00007f0cdc42c898
> [ 2257.454670] R13: 0000000000000006 R14: 0000000600000006 R15: 00007f0cdc581000
> [ 2257.454784]  </TASK>
> [ 2257.454869] ---[ end trace 0000000000000000 ]---
> 
> 
> dmesg2:
> 
> [ 2567.167952] ------------[ cut here ]------------
> [ 2567.168053] WARNING: CPU: 11 PID: 0 at lib/rcuref.c:294 rcuref_put_slowpath+0x2f/0x70
> [ 2567.168175] Modules linked in: nft_limit nf_conntrack_netlink pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding ixgbe mdio i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
> [ 2567.168445] CPU: 11 PID: 0 Comm: swapper/11 Tainted: G           O       6.6.9 #1
> [ 2567.168561] Hardware name: Supermicro X10SRD-F/X10SRD-F, BIOS 3.4 06/05/2021
> [ 2567.168675] RIP: 0010:rcuref_put_slowpath+0x2f/0x70
> [ 2567.168767] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4c e3 00
> [ 2567.168924] RSP: 0018:ffffaeaf80418d00 EFLAGS: 00010246
> [ 2567.169017] RAX: 0000000000000000 RBX: ffff9fef84d6a940 RCX: 0000000000000074
> [ 2567.169132] RDX: ffff9fefe2e30000 RSI: 0000000000000000 RDI: ffff9fef84d6a940
> [ 2567.169246] RBP: ffff9fefe2e306c0 R08: 0000000000000000 R09: 0000000000029300
> [ 2567.169359] R10: 0000000000029300 R11: ffffaeaf80418d90 R12: ffff9fef8aebe000
> [ 2567.169473] R13: ffff9fef80896800 R14: ffff9fef85335200 R15: ffff9fef8ae07080
> [ 2567.169586] FS:  0000000000000000(0000) GS:ffff9ff6dfcc0000(0000) knlGS:0000000000000000
> [ 2567.169702] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2567.169795] CR2: 00007f4eaa7e6650 CR3: 0000000156dcd006 CR4: 00000000003706e0
> [ 2567.169908] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2567.170022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2567.170137] Call Trace:
> [ 2567.170224]  <IRQ>
> [ 2567.170309]  ? __warn+0x6c/0x130
> [ 2567.170399]  ? report_bug+0x1b8/0x200
> [ 2567.170488]  ? handle_bug+0x36/0x70
> [ 2567.170577]  ? exc_invalid_op+0x17/0x1a0
> [ 2567.170667]  ? asm_exc_invalid_op+0x16/0x20
> [ 2567.170758]  ? rcuref_put_slowpath+0x2f/0x70
> [ 2567.170850]  dst_release+0x1c/0x40
> [ 2567.170939]  __dev_queue_xmit+0x598/0xce0
> [ 2567.171029]  vlan_dev_hard_start_xmit+0x82/0xc0
> [ 2567.171122]  dev_hard_start_xmit+0x95/0xe0
> [ 2567.171216]  __dev_queue_xmit+0x863/0xce0
> [ 2567.171305]  ? eth_header+0x25/0xc0
> [ 2567.171394]  ip_finish_output2+0x1a0/0x530
> [ 2567.171485]  process_backlog+0x107/0x210
> [ 2567.171575]  __napi_poll+0x20/0x180
> [ 2567.171663]  net_rx_action+0x29f/0x380
> [ 2567.171752]  ? rebalance_domains+0x14c/0x300
> [ 2567.171843]  __do_softirq+0xd0/0x202
> [ 2567.171932]  irq_exit_rcu+0x82/0xa0
> [ 2567.172022]  common_interrupt+0x7a/0xa0
> [ 2567.172111]  </IRQ>
> [ 2567.172198]  <TASK>
> [ 2567.172283]  asm_common_interrupt+0x22/0x40
> [ 2567.172374] RIP: 0010:cpuidle_enter_state+0xa3/0x6a0
> [ 2567.172467] Code: 46 40 40 0f 84 02 01 00 00 e8 c9 a0 70 ff e8 d4 f6 ff ff 31 ff 49 89 c6 e8 0a b9 6f ff 45 84 ff 0f 85 d9 00 00 00 fb 45 85 ed <0f> 88 b8 00 00 00 49 63 cd 48 8b 04 24 48 6b f1 68 49 29 c6 48 8d
> [ 2567.172623] RSP: 0018:ffffaeaf80177e98 EFLAGS: 00000202
> [ 2567.172715] RAX: ffff9ff6dfce3a80 RBX: ffff9fef81338000 RCX: 000000000000001f
> [ 2567.172828] RDX: 00000255b721ed84 RSI: 00000000238e3b7a RDI: 0000000000000000
> [ 2567.172942] RBP: ffffffffba216ea0 R08: 0000000000000004 R09: ffff9ff6dfcdef00
> [ 2567.173055] R10: ffff9ff6dfcdef00 R11: 0000000000000007 R12: 0000000000000001
> [ 2567.173168] R13: 0000000000000001 R14: 00000255b721ed84 R15: 0000000000000000
> [ 2567.173283]  ? cpuidle_enter_state+0x96/0x6a0
> [ 2567.173374]  cpuidle_enter+0x24/0x40
> [ 2567.173464]  do_idle+0x1a7/0x210
> [ 2567.173552]  cpu_startup_entry+0x21/0x30
> [ 2567.173642]  start_secondary+0xe1/0xf0
> [ 2567.173732]  secondary_startup_64_no_verify+0x178/0x17b
> [ 2567.173825]  </TASK>
> [ 2567.173910] ---[ end trace 0000000000000000 ]---
> 
> 
> [ 2567.167952] ------------[ cut here ]------------
> [ 2567.168053] WARNING: CPU: 11 PID: 0 at lib/rcuref.c:294 rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
> [ 2567.168175] Modules linked in: nft_limit nf_conntrack_netlink pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding ixgbe mdio i40e nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
> [ 2567.168445] CPU: 11 PID: 0 Comm: swapper/11 Tainted: G           O       6.6.9 #1
> [ 2567.168561] Hardware name: Supermicro X10SRD-F/X10SRD-F, BIOS 3.4 06/05/2021
> [ 2567.168675] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
> [ 2567.168767] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4c e3 00
> All code
> ========
>   0: 07                    (bad)
>   1: 83 f8 ff              cmp    $0xffffffff,%eax
>   4: 75 19                 jne    0x1f
>   6: ba 00 00 00 e0        mov    $0xe0000000,%edx
>   b: f0 0f b1 17           lock cmpxchg %edx,(%rdi)
>   f: 83 f8 ff              cmp    $0xffffffff,%eax
>  12: 74 04                 je     0x18
>  14: 31 c0                 xor    %eax,%eax
>  16: 5b                    pop    %rbx
>  17: c3                    ret
>  18: b8 01 00 00 00        mov    $0x1,%eax
>  1d: 5b                    pop    %rbx
>  1e: c3                    ret
>  1f: 3d ff ff ff bf        cmp    $0xbfffffff,%eax
>  24: 77 14                 ja     0x3a
>  26: 85 c0                 test   %eax,%eax
>  28: 78 06                 js     0x30
>  2a:* 0f 0b                 ud2     <-- trapping instruction
>  2c: 31 c0                 xor    %eax,%eax
>  2e: eb e6                 jmp    0x16
>  30: c7 07 00 00 00 a0     movl   $0xa0000000,(%rdi)
>  36: 31 c0                 xor    %eax,%eax
>  38: eb dc                 jmp    0x16
>  3a: 80                    .byte 0x80
>  3b: 3d e2 4c e3 00        cmp    $0xe34ce2,%eax
> 
> Code starting with the faulting instruction
> ===========================================
>   0: 0f 0b                 ud2
>   2: 31 c0                 xor    %eax,%eax
>   4: eb e6                 jmp    0xffffffffffffffec
>   6: c7 07 00 00 00 a0     movl   $0xa0000000,(%rdi)
>   c: 31 c0                 xor    %eax,%eax
>   e: eb dc                 jmp    0xffffffffffffffec
>  10: 80                    .byte 0x80
>  11: 3d e2 4c e3 00        cmp    $0xe34ce2,%eax
> [ 2567.168924] RSP: 0018:ffffaeaf80418d00 EFLAGS: 00010246
> [ 2567.169017] RAX: 0000000000000000 RBX: ffff9fef84d6a940 RCX: 0000000000000074
> [ 2567.169132] RDX: ffff9fefe2e30000 RSI: 0000000000000000 RDI: ffff9fef84d6a940
> [ 2567.169246] RBP: ffff9fefe2e306c0 R08: 0000000000000000 R09: 0000000000029300
> [ 2567.169359] R10: 0000000000029300 R11: ffffaeaf80418d90 R12: ffff9fef8aebe000
> [ 2567.169473] R13: ffff9fef80896800 R14: ffff9fef85335200 R15: ffff9fef8ae07080
> [ 2567.169586] FS:  0000000000000000(0000) GS:ffff9ff6dfcc0000(0000) knlGS:0000000000000000
> [ 2567.169702] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2567.169795] CR2: 00007f4eaa7e6650 CR3: 0000000156dcd006 CR4: 00000000003706e0
> [ 2567.169908] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2567.170022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2567.170137] Call Trace:
> [ 2567.170224]  <IRQ>
> [ 2567.170309] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
> [ 2567.170399] ? report_bug (lib/bug.c:180 lib/bug.c:219)
> [ 2567.170488] ? handle_bug (arch/x86/kernel/traps.c:237)
> [ 2567.170577] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
> [ 2567.170667] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
> [ 2567.170758] ? rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
> [ 2567.170850] dst_release (net/core/dst.c:166 (discriminator 1))
> [ 2567.170939] __dev_queue_xmit (./include/net/dst.h:283 net/core/dev.c:4327)
> [ 2567.171029] vlan_dev_hard_start_xmit (net/8021q/vlan_dev.c:130)
> [ 2567.171122] dev_hard_start_xmit (./include/linux/netdevice.h:4926 net/core/dev.c:3576 net/core/dev.c:3592)
> [ 2567.171216] __dev_queue_xmit (./include/linux/netdevice.h:3300 (discriminator 25) net/core/dev.c:4373 (discriminator 25))
> [ 2567.171305] ? eth_header (net/ethernet/eth.c:85)
> [ 2567.171394] ip_finish_output2 (./include/net/neighbour.h:542 (discriminator 2) net/ipv4/ip_output.c:233 (discriminator 2))
> [ 2567.171485] process_backlog (net/core/dev.c:6000)
> [ 2567.171575] __napi_poll (net/core/dev.c:6559)
> [ 2567.171663] net_rx_action (net/core/dev.c:6628 net/core/dev.c:6759)
> [ 2567.171752] ? rebalance_domains (kernel/sched/fair.c:11719 kernel/sched/fair.c:11895)
> [ 2567.171843] __do_softirq (./arch/x86/include/asm/preempt.h:27 kernel/softirq.c:564)
> [ 2567.171932] irq_exit_rcu (kernel/softirq.c:436 kernel/softirq.c:641 kernel/softirq.c:653)
> [ 2567.172022] common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 47))
> [ 2567.172111]  </IRQ>
> [ 2567.172198]  <TASK>
> [ 2567.172283] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:640)
> [ 2567.172374] RIP: 0010:cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
> [ 2567.172467] Code: 46 40 40 0f 84 02 01 00 00 e8 c9 a0 70 ff e8 d4 f6 ff ff 31 ff 49 89 c6 e8 0a b9 6f ff 45 84 ff 0f 85 d9 00 00 00 fb 45 85 ed <0f> 88 b8 00 00 00 49 63 cd 48 8b 04 24 48 6b f1 68 49 29 c6 48 8d
> All code
> ========
>   0: 46                    rex.RX
>   1: 40                    rex
>   2: 40 0f 84 02 01 00 00 rex je 0x10b
>   9: e8 c9 a0 70 ff        call   0xffffffffff70a0d7
>   e: e8 d4 f6 ff ff        call   0xfffffffffffff6e7
>  13: 31 ff                 xor    %edi,%edi
>  15: 49 89 c6              mov    %rax,%r14
>  18: e8 0a b9 6f ff        call   0xffffffffff6fb927
>  1d: 45 84 ff              test   %r15b,%r15b
>  20: 0f 85 d9 00 00 00     jne    0xff
>  26: fb                    sti
>  27: 45 85 ed              test   %r13d,%r13d
>  2a:* 0f 88 b8 00 00 00     js     0xe8 <-- trapping instruction
>  30: 49 63 cd              movslq %r13d,%rcx
>  33: 48 8b 04 24           mov    (%rsp),%rax
>  37: 48 6b f1 68           imul   $0x68,%rcx,%rsi
>  3b: 49 29 c6              sub    %rax,%r14
>  3e: 48                    rex.W
>  3f: 8d                    .byte 0x8d
> 
> Code starting with the faulting instruction
> ===========================================
>   0: 0f 88 b8 00 00 00     js     0xbe
>   6: 49 63 cd              movslq %r13d,%rcx
>   9: 48 8b 04 24           mov    (%rsp),%rax
>   d: 48 6b f1 68           imul   $0x68,%rcx,%rsi
>  11: 49 29 c6              sub    %rax,%r14
>  14: 48                    rex.W
>  15: 8d                    .byte 0x8d
> [ 2567.172623] RSP: 0018:ffffaeaf80177e98 EFLAGS: 00000202
> [ 2567.172715] RAX: ffff9ff6dfce3a80 RBX: ffff9fef81338000 RCX: 000000000000001f
> [ 2567.172828] RDX: 00000255b721ed84 RSI: 00000000238e3b7a RDI: 0000000000000000
> [ 2567.172942] RBP: ffffffffba216ea0 R08: 0000000000000004 R09: ffff9ff6dfcdef00
> [ 2567.173055] R10: ffff9ff6dfcdef00 R11: 0000000000000007 R12: 0000000000000001
> [ 2567.173168] R13: 0000000000000001 R14: 00000255b721ed84 R15: 0000000000000000
> [ 2567.173283] ? cpuidle_enter_state (drivers/cpuidle/cpuidle.c:285)
> [ 2567.173374] cpuidle_enter (drivers/cpuidle/cpuidle.c:390 (discriminator 2))
> [ 2567.173464] do_idle (kernel/sched/idle.c:134 kernel/sched/idle.c:215 kernel/sched/idle.c:282)
> [ 2567.173552] cpu_startup_entry (kernel/sched/idle.c:379)
> [ 2567.173642] start_secondary (arch/x86/kernel/smpboot.c:326)
> [ 2567.173732] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:449)
> [ 2567.173825]  </TASK>
> [ 2567.173910] ---[ end trace 0000000000000000 ]---
> 
> best regards,
> Martin
> 
> 
>> On 29 Dec 2023, at 14:00, Martin Zaharinov <micron10@gmail.com> wrote:
>> 
>> Hi Thomas,
>> 
>> One more report from the second machine:
>> 
>> [21299.954952] ------------[ cut here ]------------
>> [21299.955047] WARNING: CPU: 15 PID: 0 at lib/rcuref.c:294 rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
>> [21299.955153] Modules linked in: nft_limit nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp virtio_net net_failover failover virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring e1000e e1000 vmxnet3 i40e ixgbe mdio bnxt_en nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rtc_cmos
>> [21299.955378] CPU: 15 PID: 0 Comm: swapper/15 Tainted: G           O       6.6.8 #1
>> [21299.955475] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 02/09/2023
>> [21299.955575] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
>> [21299.955662] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4e e3 00
>> All code
>> ========
>>  0: 07                    (bad)
>>  1: 83 f8 ff              cmp    $0xffffffff,%eax
>>  4: 75 19                 jne    0x1f
>>  6: ba 00 00 00 e0        mov    $0xe0000000,%edx
>>  b: f0 0f b1 17           lock cmpxchg %edx,(%rdi)
>>  f: 83 f8 ff              cmp    $0xffffffff,%eax
>> 12: 74 04                 je     0x18
>> 14: 31 c0                 xor    %eax,%eax
>> 16: 5b                    pop    %rbx
>> 17: c3                    ret
>> 18: b8 01 00 00 00        mov    $0x1,%eax
>> 1d: 5b                    pop    %rbx
>> 1e: c3                    ret
>> 1f: 3d ff ff ff bf        cmp    $0xbfffffff,%eax
>> 24: 77 14                 ja     0x3a
>> 26: 85 c0                 test   %eax,%eax
>> 28: 78 06                 js     0x30
>> 2a:* 0f 0b                 ud2     <-- trapping instruction
>> 2c: 31 c0                 xor    %eax,%eax
>> 2e: eb e6                 jmp    0x16
>> 30: c7 07 00 00 00 a0     movl   $0xa0000000,(%rdi)
>> 36: 31 c0                 xor    %eax,%eax
>> 38: eb dc                 jmp    0x16
>> 3a: 80                    .byte 0x80
>> 3b: 3d e2 4e e3 00        cmp    $0xe34ee2,%eax
>> 
>> Code starting with the faulting instruction
>> ===========================================
>>  0: 0f 0b                 ud2
>>  2: 31 c0                 xor    %eax,%eax
>>  4: eb e6                 jmp    0xffffffffffffffec
>>  6: c7 07 00 00 00 a0     movl   $0xa0000000,(%rdi)
>>  c: 31 c0                 xor    %eax,%eax
>>  e: eb dc                 jmp    0xffffffffffffffec
>> 10: 80                    .byte 0x80
>> 11: 3d e2 4e e3 00        cmp    $0xe34ee2,%eax
>> [21299.955793] RSP: 0018:ffff96a7c0578c30 EFLAGS: 00010246
>> [21299.955879] RAX: 0000000000000000 RBX: ffff8b75d1e49a80 RCX: ffff8b75c6667c80
>> [21299.955974] RDX: ffff8b84bfbe4f08 RSI: 00000000fffffe01 RDI: ffff8b75d1e49a80
>> [21299.956070] RBP: ffff8b84bfbe4f08 R08: ffff8b84bfbe4f08 R09: 0000000000000001
>> [21299.956167] R10: 0000000000028530 R11: 0000000000000001 R12: ffff8b75d1e49a40
>> [21299.956261] R13: ffff8b75d1e49aa8 R14: ffff8b84bfbe4f08 R15: 00000000c26ab667
>> [21299.956358] FS:  0000000000000000(0000) GS:ffff8b84bfbc0000(0000) knlGS:0000000000000000
>> [21299.956457] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [21299.956540] CR2: 00007f2e185c73c8 CR3: 0000000950014003 CR4: 00000000003706e0
>> [21299.956635] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [21299.956730] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [21299.956826] Call Trace:
>> [21299.956905]  <IRQ>
>> [21299.956983] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
>> [21299.957065] ? report_bug (lib/bug.c:180 lib/bug.c:219)
>> [21299.957147] ? handle_bug (arch/x86/kernel/traps.c:237)
>> [21299.957228] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
>> [21299.957308] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
>> [21299.957393] ? rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
>> [21299.957476] dst_release (net/core/dst.c:166 (discriminator 1))
>> [21299.957559] rt_cache_route (net/ipv4/route.c:1499)
>> [21299.957641] rt_set_nexthop.isra.0 (net/ipv4/route.c:1606 (discriminator 1))
>> [21299.957722] ip_route_input_slow (./include/net/lwtunnel.h:140 net/ipv4/route.c:1875 net/ipv4/route.c:2154 net/ipv4/route.c:2337)
>> [21299.957804] ? free_unref_page (./include/linux/list.h:150 (discriminator 1) ./include/linux/list.h:169 (discriminator 1) mm/page_alloc.c:2377 (discriminator 1) mm/page_alloc.c:2428 (discriminator 1))
>> [21299.957889] ip_route_input_noref (net/ipv4/route.c:2499)
>> [21299.957972] ip_rcv_finish_core.isra.0 (net/ipv4/ip_input.c:367 (discriminator 1))
>> [21299.958058] ip_rcv (net/ipv4/ip_input.c:448 ./include/linux/netfilter.h:304 ./include/linux/netfilter.h:298 net/ipv4/ip_input.c:569)
>> [21299.958139] ? ip_rcv_core (net/ipv4/ip_input.c:436)
>> [21299.958220] process_backlog (net/core/dev.c:5997)
>> [21299.958302] __napi_poll (net/core/dev.c:6556)
>> [21299.958384] net_rx_action (net/core/dev.c:6625 net/core/dev.c:6756)
>> [21299.958466] __do_softirq (./arch/x86/include/asm/preempt.h:27 kernel/softirq.c:564)
>> [21299.958549] irq_exit_rcu (kernel/softirq.c:436 kernel/softirq.c:641 kernel/softirq.c:653)
>> [21299.958631] sysvec_call_function_single (arch/x86/kernel/smp.c:262 (discriminator 47))
>> [21299.958714]  </IRQ>
>> [21299.958792]  <TASK>
>> [21299.958869] asm_sysvec_call_function_single (./arch/x86/include/asm/idtentry.h:656)
>> [21299.958953] RIP: 0010:acpi_safe_halt (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:72 drivers/acpi/processor_idle.c:113)
>> [21299.959038] Code: ed c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 65 48 8b 04 25 40 32 02 00 48 8b 00 a8 08 75 0c eb 07 0f 00 2d 57 0f 2c 00 fb f4 <fa> c3 0f 1f 00 0f b6 47 08 3c 01 74 0b 3c 02 74 05 8b 7f 04 eb 9f
>> All code
>> ========
>>  0: ed                    in     (%dx),%eax
>>  1: c3                    ret
>>  2: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
>>  9: 00 00 00 00
>>  d: 66 90                 xchg   %ax,%ax
>>  f: 65 48 8b 04 25 40 32 mov    %gs:0x23240,%rax
>> 16: 02 00
>> 18: 48 8b 00              mov    (%rax),%rax
>> 1b: a8 08                 test   $0x8,%al
>> 1d: 75 0c                 jne    0x2b
>> 1f: eb 07                 jmp    0x28
>> 21: 0f 00 2d 57 0f 2c 00 verw   0x2c0f57(%rip)        # 0x2c0f7f
>> 28: fb                    sti
>> 29: f4                    hlt
>> 2a:* fa                    cli     <-- trapping instruction
>> 2b: c3                    ret
>> 2c: 0f 1f 00              nopl   (%rax)
>> 2f: 0f b6 47 08           movzbl 0x8(%rdi),%eax
>> 33: 3c 01                 cmp    $0x1,%al
>> 35: 74 0b                 je     0x42
>> 37: 3c 02                 cmp    $0x2,%al
>> 39: 74 05                 je     0x40
>> 3b: 8b 7f 04              mov    0x4(%rdi),%edi
>> 3e: eb 9f                 jmp    0xffffffffffffffdf
>> 
>> Code starting with the faulting instruction
>> ===========================================
>>  0: fa                    cli
>>  1: c3                    ret
>>  2: 0f 1f 00              nopl   (%rax)
>>  5: 0f b6 47 08           movzbl 0x8(%rdi),%eax
>>  9: 3c 01                 cmp    $0x1,%al
>>  b: 74 0b                 je     0x18
>>  d: 3c 02                 cmp    $0x2,%al
>>  f: 74 05                 je     0x16
>> 11: 8b 7f 04              mov    0x4(%rdi),%edi
>> 14: eb 9f                 jmp    0xffffffffffffffb5
>> [21299.959162] RSP: 0018:ffff96a7c015be80 EFLAGS: 00000246
>> [21299.959247] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
>> [21299.959343] RDX: ffff8b84bfbc0000 RSI: ffff8b75c76ba000 RDI: ffff8b75c76ba064
>> [21299.959437] RBP: ffffffffae216ea0 R08: ffffffffae216ea0 R09: 0000000000000003
>> [21299.959533] R10: 0000000000000002 R11: 0000000000000008 R12: 0000000000000001
>> [21299.959630] R13: ffffffffae216f08 R14: ffffffffae216f20 R15: 0000000000000000
>> [21299.959725] acpi_idle_enter (drivers/acpi/processor_idle.c:709)
>> [21299.959807] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:267)
>> [21299.959890] cpuidle_enter (drivers/cpuidle/cpuidle.c:390 (discriminator 2))
>> [21299.959975] do_idle (kernel/sched/idle.c:134 kernel/sched/idle.c:215 kernel/sched/idle.c:282)
>> [21299.960058] cpu_startup_entry (kernel/sched/idle.c:379)
>> [21299.960140] start_secondary (arch/x86/kernel/smpboot.c:326)
>> [21299.960223] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
>> [21299.960306]  </TASK>
>> [21299.960384] ---[ end trace 0000000000000000 ]---
>> 
>>> On 22 Dec 2023, at 19:26, Martin Zaharinov <micron10@gmail.com> wrote:
>>> 
>>> Hi Thomas,
>>> 
>>> this is with your patch applied.
>>> See the logs below.
>>> 
>>> 
>>> [43040.198064] ------------[ cut here ]------------
>>> [43040.198407] WARNING: CPU: 47 PID: 0 at lib/rcuref.c:294 rcuref_put_slowpath+0x2f/0x70
>>> [43040.198685] Modules linked in: pppoe pppox ppp_generic slhc nft_limit nft_ct nft_nat nft_chain_nat nf_tables netconsole tg3 igb i2c_algo_bit e1000e bnxt_en mlx5_core mlxfw mlx4_en mlx4_core i40e ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_devintf ipmi_msghandler rtc_cmos
>>> [43040.199478] CPU: 47 PID: 0 Comm: swapper/47 Tainted: G           O       6.6.8 #1
>>> [43040.199660] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
>>> [43040.199886] RIP: 0010:rcuref_put_slowpath+0x2f/0x70
>>> [43040.200028] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4e e3 00
>>> [43040.200387] RSP: 0018:ffffa39d83e88c30 EFLAGS: 00010246
>>> [43040.200528] RAX: 0000000000000000 RBX: ffff9c58e966b840 RCX: ffff9c5bc4e35680
>>> [43040.200700] RDX: ffff9c5fafde4f08 RSI: 00000000fffffe01 RDI: ffff9c58e966b840
>>> [43040.200871] RBP: ffff9c5fafde4f08 R08: ffff9c5fafde4f08 R09: 0000000000000001
>>> [43040.201044] R10: 00000000000286e0 R11: 0000000000000001 R12: ffff9c58e966b800
>>> [43040.201255] R13: ffff9c58e966b868 R14: ffff9c5fafde4f08 R15: 000000008f5de42b
>>> [43040.201439] FS:  0000000000000000(0000) GS:ffff9c5fafdc0000(0000) knlGS:0000000000000000
>>> [43040.201642] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [43040.201799] CR2: 00007f1401217714 CR3: 0000000464b94003 CR4: 00000000001706e0
>>> [43040.201994] Call Trace:
>>> [43040.202095]  <IRQ>
>>> [43040.202187]  ? __warn+0x6c/0x130
>>> [43040.202301]  ? report_bug+0x1b8/0x200
>>> [43040.202418]  ? handle_bug+0x36/0x70
>>> [43040.202534]  ? exc_invalid_op+0x17/0x1a0
>>> [43040.202652]  ? asm_exc_invalid_op+0x16/0x20
>>> [43040.202781]  ? rcuref_put_slowpath+0x2f/0x70
>>> [43040.202909]  dst_release+0x1c/0x40
>>> [43040.203026]  rt_cache_route+0xbd/0xf0
>>> [43040.203143]  rt_set_nexthop.isra.0+0x1b6/0x450
>>> [43040.203272]  ip_route_input_slow+0x5d9/0xcc0
>>> [43040.203401]  ? nf_conntrack_udp_packet+0x17c/0x240 [nf_conntrack]
>>> [43040.203581]  ip_route_input_noref+0xe0/0xf0
>>> [43040.203704]  ip_rcv_finish_core.isra.0+0xbb/0x440
>>> [43040.203855]  ip_rcv+0xd5/0x110
>>> [43040.203962]  ? ip_rcv_core+0x360/0x360
>>> [43040.204079]  process_backlog+0x107/0x210
>>> [43040.204201]  __napi_poll+0x20/0x180
>>> [43040.204315]  net_rx_action+0x29f/0x380
>>> [43040.204432]  __do_softirq+0xd0/0x202
>>> [43040.204549]  irq_exit_rcu+0x82/0xa0
>>> [43040.204667]  common_interrupt+0x7a/0xa0
>>> [43040.204786]  </IRQ>
>>> [43040.204876]  <TASK>
>>> [43040.204965]  asm_common_interrupt+0x22/0x40
>>> [43040.205090] RIP: 0010:acpi_safe_halt+0x1b/0x20
>>> [43040.205220] Code: ed c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 65 48 8b 04 25 40 32 02 00 48 8b 00 a8 08 75 0c eb 07 0f 00 2d 57 0f 2c 00 fb f4 <fa> c3 0f 1f 00 0f b6 47 08 3c 01 74 0b 3c 02 74 05 8b 7f 04 eb 9f
>>> [43040.205578] RSP: 0018:ffffa39d8234fe80 EFLAGS: 00000246
>>> [43040.205718] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
>>> [43040.205890] RDX: ffff9c5fafdc0000 RSI: ffff9c5882e95800 RDI: ffff9c5882e95864
>>> [43040.206063] RBP: ffffffffa9216ea0 R08: ffffffffa9216ea0 R09: 0000000000000003
>>> [43040.206246] R10: 0000000000000002 R11: 0000000000000008 R12: 0000000000000001
>>> [43040.206419] R13: ffffffffa9216f08 R14: ffffffffa9216f20 R15: 0000000000000000
>>> [43040.206593]  acpi_idle_enter+0x77/0xc0
>>> [43040.206711]  cpuidle_enter_state+0x69/0x6a0
>>> [43040.206835]  cpuidle_enter+0x24/0x40
>>> [43040.206954]  do_idle+0x1a7/0x210
>>> [43040.207066]  cpu_startup_entry+0x21/0x30
>>> [43040.207188]  start_secondary+0xe1/0xf0
>>> [43040.207310]  secondary_startup_64_no_verify+0x166/0x16b
>>> [43040.207451]  </TASK>
>>> [43040.207542] ---[ end trace 0000000000000000 ]---
>>> 
>>> 
>>> 
>>> [43040.198064] ------------[ cut here ]------------
>>> [43040.198407] WARNING: CPU: 47 PID: 0 at lib/rcuref.c:294 rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
>>> [43040.198685] Modules linked in: pppoe pppox ppp_generic slhc nft_limit nft_ct nft_nat nft_chain_nat nf_tables netconsole tg3 igb i2c_algo_bit e1000e bnxt_en mlx5_core mlxfw mlx4_en mlx4_core i40e ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_devintf ipmi_msghandler rtc_cmos
>>> [43040.199478] CPU: 47 PID: 0 Comm: swapper/47 Tainted: G           O       6.6.8 #1
>>> [43040.199660] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
>>> [43040.199886] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
>>> [43040.200028] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc 80 3d e2 4e e3 00
>>> All code
>>> ========
>>> 0: 07                    (bad)
>>> 1: 83 f8 ff              cmp    $0xffffffff,%eax
>>> 4: 75 19                 jne    0x1f
>>> 6: ba 00 00 00 e0        mov    $0xe0000000,%edx
>>> b: f0 0f b1 17           lock cmpxchg %edx,(%rdi)
>>> f: 83 f8 ff              cmp    $0xffffffff,%eax
>>> 12: 74 04                 je     0x18
>>> 14: 31 c0                 xor    %eax,%eax
>>> 16: 5b                    pop    %rbx
>>> 17: c3                    ret
>>> 18: b8 01 00 00 00        mov    $0x1,%eax
>>> 1d: 5b                    pop    %rbx
>>> 1e: c3                    ret
>>> 1f: 3d ff ff ff bf        cmp    $0xbfffffff,%eax
>>> 24: 77 14                 ja     0x3a
>>> 26: 85 c0                 test   %eax,%eax
>>> 28: 78 06                 js     0x30
>>> 2a:* 0f 0b                 ud2     <-- trapping instruction
>>> 2c: 31 c0                 xor    %eax,%eax
>>> 2e: eb e6                 jmp    0x16
>>> 30: c7 07 00 00 00 a0     movl   $0xa0000000,(%rdi)
>>> 36: 31 c0                 xor    %eax,%eax
>>> 38: eb dc                 jmp    0x16
>>> 3a: 80                    .byte 0x80
>>> 3b: 3d e2 4e e3 00        cmp    $0xe34ee2,%eax
>>> 
>>> Code starting with the faulting instruction
>>> ===========================================
>>> 0: 0f 0b                 ud2
>>> 2: 31 c0                 xor    %eax,%eax
>>> 4: eb e6                 jmp    0xffffffffffffffec
>>> 6: c7 07 00 00 00 a0     movl   $0xa0000000,(%rdi)
>>> c: 31 c0                 xor    %eax,%eax
>>> e: eb dc                 jmp    0xffffffffffffffec
>>> 10: 80                    .byte 0x80
>>> 11: 3d e2 4e e3 00        cmp    $0xe34ee2,%eax
>>> [43040.200387] RSP: 0018:ffffa39d83e88c30 EFLAGS: 00010246
>>> [43040.200528] RAX: 0000000000000000 RBX: ffff9c58e966b840 RCX: ffff9c5bc4e35680
>>> [43040.200700] RDX: ffff9c5fafde4f08 RSI: 00000000fffffe01 RDI: ffff9c58e966b840
>>> [43040.200871] RBP: ffff9c5fafde4f08 R08: ffff9c5fafde4f08 R09: 0000000000000001
>>> [43040.201044] R10: 00000000000286e0 R11: 0000000000000001 R12: ffff9c58e966b800
>>> [43040.201255] R13: ffff9c58e966b868 R14: ffff9c5fafde4f08 R15: 000000008f5de42b
>>> [43040.201439] FS:  0000000000000000(0000) GS:ffff9c5fafdc0000(0000) knlGS:0000000000000000
>>> [43040.201642] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [43040.201799] CR2: 00007f1401217714 CR3: 0000000464b94003 CR4: 00000000001706e0
>>> [43040.201994] Call Trace:
>>> [43040.202095]  <IRQ>
>>> [43040.202187] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
>>> [43040.202301] ? report_bug (lib/bug.c:180 lib/bug.c:219)
>>> [43040.202418] ? handle_bug (arch/x86/kernel/traps.c:237)
>>> [43040.202534] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
>>> [43040.202652] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
>>> [43040.202781] ? rcuref_put_slowpath (lib/rcuref.c:294 (discriminator 1))
>>> [43040.202909] dst_release (net/core/dst.c:166 (discriminator 1))
>>> [43040.203026] rt_cache_route (net/ipv4/route.c:1499)
>>> [43040.203143] rt_set_nexthop.isra.0 (net/ipv4/route.c:1606 (discriminator 1))
>>> [43040.203272] ip_route_input_slow (./include/net/lwtunnel.h:140 net/ipv4/route.c:1875 net/ipv4/route.c:2154 net/ipv4/route.c:2337)
>>> [43040.203401] ? nf_conntrack_udp_packet (net/netfilter/nf_conntrack_proto_udp.c:124) nf_conntrack
>>> [43040.203581] ip_route_input_noref (net/ipv4/route.c:2499)
>>> [43040.203704] ip_rcv_finish_core.isra.0 (net/ipv4/ip_input.c:367 (discriminator 1))
>>> [43040.203855] ip_rcv (net/ipv4/ip_input.c:448 ./include/linux/netfilter.h:304 ./include/linux/netfilter.h:298 net/ipv4/ip_input.c:569)
>>> [43040.203962] ? ip_rcv_core (net/ipv4/ip_input.c:436)
>>> [43040.204079] process_backlog (net/core/dev.c:5997)
>>> [43040.204201] __napi_poll (net/core/dev.c:6556)
>>> [43040.204315] net_rx_action (net/core/dev.c:6625 net/core/dev.c:6756)
>>> [43040.204432] __do_softirq (./arch/x86/include/asm/preempt.h:27 kernel/softirq.c:564)
>>> [43040.204549] irq_exit_rcu (kernel/softirq.c:436 kernel/softirq.c:641 kernel/softirq.c:653)
>>> [43040.204667] common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 47))
>>> [43040.204786]  </IRQ>
>>> [43040.204876]  <TASK>
>>> [43040.204965] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:640)
>>> [43040.205090] RIP: 0010:acpi_safe_halt (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:72 drivers/acpi/processor_idle.c:113)
>>> [43040.205220] Code: ed c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 65 48 8b 04 25 40 32 02 00 48 8b 00 a8 08 75 0c eb 07 0f 00 2d 57 0f 2c 00 fb f4 <fa> c3 0f 1f 00 0f b6 47 08 3c 01 74 0b 3c 02 74 05 8b 7f 04 eb 9f
>>> All code
>>> ========
>>> 0: ed                    in     (%dx),%eax
>>> 1: c3                    ret
>>> 2: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
>>> 9: 00 00 00 00
>>> d: 66 90                 xchg   %ax,%ax
>>> f: 65 48 8b 04 25 40 32 mov    %gs:0x23240,%rax
>>> 16: 02 00
>>> 18: 48 8b 00              mov    (%rax),%rax
>>> 1b: a8 08                 test   $0x8,%al
>>> 1d: 75 0c                 jne    0x2b
>>> 1f: eb 07                 jmp    0x28
>>> 21: 0f 00 2d 57 0f 2c 00 verw   0x2c0f57(%rip)        # 0x2c0f7f
>>> 28: fb                    sti
>>> 29: f4                    hlt
>>> 2a:* fa                    cli     <-- trapping instruction
>>> 2b: c3                    ret
>>> 2c: 0f 1f 00              nopl   (%rax)
>>> 2f: 0f b6 47 08           movzbl 0x8(%rdi),%eax
>>> 33: 3c 01                 cmp    $0x1,%al
>>> 35: 74 0b                 je     0x42
>>> 37: 3c 02                 cmp    $0x2,%al
>>> 39: 74 05                 je     0x40
>>> 3b: 8b 7f 04              mov    0x4(%rdi),%edi
>>> 3e: eb 9f                 jmp    0xffffffffffffffdf
>>> 
>>> Code starting with the faulting instruction
>>> ===========================================
>>> 0: fa                    cli
>>> 1: c3                    ret
>>> 2: 0f 1f 00              nopl   (%rax)
>>> 5: 0f b6 47 08           movzbl 0x8(%rdi),%eax
>>> 9: 3c 01                 cmp    $0x1,%al
>>> b: 74 0b                 je     0x18
>>> d: 3c 02                 cmp    $0x2,%al
>>> f: 74 05                 je     0x16
>>> 11: 8b 7f 04              mov    0x4(%rdi),%edi
>>> 14: eb 9f                 jmp    0xffffffffffffffb5
>>> [43040.205578] RSP: 0018:ffffa39d8234fe80 EFLAGS: 00000246
>>> [43040.205718] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
>>> [43040.205890] RDX: ffff9c5fafdc0000 RSI: ffff9c5882e95800 RDI: ffff9c5882e95864
>>> [43040.206063] RBP: ffffffffa9216ea0 R08: ffffffffa9216ea0 R09: 0000000000000003
>>> [43040.206246] R10: 0000000000000002 R11: 0000000000000008 R12: 0000000000000001
>>> [43040.206419] R13: ffffffffa9216f08 R14: ffffffffa9216f20 R15: 0000000000000000
>>> [43040.206593] acpi_idle_enter (drivers/acpi/processor_idle.c:709)
>>> [43040.206711] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:267)
>>> [43040.206835] cpuidle_enter (drivers/cpuidle/cpuidle.c:390 (discriminator 2))
>>> [43040.206954] do_idle (kernel/sched/idle.c:134 kernel/sched/idle.c:215 kernel/sched/idle.c:282)
>>> [43040.207066] cpu_startup_entry (kernel/sched/idle.c:379)
>>> [43040.207188] start_secondary (arch/x86/kernel/smpboot.c:326)
>>> [43040.207310] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
>>> [43040.207451]  </TASK>
>>> [43040.207542] ---[ end trace 0000000000000000 ]---
>>> 
>>>> On 19 Dec 2023, at 16:26, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>> 
>>>> On Tue, Dec 19 2023 at 11:25, Martin Zaharinov wrote:
>>>>>> On 12 Dec 2023, at 20:16, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>>>> Btw, how easy is this to reproduce?
>>>>> 
>>>>> It's not easy. This report was generated on a machine with 5-6k users
>>>>> and live traffic; one time it showed up after 1 day, another time after
>>>>> 4-5 days.
>>>> 
>>>> I love those bugs ...
>>>> 
>>>>> I will apply this patch and deploy the image on one machine as fast as
>>>>> possible, and when I get any reports I will send them to you.
>>>> 
>>>> Let's see how that goes!
>>>> 
>>>> Thanks,
>>>> 
>>>>     tglx
>>> 
>> 
> 

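A note on reading the "Code:" lines in the traces above. The byte in
angle brackets is the one at RIP, so counting the bytes in front of it
gives the offset that the decoded listing marks as the trapping
instruction. The sketch below is only an illustration: trap_offset() is a
hypothetical helper, not part of any kernel script, and the input is the
Code: line copied from the 6.6.8 report.

```python
import re

def trap_offset(code_line):
    """Return (offset, byte) for the <..>-marked byte of an oops 'Code:' line."""
    # Everything after 'Code:' is a space-separated hex dump.
    _, sep, dump = code_line.partition("Code:")
    if not sep:
        raise ValueError("not a Code: line")
    for index, token in enumerate(dump.split()):
        match = re.fullmatch(r"<([0-9a-fA-F]{2})>", token)
        if match:
            # index == number of bytes dumped before the byte at RIP
            return index, int(match.group(1), 16)
    raise ValueError("no marked byte found")

line = ("[43040.200028] Code: 07 83 f8 ff 75 19 ba 00 00 00 e0 f0 0f b1 17 "
        "83 f8 ff 74 04 31 c0 5b c3 b8 01 00 00 00 5b c3 3d ff ff ff bf 77 "
        "14 85 c0 78 06 <0f> 0b 31 c0 eb e6 c7 07 00 00 00 a0 31 c0 eb dc "
        "80 3d e2 4e e3 00")

offset, byte = trap_offset(line)
print(hex(offset), hex(byte))   # 0x2a 0xf

# RIP was reported as rcuref_put_slowpath+0x2f, so the dump starts at:
print(hex(0x2f - offset))       # 0x5
```

The 0x2a matches the "2a:*" marker in the decoded listing. Since the dump
starts 5 bytes into the function, it probably begins mid-instruction, which
would explain why offset 0 decodes as "(bad)".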

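For reading the rcuref_put_slowpath disassembly above: the immediates in
it (0xffffffff, 0xbfffffff, 0xe0000000, 0xa0000000) look like the rcuref
counter zones. The sketch below replays the visible compare/branch order
on the RAX value from the register dump. The RCUREF_* names are my
recollection of include/linux/rcuref.h and should be checked against the
tree in use; classify() is a hypothetical helper. Whether the ud2 on the
fall-through path comes from the stock source or from the debug patch is
not something this sketch can tell.

```python
# Zone boundaries (names assumed from include/linux/rcuref.h).
RCUREF_MAXREF = 0x7FFFFFFF      # highest valid count
RCUREF_SATURATED = 0xA0000000   # value stored at offset 0x30
RCUREF_RELEASED = 0xC0000000    # start of the released/dead zone
RCUREF_DEAD = 0xE0000000        # value used by the lock cmpxchg at offset 0xb
RCUREF_NOREF = 0xFFFFFFFF       # last reference dropped

def classify(cnt):
    """Replay the branch order seen in the disassembly for a 32-bit count."""
    cnt &= 0xFFFFFFFF
    if cnt == RCUREF_NOREF:       # cmp $0xffffffff,%eax ; jne 0x1f not taken
        return "noref"
    if cnt >= RCUREF_RELEASED:    # cmp $0xbfffffff,%eax ; ja 0x3a
        return "released-or-dead"
    if cnt > RCUREF_MAXREF:       # test %eax,%eax ; js 0x30
        return "saturated"
    return "valid"                # falls through to the ud2 at 0x2a

# RAX as printed in the register dump of the warning.
rax = int("0000000000000000", 16)
print(classify(rax))              # valid
```

So at the time of the warning the counter read back as 0, i.e. inside the
valid range, and not in the released/dead zone.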
end of thread, newest message: 2024-01-07 11:03 UTC

Thread overview: 35+ messages
2023-09-15  4:05 Urgent Bug Report Kernel crash 6.5.2 Martin Zaharinov
2023-09-15  6:45 ` Eric Dumazet
2023-09-15 22:23   ` Martin Zaharinov
2023-11-16 14:17   ` Martin Zaharinov
2023-12-06 22:26     ` Martin Zaharinov
     [not found]       ` <5E63894D-913B-416C-B901-F628BB6C00E0@gmail.com>
2023-12-08 22:20         ` Thomas Gleixner
2023-12-08 23:01           ` Martin Zaharinov
2023-12-12 18:16             ` Thomas Gleixner
2023-12-19  9:25               ` Martin Zaharinov
2023-12-19 14:26                 ` Thomas Gleixner
2023-12-22 17:26                   ` Martin Zaharinov
2023-12-29 12:00                     ` Martin Zaharinov
2024-01-04 20:51                       ` Martin Zaharinov
2024-01-07 11:03                         ` Martin Zaharinov
2023-09-15 23:00 ` Martin Zaharinov
2023-09-15 23:11   ` Martin Zaharinov
2023-09-16  8:27     ` Paolo Abeni
     [not found]       ` <CALidq=UR=3rOHZczCnb1bEhbt9So60UZ5y60Cdh4aP41FkB5Tw@mail.gmail.com>
2023-09-17 11:35         ` Martin Zaharinov
2023-09-17 11:40         ` Martin Zaharinov
2023-09-17 11:55           ` Martin Zaharinov
2023-09-17 12:04             ` Holger Hoffstätte
2023-09-18  8:09             ` Eric Dumazet
2023-09-19 20:09             ` Martin Zaharinov
2023-09-20  3:59               ` Eric Dumazet
2023-09-20  6:05                 ` Martin Zaharinov
2023-09-20  6:16                   ` Bagas Sanjaya
2023-09-20  7:03                     ` Martin Zaharinov
2023-09-20  7:25                       ` Eric Dumazet
2023-09-20  7:29                         ` Eric Dumazet
2023-09-20  7:32                           ` Martin Zaharinov
2023-09-21  7:50                             ` Bagas Sanjaya
2023-09-21  8:13                               ` Martin Zaharinov
2023-09-22  3:06                                 ` Bagas Sanjaya
2023-09-22  9:50                                   ` Linux regression tracking (Thorsten Leemhuis)
2023-09-22 11:09                                     ` Bagas Sanjaya
