All of lore.kernel.org
 help / color / mirror / Atom feed
* BUG: soft lockup while deleting tap interface from vlan aware bridge
@ 2020-04-29 20:52 Stefan Priebe - Profihost AG
  2020-04-29 21:23   ` [Bridge] " Nikolay Aleksandrov
  2020-04-30 10:55   ` [Bridge] " Ido Schimmel
  0 siblings, 2 replies; 15+ messages in thread
From: Stefan Priebe - Profihost AG @ 2020-04-29 20:52 UTC (permalink / raw)
  To: roopa; +Cc: nikolay, davem, bridge, netdev

Hello,

while running a stable vanilla kernel 4.19.115 i'm reproducably get this
one:

watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [bridge:3570653]

...

Call
Trace:nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360
     ?
rtnl_calcit.isra.29+0x110/0x110netlink_rcv_skb+0xde/0x110netlink_unicast+0x1be/0x250netlink_sendmsg+0x2c1/0x3b0___sys_sendmsg+0x308/0x320
     ?
__sys_sendmsg+0x63/0xa0__sys_sendmsg+0x63/0xa0do_syscall_64+0x5b/0x190entry_SYSCALL_64_after_hwframe+0x44/0xa9

full dump below. This only happens if there is enough load on the
machine. I can't reproduce this on a test machine while running only 1-5
VMs with tap devices enslaved to a bridge. But i was able to shut down 3
qemu nodes in a row while just executing:
bridge vlan del dev tap123 vid 1-4094

Is this a known bug in 4.19.X?

Thanks for your help.

Greets,
Stefan

watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [bridge:3570653]
Modules linked in: act_police cls_basic sch_ingress sch_htb netconsole
ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter
ip6_tables xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype
xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set iptable_filter
bpfilter 8021q garp bonding fuse ipmi_watchdog nfnetlink_log nfnetlink
tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper dcdbas edac_mce_amd
dell_smbios kvm_amd dell_wmi_descriptor drm kvm fb_sys_fops syscopyarea
sysfillrect ipmi_si irqbypass sysimgblt ipmi_devintf crc32_pclmul
i2c_algo_bit ghash_clmulni_intel sg wmi k10temp ipmi_msghandler button
vhost_net tun vhost tap ip_tables x_tables btrfs zstd_decompress
zstd_compress xxhash raid10 raid456 async_raid6_recov
async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid0
multipath linear raid1 mlx5_core md_mod devlink tg3 sd_mod xhci_pci ptp
dm_mod xhci_hcd nvme i2c_piix4 libphy pps_core nvme_core usbcore ahci
i2c_core usb_common libahci [last unloaded: cpuid]
CPU: 38 PID: 3570653 Comm: bridge Not tainted 4.19.115+53-ph #1
Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
RIP: 0010:br_fdb_delete_by_port+0x48/0xb0
Code: d5 48 89 ef e8 09 51 08 00 49 8b 84 24 98 04 00 00 48 85 c0 75 0a
eb 44 48 85 db 74 3f 48 89 d8 48 83 e8 18 74 36 48 8b 70 08 <48> 8b 58
18 4c 39 fe 75 e5 45 85 f6 0f b6 50 28 75 0b f6 c2 02 75
RSP: 0018:ffffbbc9f21afa58 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: ffff9cabf1a74a80 RBX: ffff9cabf1a74a98 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9cae289f8a00 RDI: ffff9baebe6a9844
RBP: ffff9baebe6a9844 R08: 0000000000000000 R09: ffffffff933d3840
R10: fffff675235b9800 R11: 0000000000000001 R12: ffff9baebe6a9840
R13: 0000000000000e23 R14: 0000000000000000 R15: ffff9cad832f5c00
FS: 00007f446531a540(0000) GS:ffff9baebff80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f44652c4443 CR3: 000000127b3f8000 CR4: 0000000000340ee0
Call
Trace:nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360
?
rtnl_calcit.isra.29+0x110/0x110netlink_rcv_skb+0xde/0x110netlink_unicast+0x1be/0x250netlink_sendmsg+0x2c1/0x3b0___sys_sendmsg+0x308/0x320
?
__sys_sendmsg+0x63/0xa0__sys_sendmsg+0x63/0xa0do_syscall_64+0x5b/0x190entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4465252c74
Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00
48 8d 05 89 5a 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53
RSP: 002b:00007fff564feba8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000005ea9d931 RCX: 00007f4465252c74
RDX: 0000000000000000 RSI: 00007fff564febf0 RDI: 0000000000000003
RBP: 00007fff564febf0 R08: 0000000000000004 R09: 0000000000000008
R10: fffffffffffffb8e R11: 0000000000000246 R12: 00007fff564fec30
R13: 000056355724f2c0 R14: 00007fff56506ca0 R15: 0000000000000000
Kernel panic - not syncing: softlockup: hung tasks
CPU: 38 PID: 3570653 Comm: bridge Tainted: G L 4.19.115+53-ph #1
Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
Call
Trace:<IRQ>dump_stack+0x66/0x8bpanic+0xe4/0x252watchdog_timer_fn+0x263/0x270
?
softlockup_fn+0x40/0x40__hrtimer_run_queues+0xfc/0x270hrtimer_interrupt+0x101/0x240smp_apic_timer_interrupt+0x6a/0x130apic_timer_interrupt+0xf/0x20</IRQ>
RIP: 0010:br_fdb_delete_by_port+0x48/0xb0
Code: d5 48 89 ef e8 09 51 08 00 49 8b 84 24 98 04 00 00 48 85 c0 75 0a
eb 44 48 85 db 74 3f 48 89 d8 48 83 e8 18 74 36 48 8b 70 08 <48> 8b 58
18 4c 39 fe 75 e5 45 85 f6 0f b6 50 28 75 0b f6 c2 02 75
RSP: 0018:ffffbbc9f21afa58 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: ffff9cabf1a74a80 RBX: ffff9cabf1a74a98 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9cae289f8a00 RDI: ffff9baebe6a9844
RBP: ffff9baebe6a9844 R08: 0000000000000000 R09: ffffffff933d3840
R10: fffff675235b9800 R11: 0000000000000001 R12: ffff9baebe6a9840
R13: 0000000000000e23 R14: 0000000000000000 R15: ffff9cad832f5c00
?
br_fdb_delete_by_port+0x27/0xb0nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360
?
rtnl_calcit.isra.29+0x110/0x110netlink_rcv_skb+0xde/0x110netlink_unicast+0x1be/0x250netlink_sendmsg+0x2c1/0x3b0___sys_sendmsg+0x308/0x320
?
__sys_sendmsg+0x63/0xa0__sys_sendmsg+0x63/0xa0do_syscall_64+0x5b/0x190entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4465252c74
Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00
48 8d 05 89 5a 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53
RSP: 002b:00007fff564feba8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000005ea9d931 RCX: 00007f4465252c74
RDX: 0000000000000000 RSI: 00007fff564febf0 RDI: 0000000000000003
RBP: 00007fff564febf0 R08: 0000000000000004 R09: 0000000000000008
R10: fffffffffffffb8e R11: 0000000000000246 R12: 00007fff564fec30
R13: 000056355724f2c0 R14: 00007fff56506ca0 R15: 0000000000000000
------------[ cut here ]------------
WARNING: CPU: 35 PID: 2224815 at kernel/sched/core.c:1168
set_task_cpu+0x173/0x180
------------[ cut here ]------------
sched: Unexpected reschedule of offline CPU#13!
WARNING: CPU: 46 PID: 0 at arch/x86/kernel/smp.c:128
native_smp_send_reschedule+0x34/0x40
------------[ cut here ]------------
Modules linked in:
sched: Unexpected reschedule of offline CPU#0!act_police
WARNING: CPU: 33 PID: 2699426 at arch/x86/kernel/smp.c:128
native_smp_send_reschedule+0x34/0x40cls_basic
Modules linked
in:sch_ingressact_policesch_htbcls_basicnetconsolesch_ingressebtable_filtersch_htbebtablesnetconsoleip6t_REJECTebtable_filternf_reject_ipv6ebtablesip6table_filterip6t_REJECT
------------[ cut here ]------------ip6_tablesnf_reject_ipv6xt_physdev
sched: Unexpected reschedule of offline CPU#37!ip6table_filterxt_comment
WARNING: CPU: 39 PID: 0 at arch/x86/kernel/smp.c:128
native_smp_send_reschedule+0x34/0x40ip6_tablesxt_tcpudpxt_physdev
Modules linked
in:xt_markxt_commentact_policext_addrtypext_tcpudpcls_basicxt_conntrackxt_marksch_ingressnf_conntrackxt_addrtypesch_htbnf_defrag_ipv6xt_conntracknetconsolenf_defrag_ipv4nf_conntrackebtable_filterxt_multiportnf_defrag_ipv6ebtablesipt_REJECTnf_defrag_ipv4ip6t_REJECTnf_reject_ipv4xt_multiportnf_reject_ipv6xt_setipt_REJECTip6table_filterip_set_hash_netnf_reject_ipv4ip6_tablesip_setxt_setxt_physdeviptable_filterip_set_hash_netxt_commentbpfilterip_setxt_tcpudp8021qiptable_filterxt_markgarpbpfilterxt_addrtypebonding8021qxt_conntrackfusegarpnf_conntrackipmi_watchdogbondingnf_defrag_ipv6nfnetlink_logfusenf_defrag_ipv4nfnetlinkipmi_watchdogxt_multiporttcp_htcpnfnetlink_logipt_REJECTsch_fq_codelnfnetlinknf_reject_ipv4mgag200tcp_htcpxt_setttmsch_fq_codelip_set_hash_netdrm_kms_helpermgag200ip_setdcdbasttmiptable_filteredac_mce_amddrm_kms_helperbpfilterdell_smbiosdcdbas8021qkvm_amdedac_mce_amdgarpdell_wmi_descriptordell_smbiosbondingdrmkvm_amdfusekvmdell_wmi_descriptoripmi_watchdogfb_sys_fopsdrmnfnetlink_logsyscopyareakvmnfnetlinksysfillrectfb_sys_fopstcp_htcpipmi_sisyscopyareasch_fq_codelirqbypasssysfillrectmgag200sysimgbltipmi_sittmipmi_devintfirqbypassdrm_kms_helpercrc32_pclmulsysimgbltdcdbasi2c_algo_bitipmi_devintfedac_mce_amdghash_clmulni_intelcrc32_pclmuldell_smbiossgi2c_algo_bitkvm_amdwmighash_clmulni_inteldell_wmi_descriptork10tempsgdrmipmi_msghandlerwmikvmbuttonk10temp
? __x64_sys_futex+0x88/0x180
RDX: 00047009e9d4a290 RSI: 00000000334bd47f RDI:
0000000000000000act_policeksys_ioctl+0x70/0x80
RBP: ffffffff9337a340 R08: 0000000000000002 R09:
0000000000020100cls_basic__x64_sys_ioctl+0x16/0x20
R10: 00000000ffffffff R11: 0000000000000081 R12:
ffff9b704145a580sch_ingressdo_syscall_64+0x5b/0x190
R13: 0000000000000001 R14: 0000000000000001 R15:
00047009e9d4a290sch_htbentry_SYSCALL_64_after_hwframe+0x44/0xa9netconsoledo_idle+0x1de/0x240
RIP: 0033:0x7f56000eb777ebtable_filtercpu_startup_entry+0x6f/0x80
Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff
ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01
48ebtablesstart_secondary+0x1a7/0x200
RSP: 002b:00007f55db7fb628 EFLAGS: 00000246ip6t_REJECT
ORIG_RAX: 0000000000000010secondary_startup_64+0xa4/0xb0nf_reject_ipv6
RAX: ffffffffffffffda RBX: 000000000000ae80 RCX:
00007f56000eb777ip6table_filter
---[ end trace 0d5897b92aff0c87 ]---
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000025ip6_tables
RBP: 0000000000000000 R08: 00007f55ec0d2330 R09: 00000000ffffffffxt_physdev
R10: 00005585bae5c8e0 R11: 0000000000000246 R12: 00007f55e8d8b000xt_comment
R13: 00007f5604388000 R14: 0000000000000000 R15: 00007f55e8d8b000
xt_tcpudp xt_mark
---[ end trace 0d5897b92aff0c88 ]---
xt_addrtype xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
xt_multiport ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set
iptable_filter bpfilter 8021q garp bonding fuse ipmi_watchdog
nfnetlink_log nfnetlink tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper
dcdbas edac_mce_amd dell_smbios kvm_amd dell_wmi_descriptor drm kvm
fb_sys_fops syscopyarea sysfillrect ipmi_si irqbypass sysimgblt
ipmi_devintf crc32_pclmul i2c_algo_bit ghash_clmulni_intel
------------[ cut here ]------------sg
sched: Unexpected reschedule of offline CPU#1!wmi
WARNING: CPU: 33 PID: 2699426 at arch/x86/kernel/smp.c:128
native_smp_send_reschedule+0x34/0x40k10temp
Modules linked
in:ipmi_msghandleract_policebuttoncls_basicvhost_netsch_ingresstunsch_htbvhostnetconsoletapebtable_filterip_tableshrtimer_interrupt+0x101/0x240smp_apic_timer_interrupt+0x6a/0x130apic_timer_interrupt+0xf/0x20</IRQ>
RIP: 0010:cpuidle_enter_state+0xb2/0x330
Code: 64 65 9a 6d e8 ef b3 a5 ff 49 89 c7 0f 1f 44 00 00 31 ff e8 30 c4
a5 ff 80 7c 24 17 00 0f 85 1c 02 00 00 fb 66 0f 1f 44 00 00 <4c> 2b 7c
24 08 48 ba cf f7 53 e3 a5 9b c4 20 4c 89 f8 49 c1 ff 3f
RSP: 0018:ffffbbc9d8b03e70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: ffff9beebfba0840 RBX: ffff9baebe0ff800 RCX: 000000000000001f
RDX: 00047009e9d427d5 RSI: 00000000334bd47f RDI: 0000000000000000
RBP: ffffffff9337a340 R08: 0000000000000002 R09: 0000000000020100
R10: 00000000ffffffff R11: 000000000000012f R12: ffff9bafc1ec4b00
R13: 0000000000000001 R14: 0000000000000001 R15:
00047009e9d427d5do_idle+0x1de/0x240cpu_startup_entry+0x6f/0x80start_secondary+0x1a7/0x200secondary_startup_64+0xa4/0xb0
---[ end trace 0d5897b92aff0c8b ]---
Modules linked in: act_police cls_basic sch_ingress sch_htb netconsole
ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter
ip6_tables xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype
xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set iptable_filter
bpfilter 8021q garp bonding fuse ipmi_watchdog nfnetlink_log nfnetlink
tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper dcdbas edac_mce_amd
dell_smbios kvm_amd dell_wmi_descriptor drm kvm fb_sys_fops syscopyarea
sysfillrect ipmi_si irqbypass sysimgblt ipmi_devintf crc32_pclmul
i2c_algo_bit ghash_clmulni_intel sg wmi k10temp ipmi_msghandler button
vhost_net tun vhost tap ip_tables x_tables btrfs zstd_decompress
zstd_compress xxhash raid10 raid456 async_raid6_recov
WARNING: CPU: 34 PID: 0 at arch/x86/kernel/smp.c:128
native_smp_send_reschedule+0x34/0x40
async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid0
multipath linear raid1 mlx5_core md_mod devlink tg3 sd_mod xhci_pci ptp
dm_mod xhci_hcd nvme i2c_piix4 libphy pps_core nvme_core usbcore ahci
i2c_core usb_common libahci [last unloaded: cpuid]
Modules linked in: act_police cls_basic sch_ingress sch_htb netconsole
ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter
ip6_tables xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype
xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set iptable_filter
bpfilter 8021q garp bonding fuse ipmi_watchdog nfnetlink_log nfnetlink
tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper dcdbas edac_mce_amd
dell_smbios kvm_amd dell_wmi_descriptor drm kvm fb_sys_fops syscopyarea
sysfillrect ipmi_si irqbypass sysimgblt ipmi_devintf crc32_pclmul
i2c_algo_bit ghash_clmulni_intel sg wmi k10temp ipmi_msghandler button
vhost_net tun vhost tap ip_tables x_tables btrfs zstd_decompress
zstd_compress xxhash raid10 raid456 async_raid6_recov
CPU: 35 PID: 2224815 Comm: kvm Tainted: G W L 4.19.115+53-ph #1
async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid0
multipath linear raid1 mlx5_core md_mod devlink tg3 sd_mod xhci_pci ptp
dm_mod xhci_hcd nvme i2c_piix4 libphy pps_core nvme_core usbcore ahci
i2c_core usb_common libahci [last unloaded: cpuid]
Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
RIP: 0010:set_task_cpu+0x173/0x180
CPU: 34 PID: 0 Comm: swapper/34 Tainted: G W L 4.19.115+53-ph #1
Code: e9 54 ff ff ff 0f 0b e9 dc fe ff ff 8b 43 60 85 c0 0f 84 e8 fe ff
ff 8b 43 60 83 f8 02 0f 84 dc fe ff ff 0f 0b e9 d5 fe ff ff <0f> 0b e9
df fe ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 41 55 41 54
Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
RIP: 0010:native_smp_send_reschedule+0x34/0x40
RSP: 0018:ffff9baebfec3e50 EFLAGS: 00010006
Code: 05 01 ed 3f 01 73 15 48 8b 05 f8 94 09 01 be fd 00 00 00 48 8b 40
30 e9 4a 29 bc 00 89 fe 48 c7 c7 c0 13 03 93 e8 ec 0d 05 00 <0f> 0b c3
66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 83 ec 20
RAX: 0000000000000200 RBX: ffff9be89db94b00 RCX: ffff9be89db94b00
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff9be89db94b00
RSP: 0018:ffff9baebfe837f8 EFLAGS: 00010082
RBP: ffff9be89db94e68 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
RAX: 0000000000000000 RBX: ffff9ca1948da580 RCX: 0000000000000006
RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff9baebfe955c0
R13: 0000000000000003 R14: 0000000000000046 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff9baebfec0000(0000) knlGS:0000000000000000
RBP: 0000000000020840 R08: 0000000000000001 R09: 0000000000000000
R10: 00000000f4240000 R11: 000000000000002f R12: ffff9ca1948db0cc
------------[ cut here ]------------
sched: Unexpected reschedule of offline CPU#15!
WARNING: CPU: 35 PID: 2224815 at arch/x86/kernel/smp.c:128
native_smp_send_reschedule+0x34/0x40
Modules linked in: act_police cls_basic sch_ingress sch_htb netconsole
ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter
ip6_tables xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype
xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set iptable_filter
bpfilter 8021q garp bonding fuse ipmi_watchdog nfnetlink_log nfnetlink
tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper dcdbas edac_mce_amd
dell_smbios kvm_amd dell_wmi_descriptor drm kvm fb_sys_fops syscopyarea
sysfillrect ipmi_si irqbypass sysimgblt ipmi_devintf crc32_pclmul
i2c_algo_bit ghash_clmulni_intel sg wmi k10temp ipmi_msghandler button
vhost_net tun vhost tap ip_tables x_tables btrfs zstd_decompress
zstd_compress xxhash raid10 raid456 async_raid6_recov
async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid0
multipath linear raid1 mlx5_core md_mod devlink tg3 sd_mod xhci_pci ptp
dm_mod xhci_hcd nvme i2c_piix4 libphy pps_core nvme_core usbcore ahci
i2c_core usb_common libahci [last unloaded: cpuid]
CPU: 35 PID: 2224815 Comm: kvm Tainted: G W L 4.19.115+53-ph #1
Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
RIP: 0010:native_smp_send_reschedule+0x34/0x40
Code: 05 01 ed 3f 01 73 15 48 8b 05 f8 94 09 01 be fd 00 00 00 48 8b 40
30 e9 4a 29 bc 00 89 fe 48 c7 c7 c0 13 03 93 e8 ec 0d 05 00 <0f> 0b c3
66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 83 ec 20
RSP: 0018:ffff9baebfec3d78 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff9bae9a8fcb00 RCX: 0000000000000006
RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9baebfed55c0
RBP: 0000000000020840 R08: 0000000000000001 R09: 0000000000000000
R10: ffff9baebfec3d28 R11: 0000000000000034 R12: ffff9bae9a8fd64c
R13: 0000000000000004 R14: 0000000000000046 R15: 000000000000000f
FS: 0000000000000000(0000) GS:ffff9baebfec0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007efdd36ffab8 CR3: 0000012c3faa2000 CR4: 0000000000340ee0
Call
Trace:<IRQ>try_to_wake_up+0x3fc/0x510autoremove_wake_function+0x11/0x50__wake_up_common+0x96/0x180__wake_up_common_lock+0x7c/0xc0irq_work_run_list+0x50/0x80
?
tick_sched_do_timer+0x80/0x80update_process_times+0x3b/0x50tick_sched_handle+0x22/0x60tick_sched_timer+0x37/0x70__hrtimer_run_queues+0xfc/0x270hrtimer_interrupt+0x101/0x240smp_apic_timer_interrupt+0x6a/0x130apic_timer_interrupt+0xf/0x20</IRQ>
RIP: 0010:svm_handle_external_intr+0xc/0x20 [kvm_amd]
Code: 83 0c 03 00 00 20 75 02 5b c3 48 89 df 5b e9 cb e4 ff ff 90 66 2e
0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 fb 66 0f 1f 44 00 00 <90> fa 66
0f 1f 44 00 00 c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
RSP: 0018:ffffbbc9f1117d18 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: ffffffffc0930790 RBX: ffff9c7f9d2c0000 RCX: 0000000100000000
RDX: 000b133f62ad4458 RSI: fff6d37b61b9824a RDI: ffff9c7f9d2c0000
RBP: ffffbbc9f1117da0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 000b133f62ad4458
R13: 0000000000000000 R14: ffff9c7f9d2c4328 R15: ffff9be510f24000
? svm_sync_dirty_debug_regs+0xc0/0xc0 [kvm_amd]
vcpu_enter_guest+0x3e9/0x1520 [kvm]
? ktime_get+0x3e/0xb0
? __apic_accept_irq+0x2ae/0x320 [kvm]
? kvm_arch_vcpu_ioctl_run+0x2f6/0x520 [kvm]
kvm_arch_vcpu_ioctl_run+0x2f6/0x520 [kvm]
kvm_vcpu_ioctl+0x388/0x5d0 [kvm]
? do_futex+0x12c/0x5d0do_vfs_ioctl+0xa2/0x640
Kernel Offset: 0x11000000 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)
Rebooting in 20 seconds..

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: BUG: soft lockup while deleting tap interface from vlan aware bridge
  2020-04-29 20:52 BUG: soft lockup while deleting tap interface from vlan aware bridge Stefan Priebe - Profihost AG
@ 2020-04-29 21:23   ` Nikolay Aleksandrov
  2020-04-30 10:55   ` [Bridge] " Ido Schimmel
  1 sibling, 0 replies; 15+ messages in thread
From: Nikolay Aleksandrov @ 2020-04-29 21:23 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, roopa; +Cc: davem, bridge, netdev

On 4/29/20 11:52 PM, Stefan Priebe - Profihost AG wrote:
> Hello,
> 
> while running a stable vanilla kernel 4.19.115 i'm reproducably get this
> one:
> 
> watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [bridge:3570653]
> 
> ...
> 
> Call
> Trace:nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360
>       ?
> rtnl_calcit.isra.29+0x110/0x110netlink_rcv_skb+0xde/0x110netlink_unicast+0x1be/0x250netlink_sendmsg+0x2c1/0x3b0___sys_sendmsg+0x308/0x320
>       ?
> __sys_sendmsg+0x63/0xa0__sys_sendmsg+0x63/0xa0do_syscall_64+0x5b/0x190entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> full dump below. This only happens if there is enough load on the
> machine. I can't reproduce this on a test machine while running only 1-5
> VMs with tap devices enslaved to a bridge. But i was able to shut down 3
> qemu nodes in a row while just executing:
> bridge vlan del dev tap123 vid 1-4094
> 
> Is this a known bug in 4.19.X?
> 
> Thanks for your help.
> > Greets,
> Stefan
> 

Hi Stefan,
It seems something (your email client?) has mangled the email below and has made
it harder to understand the traces. I can guess a few ways this might've happened
(e.g. way too many fdb entries per-vlan with thousands of vlans, although I've tested
with over 10 million and couldn't reach it). Could you please share more about the setup ?
How many fdb entries does it have ? How many ports per bridge, how many vlans per port ?

Most importantly could you please get a trace on the latest kernel ?

Also what does "enough load" mean ?

Thank you,
  Nik

> watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [bridge:3570653]
> Modules linked in: act_police cls_basic sch_ingress sch_htb netconsole
> ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter
> ip6_tables xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype
> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
> ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set iptable_filter
> bpfilter 8021q garp bonding fuse ipmi_watchdog nfnetlink_log nfnetlink
> tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper dcdbas edac_mce_amd
> dell_smbios kvm_amd dell_wmi_descriptor drm kvm fb_sys_fops syscopyarea
> sysfillrect ipmi_si irqbypass sysimgblt ipmi_devintf crc32_pclmul
> i2c_algo_bit ghash_clmulni_intel sg wmi k10temp ipmi_msghandler button
> vhost_net tun vhost tap ip_tables x_tables btrfs zstd_decompress
> zstd_compress xxhash raid10 raid456 async_raid6_recov
> async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid0
> multipath linear raid1 mlx5_core md_mod devlink tg3 sd_mod xhci_pci ptp
> dm_mod xhci_hcd nvme i2c_piix4 libphy pps_core nvme_core usbcore ahci
> i2c_core usb_common libahci [last unloaded: cpuid]
> CPU: 38 PID: 3570653 Comm: bridge Not tainted 4.19.115+53-ph #1
> Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
> RIP: 0010:br_fdb_delete_by_port+0x48/0xb0
> Code: d5 48 89 ef e8 09 51 08 00 49 8b 84 24 98 04 00 00 48 85 c0 75 0a
> eb 44 48 85 db 74 3f 48 89 d8 48 83 e8 18 74 36 48 8b 70 08 <48> 8b 58
> 18 4c 39 fe 75 e5 45 85 f6 0f b6 50 28 75 0b f6 c2 02 75
> RSP: 0018:ffffbbc9f21afa58 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
> RAX: ffff9cabf1a74a80 RBX: ffff9cabf1a74a98 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffff9cae289f8a00 RDI: ffff9baebe6a9844
> RBP: ffff9baebe6a9844 R08: 0000000000000000 R09: ffffffff933d3840
> R10: fffff675235b9800 R11: 0000000000000001 R12: ffff9baebe6a9840
> R13: 0000000000000e23 R14: 0000000000000000 R15: ffff9cad832f5c00
> FS: 00007f446531a540(0000) GS:ffff9baebff80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f44652c4443 CR3: 000000127b3f8000 CR4: 0000000000340ee0
> Call
> Trace:nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360
> ?
> rtnl_calcit.isra.29+0x110/0x110netlink_rcv_skb+0xde/0x110netlink_unicast+0x1be/0x250netlink_sendmsg+0x2c1/0x3b0___sys_sendmsg+0x308/0x320
> ?
> __sys_sendmsg+0x63/0xa0__sys_sendmsg+0x63/0xa0do_syscall_64+0x5b/0x190entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f4465252c74
> Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00
> 48 8d 05 89 5a 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00
> f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53
> RSP: 002b:00007fff564feba8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 000000005ea9d931 RCX: 00007f4465252c74
> RDX: 0000000000000000 RSI: 00007fff564febf0 RDI: 0000000000000003
> RBP: 00007fff564febf0 R08: 0000000000000004 R09: 0000000000000008
> R10: fffffffffffffb8e R11: 0000000000000246 R12: 00007fff564fec30
> R13: 000056355724f2c0 R14: 00007fff56506ca0 R15: 0000000000000000
> Kernel panic - not syncing: softlockup: hung tasks
> CPU: 38 PID: 3570653 Comm: bridge Tainted: G L 4.19.115+53-ph #1
> Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
> Call
> Trace:<IRQ>dump_stack+0x66/0x8bpanic+0xe4/0x252watchdog_timer_fn+0x263/0x270
> ?
> softlockup_fn+0x40/0x40__hrtimer_run_queues+0xfc/0x270hrtimer_interrupt+0x101/0x240smp_apic_timer_interrupt+0x6a/0x130apic_timer_interrupt+0xf/0x20</IRQ>
> RIP: 0010:br_fdb_delete_by_port+0x48/0xb0
> Code: d5 48 89 ef e8 09 51 08 00 49 8b 84 24 98 04 00 00 48 85 c0 75 0a
> eb 44 48 85 db 74 3f 48 89 d8 48 83 e8 18 74 36 48 8b 70 08 <48> 8b 58
> 18 4c 39 fe 75 e5 45 85 f6 0f b6 50 28 75 0b f6 c2 02 75
> RSP: 0018:ffffbbc9f21afa58 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
> RAX: ffff9cabf1a74a80 RBX: ffff9cabf1a74a98 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffff9cae289f8a00 RDI: ffff9baebe6a9844
> RBP: ffff9baebe6a9844 R08: 0000000000000000 R09: ffffffff933d3840
> R10: fffff675235b9800 R11: 0000000000000001 R12: ffff9baebe6a9840
> R13: 0000000000000e23 R14: 0000000000000000 R15: ffff9cad832f5c00
> ?
> br_fdb_delete_by_port+0x27/0xb0nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360
> ?
> rtnl_calcit.isra.29+0x110/0x110netlink_rcv_skb+0xde/0x110netlink_unicast+0x1be/0x250netlink_sendmsg+0x2c1/0x3b0___sys_sendmsg+0x308/0x320
> ?
> __sys_sendmsg+0x63/0xa0__sys_sendmsg+0x63/0xa0do_syscall_64+0x5b/0x190entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f4465252c74
> Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00
> 48 8d 05 89 5a 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00
> f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53
> RSP: 002b:00007fff564feba8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 000000005ea9d931 RCX: 00007f4465252c74
> RDX: 0000000000000000 RSI: 00007fff564febf0 RDI: 0000000000000003
> RBP: 00007fff564febf0 R08: 0000000000000004 R09: 0000000000000008
> R10: fffffffffffffb8e R11: 0000000000000246 R12: 00007fff564fec30
> R13: 000056355724f2c0 R14: 00007fff56506ca0 R15: 0000000000000000
> ------------[ cut here ]------------
> WARNING: CPU: 35 PID: 2224815 at kernel/sched/core.c:1168
> set_task_cpu+0x173/0x180
> ------------[ cut here ]------------
> sched: Unexpected reschedule of offline CPU#13!
> WARNING: CPU: 46 PID: 0 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x34/0x40
> ------------[ cut here ]------------
> Modules linked in:
> sched: Unexpected reschedule of offline CPU#0!act_police
> WARNING: CPU: 33 PID: 2699426 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x34/0x40cls_basic
> Modules linked
> in:sch_ingressact_policesch_htbcls_basicnetconsolesch_ingressebtable_filtersch_htbebtablesnetconsoleip6t_REJECTebtable_filternf_reject_ipv6ebtablesip6table_filterip6t_REJECT
> ------------[ cut here ]------------ip6_tablesnf_reject_ipv6xt_physdev
> sched: Unexpected reschedule of offline CPU#37!ip6table_filterxt_comment
> WARNING: CPU: 39 PID: 0 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x34/0x40ip6_tablesxt_tcpudpxt_physdev
> Modules linked
> in:xt_markxt_commentact_policext_addrtypext_tcpudpcls_basicxt_conntrackxt_marksch_ingressnf_conntrackxt_addrtypesch_htbnf_defrag_ipv6xt_conntracknetconsolenf_defrag_ipv4nf_conntrackebtable_filterxt_multiportnf_defrag_ipv6ebtablesipt_REJECTnf_defrag_ipv4ip6t_REJECTnf_reject_ipv4xt_multiportnf_reject_ipv6xt_setipt_REJECTip6table_filterip_set_hash_netnf_reject_ipv4ip6_tablesip_setxt_setxt_physdeviptable_filterip_set_hash_netxt_commentbpfilterip_setxt_tcpudp8021qiptable_filterxt_markgarpbpfilterxt_addrtypebonding8021qxt_conntrackfusegarpnf_conntrackipmi_watchdogbondingnf_defrag_ipv6nfnetlink_logfusenf_defrag_ipv4nfnetlinkipmi_watchdogxt_multiporttcp_htcpnfnetlink_logipt_REJECTsch_fq_codelnfnetlinknf_reject_ipv4mgag200tcp_htcpxt_setttmsch_fq_codelip_set_hash_netdrm_kms_helpermgag200ip_setdcdbasttmiptable_filteredac_mce_amddrm_kms_helperbpfilterdell_smbiosdcdbas8021qkvm_amdedac_mce_amdgarpdell_wmi_descriptordell_smbiosbondingdrmkvm_amdfusekvmdell_wmi_descriptoripmi_watchdogfb_sys_fopsdrmnfnetlink_logsyscopyareakvmnfnetlinksysfillrectfb_sys_fopstcp_htcpipmi_sisyscopyareasch_fq_codelirqbypasssysfillrectmgag200sysimgbltipmi_sittmipmi_devintfirqbypassdrm_kms_helpercrc32_pclmulsysimgbltdcdbasi2c_algo_bitipmi_devintfedac_mce_amdghash_clmulni_intelcrc32_pclmuldell_smbiossgi2c_algo_bitkvm_amdwmighash_clmulni_inteldell_wmi_descriptork10tempsgdrmipmi_msghandlerwmikvmbuttonk10temp
> ? __x64_sys_futex+0x88/0x180
> RDX: 00047009e9d4a290 RSI: 00000000334bd47f RDI:
> 0000000000000000act_policeksys_ioctl+0x70/0x80
> RBP: ffffffff9337a340 R08: 0000000000000002 R09:
> 0000000000020100cls_basic__x64_sys_ioctl+0x16/0x20
> R10: 00000000ffffffff R11: 0000000000000081 R12:
> ffff9b704145a580sch_ingressdo_syscall_64+0x5b/0x190
> R13: 0000000000000001 R14: 0000000000000001 R15:
> 00047009e9d4a290sch_htbentry_SYSCALL_64_after_hwframe+0x44/0xa9netconsoledo_idle+0x1de/0x240
> RIP: 0033:0x7f56000eb777ebtable_filtercpu_startup_entry+0x6f/0x80
> Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff
> ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01
> f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01
> 48ebtablesstart_secondary+0x1a7/0x200
> RSP: 002b:00007f55db7fb628 EFLAGS: 00000246ip6t_REJECT
> ORIG_RAX: 0000000000000010secondary_startup_64+0xa4/0xb0nf_reject_ipv6
> RAX: ffffffffffffffda RBX: 000000000000ae80 RCX:
> 00007f56000eb777ip6table_filter
> ---[ end trace 0d5897b92aff0c87 ]---
> RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000025ip6_tables
> RBP: 0000000000000000 R08: 00007f55ec0d2330 R09: 00000000ffffffffxt_physdev
> R10: 00005585bae5c8e0 R11: 0000000000000246 R12: 00007f55e8d8b000xt_comment
> R13: 00007f5604388000 R14: 0000000000000000 R15: 00007f55e8d8b000
> xt_tcpudp xt_mark
> ---[ end trace 0d5897b92aff0c88 ]---
> xt_addrtype xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
> xt_multiport ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set
> iptable_filter bpfilter 8021q garp bonding fuse ipmi_watchdog
> nfnetlink_log nfnetlink tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper
> dcdbas edac_mce_amd dell_smbios kvm_amd dell_wmi_descriptor drm kvm
> fb_sys_fops syscopyarea sysfillrect ipmi_si irqbypass sysimgblt
> ipmi_devintf crc32_pclmul i2c_algo_bit ghash_clmulni_intel
> ------------[ cut here ]------------sg
> sched: Unexpected reschedule of offline CPU#1!wmi
> WARNING: CPU: 33 PID: 2699426 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x34/0x40k10temp
> Modules linked
> in:ipmi_msghandleract_policebuttoncls_basicvhost_netsch_ingresstunsch_htbvhostnetconsoletapebtable_filterip_tableshrtimer_interrupt+0x101/0x240smp_apic_timer_interrupt+0x6a/0x130apic_timer_interrupt+0xf/0x20</IRQ>
> RIP: 0010:cpuidle_enter_state+0xb2/0x330
> Code: 64 65 9a 6d e8 ef b3 a5 ff 49 89 c7 0f 1f 44 00 00 31 ff e8 30 c4
> a5 ff 80 7c 24 17 00 0f 85 1c 02 00 00 fb 66 0f 1f 44 00 00 <4c> 2b 7c
> 24 08 48 ba cf f7 53 e3 a5 9b c4 20 4c 89 f8 49 c1 ff 3f
> RSP: 0018:ffffbbc9d8b03e70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> RAX: ffff9beebfba0840 RBX: ffff9baebe0ff800 RCX: 000000000000001f
> RDX: 00047009e9d427d5 RSI: 00000000334bd47f RDI: 0000000000000000
> RBP: ffffffff9337a340 R08: 0000000000000002 R09: 0000000000020100
> R10: 00000000ffffffff R11: 000000000000012f R12: ffff9bafc1ec4b00
> R13: 0000000000000001 R14: 0000000000000001 R15:
> 00047009e9d427d5do_idle+0x1de/0x240cpu_startup_entry+0x6f/0x80start_secondary+0x1a7/0x200secondary_startup_64+0xa4/0xb0
> ---[ end trace 0d5897b92aff0c8b ]---
> Modules linked in: act_police cls_basic sch_ingress sch_htb netconsole
> ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter
> ip6_tables xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype
> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
> ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set iptable_filter
> bpfilter 8021q garp bonding fuse ipmi_watchdog nfnetlink_log nfnetlink
> tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper dcdbas edac_mce_amd
> dell_smbios kvm_amd dell_wmi_descriptor drm kvm fb_sys_fops syscopyarea
> sysfillrect ipmi_si irqbypass sysimgblt ipmi_devintf crc32_pclmul
> i2c_algo_bit ghash_clmulni_intel sg wmi k10temp ipmi_msghandler button
> vhost_net tun vhost tap ip_tables x_tables btrfs zstd_decompress
> zstd_compress xxhash raid10 raid456 async_raid6_recov
> WARNING: CPU: 34 PID: 0 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x34/0x40
> async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid0
> multipath linear raid1 mlx5_core md_mod devlink tg3 sd_mod xhci_pci ptp
> dm_mod xhci_hcd nvme i2c_piix4 libphy pps_core nvme_core usbcore ahci
> i2c_core usb_common libahci [last unloaded: cpuid]
> Modules linked in: act_police cls_basic sch_ingress sch_htb netconsole
> ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter
> ip6_tables xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype
> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
> ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set iptable_filter
> bpfilter 8021q garp bonding fuse ipmi_watchdog nfnetlink_log nfnetlink
> tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper dcdbas edac_mce_amd
> dell_smbios kvm_amd dell_wmi_descriptor drm kvm fb_sys_fops syscopyarea
> sysfillrect ipmi_si irqbypass sysimgblt ipmi_devintf crc32_pclmul
> i2c_algo_bit ghash_clmulni_intel sg wmi k10temp ipmi_msghandler button
> vhost_net tun vhost tap ip_tables x_tables btrfs zstd_decompress
> zstd_compress xxhash raid10 raid456 async_raid6_recov
> CPU: 35 PID: 2224815 Comm: kvm Tainted: G W L 4.19.115+53-ph #1
> async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid0
> multipath linear raid1 mlx5_core md_mod devlink tg3 sd_mod xhci_pci ptp
> dm_mod xhci_hcd nvme i2c_piix4 libphy pps_core nvme_core usbcore ahci
> i2c_core usb_common libahci [last unloaded: cpuid]
> Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
> RIP: 0010:set_task_cpu+0x173/0x180
> CPU: 34 PID: 0 Comm: swapper/34 Tainted: G W L 4.19.115+53-ph #1
> Code: e9 54 ff ff ff 0f 0b e9 dc fe ff ff 8b 43 60 85 c0 0f 84 e8 fe ff
> ff 8b 43 60 83 f8 02 0f 84 dc fe ff ff 0f 0b e9 d5 fe ff ff <0f> 0b e9
> df fe ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 41 55 41 54
> Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
> RIP: 0010:native_smp_send_reschedule+0x34/0x40
> RSP: 0018:ffff9baebfec3e50 EFLAGS: 00010006
> Code: 05 01 ed 3f 01 73 15 48 8b 05 f8 94 09 01 be fd 00 00 00 48 8b 40
> 30 e9 4a 29 bc 00 89 fe 48 c7 c7 c0 13 03 93 e8 ec 0d 05 00 <0f> 0b c3
> 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 83 ec 20
> RAX: 0000000000000200 RBX: ffff9be89db94b00 RCX: ffff9be89db94b00
> RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff9be89db94b00
> RSP: 0018:ffff9baebfe837f8 EFLAGS: 00010082
> RBP: ffff9be89db94e68 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
> RAX: 0000000000000000 RBX: ffff9ca1948da580 RCX: 0000000000000006
> RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff9baebfe955c0
> R13: 0000000000000003 R14: 0000000000000046 R15: 0000000000000003
> FS: 0000000000000000(0000) GS:ffff9baebfec0000(0000) knlGS:0000000000000000
> RBP: 0000000000020840 R08: 0000000000000001 R09: 0000000000000000
> R10: 00000000f4240000 R11: 000000000000002f R12: ffff9ca1948db0cc
> ------------[ cut here ]------------
> sched: Unexpected reschedule of offline CPU#15!
> WARNING: CPU: 35 PID: 2224815 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x34/0x40
> Modules linked in: act_police cls_basic sch_ingress sch_htb netconsole
> ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter
> ip6_tables xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype
> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
> ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set iptable_filter
> bpfilter 8021q garp bonding fuse ipmi_watchdog nfnetlink_log nfnetlink
> tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper dcdbas edac_mce_amd
> dell_smbios kvm_amd dell_wmi_descriptor drm kvm fb_sys_fops syscopyarea
> sysfillrect ipmi_si irqbypass sysimgblt ipmi_devintf crc32_pclmul
> i2c_algo_bit ghash_clmulni_intel sg wmi k10temp ipmi_msghandler button
> vhost_net tun vhost tap ip_tables x_tables btrfs zstd_decompress
> zstd_compress xxhash raid10 raid456 async_raid6_recov
> async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid0
> multipath linear raid1 mlx5_core md_mod devlink tg3 sd_mod xhci_pci ptp
> dm_mod xhci_hcd nvme i2c_piix4 libphy pps_core nvme_core usbcore ahci
> i2c_core usb_common libahci [last unloaded: cpuid]
> CPU: 35 PID: 2224815 Comm: kvm Tainted: G W L 4.19.115+53-ph #1
> Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
> RIP: 0010:native_smp_send_reschedule+0x34/0x40
> Code: 05 01 ed 3f 01 73 15 48 8b 05 f8 94 09 01 be fd 00 00 00 48 8b 40
> 30 e9 4a 29 bc 00 89 fe 48 c7 c7 c0 13 03 93 e8 ec 0d 05 00 <0f> 0b c3
> 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 83 ec 20
> RSP: 0018:ffff9baebfec3d78 EFLAGS: 00010086
> RAX: 0000000000000000 RBX: ffff9bae9a8fcb00 RCX: 0000000000000006
> RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9baebfed55c0
> RBP: 0000000000020840 R08: 0000000000000001 R09: 0000000000000000
> R10: ffff9baebfec3d28 R11: 0000000000000034 R12: ffff9bae9a8fd64c
> R13: 0000000000000004 R14: 0000000000000046 R15: 000000000000000f
> FS: 0000000000000000(0000) GS:ffff9baebfec0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007efdd36ffab8 CR3: 0000012c3faa2000 CR4: 0000000000340ee0
> Call
> Trace:<IRQ>try_to_wake_up+0x3fc/0x510autoremove_wake_function+0x11/0x50__wake_up_common+0x96/0x180__wake_up_common_lock+0x7c/0xc0irq_work_run_list+0x50/0x80
> ?
> tick_sched_do_timer+0x80/0x80update_process_times+0x3b/0x50tick_sched_handle+0x22/0x60tick_sched_timer+0x37/0x70__hrtimer_run_queues+0xfc/0x270hrtimer_interrupt+0x101/0x240smp_apic_timer_interrupt+0x6a/0x130apic_timer_interrupt+0xf/0x20</IRQ>
> RIP: 0010:svm_handle_external_intr+0xc/0x20 [kvm_amd]
> Code: 83 0c 03 00 00 20 75 02 5b c3 48 89 df 5b e9 cb e4 ff ff 90 66 2e
> 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 fb 66 0f 1f 44 00 00 <90> fa 66
> 0f 1f 44 00 00 c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> RSP: 0018:ffffbbc9f1117d18 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> RAX: ffffffffc0930790 RBX: ffff9c7f9d2c0000 RCX: 0000000100000000
> RDX: 000b133f62ad4458 RSI: fff6d37b61b9824a RDI: ffff9c7f9d2c0000
> RBP: ffffbbc9f1117da0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 000b133f62ad4458
> R13: 0000000000000000 R14: ffff9c7f9d2c4328 R15: ffff9be510f24000
> ? svm_sync_dirty_debug_regs+0xc0/0xc0 [kvm_amd]
> vcpu_enter_guest+0x3e9/0x1520 [kvm]
> ? ktime_get+0x3e/0xb0
> ? __apic_accept_irq+0x2ae/0x320 [kvm]
> ? kvm_arch_vcpu_ioctl_run+0x2f6/0x520 [kvm]
> kvm_arch_vcpu_ioctl_run+0x2f6/0x520 [kvm]
> kvm_vcpu_ioctl+0x388/0x5d0 [kvm]
> ? do_futex+0x12c/0x5d0do_vfs_ioctl+0xa2/0x640
> Kernel Offset: 0x11000000 from 0xffffffff81000000 (relocation range:
> 0xffffffff80000000-0xffffffffbfffffff)
> Rebooting in 20 seconds..
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bridge] BUG: soft lockup while deleting tap interface from vlan aware bridge
@ 2020-04-29 21:23   ` Nikolay Aleksandrov
  0 siblings, 0 replies; 15+ messages in thread
From: Nikolay Aleksandrov @ 2020-04-29 21:23 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, roopa; +Cc: netdev, bridge, davem

On 4/29/20 11:52 PM, Stefan Priebe - Profihost AG wrote:
> Hello,
> 
> while running a stable vanilla kernel 4.19.115 i'm reproducably get this
> one:
> 
> watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [bridge:3570653]
> 
> ...
> 
> Call
> Trace:nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360
>       ?
> rtnl_calcit.isra.29+0x110/0x110netlink_rcv_skb+0xde/0x110netlink_unicast+0x1be/0x250netlink_sendmsg+0x2c1/0x3b0___sys_sendmsg+0x308/0x320
>       ?
> __sys_sendmsg+0x63/0xa0__sys_sendmsg+0x63/0xa0do_syscall_64+0x5b/0x190entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> full dump below. This only happens if there is enough load on the
> machine. I can't reproduce this on a test machine while running only 1-5
> VMs with tap devices enslaved to a bridge. But i was able to shut down 3
> qemu nodes in a row while just executing:
> bridge vlan del dev tap123 vid 1-4094
> 
> Is this a known bug in 4.19.X?
> 
> Thanks for your help.
> > Greets,
> Stefan
> 

Hi Stefan,
It seems something (your email client?) has mangled the email below and has made
it harder to understand the traces. I can guess a few ways this might've happened
(e.g. way too many fdb entries per-vlan with thousands of vlans, although I've tested
with over 10 million and couldn't reach it). Could you please share more about the setup ?
How many fdb entries does it have ? How many ports per bridge, how many vlans per port ?

Most importantly could you please get a trace on the latest kernel ?

Also what does "enough load" mean ?

Thank you,
  Nik

> watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [bridge:3570653]
> Modules linked in: act_police cls_basic sch_ingress sch_htb netconsole
> ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter
> ip6_tables xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype
> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
> ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set iptable_filter
> bpfilter 8021q garp bonding fuse ipmi_watchdog nfnetlink_log nfnetlink
> tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper dcdbas edac_mce_amd
> dell_smbios kvm_amd dell_wmi_descriptor drm kvm fb_sys_fops syscopyarea
> sysfillrect ipmi_si irqbypass sysimgblt ipmi_devintf crc32_pclmul
> i2c_algo_bit ghash_clmulni_intel sg wmi k10temp ipmi_msghandler button
> vhost_net tun vhost tap ip_tables x_tables btrfs zstd_decompress
> zstd_compress xxhash raid10 raid456 async_raid6_recov
> async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid0
> multipath linear raid1 mlx5_core md_mod devlink tg3 sd_mod xhci_pci ptp
> dm_mod xhci_hcd nvme i2c_piix4 libphy pps_core nvme_core usbcore ahci
> i2c_core usb_common libahci [last unloaded: cpuid]
> CPU: 38 PID: 3570653 Comm: bridge Not tainted 4.19.115+53-ph #1
> Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
> RIP: 0010:br_fdb_delete_by_port+0x48/0xb0
> Code: d5 48 89 ef e8 09 51 08 00 49 8b 84 24 98 04 00 00 48 85 c0 75 0a
> eb 44 48 85 db 74 3f 48 89 d8 48 83 e8 18 74 36 48 8b 70 08 <48> 8b 58
> 18 4c 39 fe 75 e5 45 85 f6 0f b6 50 28 75 0b f6 c2 02 75
> RSP: 0018:ffffbbc9f21afa58 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
> RAX: ffff9cabf1a74a80 RBX: ffff9cabf1a74a98 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffff9cae289f8a00 RDI: ffff9baebe6a9844
> RBP: ffff9baebe6a9844 R08: 0000000000000000 R09: ffffffff933d3840
> R10: fffff675235b9800 R11: 0000000000000001 R12: ffff9baebe6a9840
> R13: 0000000000000e23 R14: 0000000000000000 R15: ffff9cad832f5c00
> FS: 00007f446531a540(0000) GS:ffff9baebff80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f44652c4443 CR3: 000000127b3f8000 CR4: 0000000000340ee0
> Call
> Trace:nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360
> ?
> rtnl_calcit.isra.29+0x110/0x110netlink_rcv_skb+0xde/0x110netlink_unicast+0x1be/0x250netlink_sendmsg+0x2c1/0x3b0___sys_sendmsg+0x308/0x320
> ?
> __sys_sendmsg+0x63/0xa0__sys_sendmsg+0x63/0xa0do_syscall_64+0x5b/0x190entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f4465252c74
> Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00
> 48 8d 05 89 5a 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00
> f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53
> RSP: 002b:00007fff564feba8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 000000005ea9d931 RCX: 00007f4465252c74
> RDX: 0000000000000000 RSI: 00007fff564febf0 RDI: 0000000000000003
> RBP: 00007fff564febf0 R08: 0000000000000004 R09: 0000000000000008
> R10: fffffffffffffb8e R11: 0000000000000246 R12: 00007fff564fec30
> R13: 000056355724f2c0 R14: 00007fff56506ca0 R15: 0000000000000000
> Kernel panic - not syncing: softlockup: hung tasks
> CPU: 38 PID: 3570653 Comm: bridge Tainted: G L 4.19.115+53-ph #1
> Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
> Call
> Trace:<IRQ>dump_stack+0x66/0x8bpanic+0xe4/0x252watchdog_timer_fn+0x263/0x270
> ?
> softlockup_fn+0x40/0x40__hrtimer_run_queues+0xfc/0x270hrtimer_interrupt+0x101/0x240smp_apic_timer_interrupt+0x6a/0x130apic_timer_interrupt+0xf/0x20</IRQ>
> RIP: 0010:br_fdb_delete_by_port+0x48/0xb0
> Code: d5 48 89 ef e8 09 51 08 00 49 8b 84 24 98 04 00 00 48 85 c0 75 0a
> eb 44 48 85 db 74 3f 48 89 d8 48 83 e8 18 74 36 48 8b 70 08 <48> 8b 58
> 18 4c 39 fe 75 e5 45 85 f6 0f b6 50 28 75 0b f6 c2 02 75
> RSP: 0018:ffffbbc9f21afa58 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
> RAX: ffff9cabf1a74a80 RBX: ffff9cabf1a74a98 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffff9cae289f8a00 RDI: ffff9baebe6a9844
> RBP: ffff9baebe6a9844 R08: 0000000000000000 R09: ffffffff933d3840
> R10: fffff675235b9800 R11: 0000000000000001 R12: ffff9baebe6a9840
> R13: 0000000000000e23 R14: 0000000000000000 R15: ffff9cad832f5c00
> ?
> br_fdb_delete_by_port+0x27/0xb0nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360
> ?
> rtnl_calcit.isra.29+0x110/0x110netlink_rcv_skb+0xde/0x110netlink_unicast+0x1be/0x250netlink_sendmsg+0x2c1/0x3b0___sys_sendmsg+0x308/0x320
> ?
> __sys_sendmsg+0x63/0xa0__sys_sendmsg+0x63/0xa0do_syscall_64+0x5b/0x190entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f4465252c74
> Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00
> 48 8d 05 89 5a 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00
> f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53
> RSP: 002b:00007fff564feba8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 000000005ea9d931 RCX: 00007f4465252c74
> RDX: 0000000000000000 RSI: 00007fff564febf0 RDI: 0000000000000003
> RBP: 00007fff564febf0 R08: 0000000000000004 R09: 0000000000000008
> R10: fffffffffffffb8e R11: 0000000000000246 R12: 00007fff564fec30
> R13: 000056355724f2c0 R14: 00007fff56506ca0 R15: 0000000000000000
> ------------[ cut here ]------------
> WARNING: CPU: 35 PID: 2224815 at kernel/sched/core.c:1168
> set_task_cpu+0x173/0x180
> ------------[ cut here ]------------
> sched: Unexpected reschedule of offline CPU#13!
> WARNING: CPU: 46 PID: 0 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x34/0x40
> ------------[ cut here ]------------
> Modules linked in:
> sched: Unexpected reschedule of offline CPU#0!act_police
> WARNING: CPU: 33 PID: 2699426 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x34/0x40cls_basic
> Modules linked
> in:sch_ingressact_policesch_htbcls_basicnetconsolesch_ingressebtable_filtersch_htbebtablesnetconsoleip6t_REJECTebtable_filternf_reject_ipv6ebtablesip6table_filterip6t_REJECT
> ------------[ cut here ]------------ip6_tablesnf_reject_ipv6xt_physdev
> sched: Unexpected reschedule of offline CPU#37!ip6table_filterxt_comment
> WARNING: CPU: 39 PID: 0 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x34/0x40ip6_tablesxt_tcpudpxt_physdev
> Modules linked
> in:xt_markxt_commentact_policext_addrtypext_tcpudpcls_basicxt_conntrackxt_marksch_ingressnf_conntrackxt_addrtypesch_htbnf_defrag_ipv6xt_conntracknetconsolenf_defrag_ipv4nf_conntrackebtable_filterxt_multiportnf_defrag_ipv6ebtablesipt_REJECTnf_defrag_ipv4ip6t_REJECTnf_reject_ipv4xt_multiportnf_reject_ipv6xt_setipt_REJECTip6table_filterip_set_hash_netnf_reject_ipv4ip6_tablesip_setxt_setxt_physdeviptable_filterip_set_hash_netxt_commentbpfilterip_setxt_tcpudp8021qiptable_filterxt_markgarpbpfilterxt_addrtypebonding8021qxt_conntrackfusegarpnf_conntrackipmi_watchdogbondingnf_defrag_ipv6nfnetlink_logfusenf_defrag_ipv4nfnetlinkipmi_watchdogxt_multiporttcp_htcpnfnetlink_logipt_REJECTsch_fq_codelnfnetlinknf_reject_ipv4mgag200tcp_htcpxt_setttmsch_fq_codelip_set_hash_netdrm_kms_helpermgag200ip_setdcdbasttmiptable_filteredac_mce_amddrm_kms_helperbpfilterdell_smbiosdcdbas8021qkvm_amdedac_mce_amdgarpdell_wmi_descriptordell_smbiosbondingdrmkvm_amdfusekvmdell_wmi_descriptoripmi_watchdogfb_sys_fopsdrmnfnetlink_logsyscopyareakvmnfnetlinksysfillrectfb_sys_fopstcp_htcpipmi_sisyscopyareasch_fq_codelirqbypasssysfillrectmgag200sysimgbltipmi_sittmipmi_devintfirqbypassdrm_kms_helpercrc32_pclmulsysimgbltdcdbasi2c_algo_bitipmi_devintfedac_mce_amdghash_clmulni_intelcrc32_pclmuldell_smbiossgi2c_algo_bitkvm_amdwmighash_clmulni_inteldell_wmi_descriptork10tempsgdrmipmi_msghandlerwmikvmbuttonk10temp
> ? __x64_sys_futex+0x88/0x180
> RDX: 00047009e9d4a290 RSI: 00000000334bd47f RDI:
> 0000000000000000act_policeksys_ioctl+0x70/0x80
> RBP: ffffffff9337a340 R08: 0000000000000002 R09:
> 0000000000020100cls_basic__x64_sys_ioctl+0x16/0x20
> R10: 00000000ffffffff R11: 0000000000000081 R12:
> ffff9b704145a580sch_ingressdo_syscall_64+0x5b/0x190
> R13: 0000000000000001 R14: 0000000000000001 R15:
> 00047009e9d4a290sch_htbentry_SYSCALL_64_after_hwframe+0x44/0xa9netconsoledo_idle+0x1de/0x240
> RIP: 0033:0x7f56000eb777ebtable_filtercpu_startup_entry+0x6f/0x80
> Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff
> ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01
> f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01
> 48ebtablesstart_secondary+0x1a7/0x200
> RSP: 002b:00007f55db7fb628 EFLAGS: 00000246ip6t_REJECT
> ORIG_RAX: 0000000000000010secondary_startup_64+0xa4/0xb0nf_reject_ipv6
> RAX: ffffffffffffffda RBX: 000000000000ae80 RCX:
> 00007f56000eb777ip6table_filter
> ---[ end trace 0d5897b92aff0c87 ]---
> RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000025ip6_tables
> RBP: 0000000000000000 R08: 00007f55ec0d2330 R09: 00000000ffffffffxt_physdev
> R10: 00005585bae5c8e0 R11: 0000000000000246 R12: 00007f55e8d8b000xt_comment
> R13: 00007f5604388000 R14: 0000000000000000 R15: 00007f55e8d8b000
> xt_tcpudp xt_mark
> ---[ end trace 0d5897b92aff0c88 ]---
> xt_addrtype xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
> xt_multiport ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set
> iptable_filter bpfilter 8021q garp bonding fuse ipmi_watchdog
> nfnetlink_log nfnetlink tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper
> dcdbas edac_mce_amd dell_smbios kvm_amd dell_wmi_descriptor drm kvm
> fb_sys_fops syscopyarea sysfillrect ipmi_si irqbypass sysimgblt
> ipmi_devintf crc32_pclmul i2c_algo_bit ghash_clmulni_intel
> ------------[ cut here ]------------sg
> sched: Unexpected reschedule of offline CPU#1!wmi
> WARNING: CPU: 33 PID: 2699426 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x34/0x40k10temp
> Modules linked
> in:ipmi_msghandleract_policebuttoncls_basicvhost_netsch_ingresstunsch_htbvhostnetconsoletapebtable_filterip_tableshrtimer_interrupt+0x101/0x240smp_apic_timer_interrupt+0x6a/0x130apic_timer_interrupt+0xf/0x20</IRQ>
> RIP: 0010:cpuidle_enter_state+0xb2/0x330
> Code: 64 65 9a 6d e8 ef b3 a5 ff 49 89 c7 0f 1f 44 00 00 31 ff e8 30 c4
> a5 ff 80 7c 24 17 00 0f 85 1c 02 00 00 fb 66 0f 1f 44 00 00 <4c> 2b 7c
> 24 08 48 ba cf f7 53 e3 a5 9b c4 20 4c 89 f8 49 c1 ff 3f
> RSP: 0018:ffffbbc9d8b03e70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> RAX: ffff9beebfba0840 RBX: ffff9baebe0ff800 RCX: 000000000000001f
> RDX: 00047009e9d427d5 RSI: 00000000334bd47f RDI: 0000000000000000
> RBP: ffffffff9337a340 R08: 0000000000000002 R09: 0000000000020100
> R10: 00000000ffffffff R11: 000000000000012f R12: ffff9bafc1ec4b00
> R13: 0000000000000001 R14: 0000000000000001 R15:
> 00047009e9d427d5do_idle+0x1de/0x240cpu_startup_entry+0x6f/0x80start_secondary+0x1a7/0x200secondary_startup_64+0xa4/0xb0
> ---[ end trace 0d5897b92aff0c8b ]---
> Modules linked in: act_police cls_basic sch_ingress sch_htb netconsole
> ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter
> ip6_tables xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype
> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
> ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set iptable_filter
> bpfilter 8021q garp bonding fuse ipmi_watchdog nfnetlink_log nfnetlink
> tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper dcdbas edac_mce_amd
> dell_smbios kvm_amd dell_wmi_descriptor drm kvm fb_sys_fops syscopyarea
> sysfillrect ipmi_si irqbypass sysimgblt ipmi_devintf crc32_pclmul
> i2c_algo_bit ghash_clmulni_intel sg wmi k10temp ipmi_msghandler button
> vhost_net tun vhost tap ip_tables x_tables btrfs zstd_decompress
> zstd_compress xxhash raid10 raid456 async_raid6_recov
> WARNING: CPU: 34 PID: 0 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x34/0x40
> async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid0
> multipath linear raid1 mlx5_core md_mod devlink tg3 sd_mod xhci_pci ptp
> dm_mod xhci_hcd nvme i2c_piix4 libphy pps_core nvme_core usbcore ahci
> i2c_core usb_common libahci [last unloaded: cpuid]
> Modules linked in: act_police cls_basic sch_ingress sch_htb netconsole
> ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter
> ip6_tables xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype
> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
> ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set iptable_filter
> bpfilter 8021q garp bonding fuse ipmi_watchdog nfnetlink_log nfnetlink
> tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper dcdbas edac_mce_amd
> dell_smbios kvm_amd dell_wmi_descriptor drm kvm fb_sys_fops syscopyarea
> sysfillrect ipmi_si irqbypass sysimgblt ipmi_devintf crc32_pclmul
> i2c_algo_bit ghash_clmulni_intel sg wmi k10temp ipmi_msghandler button
> vhost_net tun vhost tap ip_tables x_tables btrfs zstd_decompress
> zstd_compress xxhash raid10 raid456 async_raid6_recov
> CPU: 35 PID: 2224815 Comm: kvm Tainted: G W L 4.19.115+53-ph #1
> async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid0
> multipath linear raid1 mlx5_core md_mod devlink tg3 sd_mod xhci_pci ptp
> dm_mod xhci_hcd nvme i2c_piix4 libphy pps_core nvme_core usbcore ahci
> i2c_core usb_common libahci [last unloaded: cpuid]
> Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
> RIP: 0010:set_task_cpu+0x173/0x180
> CPU: 34 PID: 0 Comm: swapper/34 Tainted: G W L 4.19.115+53-ph #1
> Code: e9 54 ff ff ff 0f 0b e9 dc fe ff ff 8b 43 60 85 c0 0f 84 e8 fe ff
> ff 8b 43 60 83 f8 02 0f 84 dc fe ff ff 0f 0b e9 d5 fe ff ff <0f> 0b e9
> df fe ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 41 55 41 54
> Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
> RIP: 0010:native_smp_send_reschedule+0x34/0x40
> RSP: 0018:ffff9baebfec3e50 EFLAGS: 00010006
> Code: 05 01 ed 3f 01 73 15 48 8b 05 f8 94 09 01 be fd 00 00 00 48 8b 40
> 30 e9 4a 29 bc 00 89 fe 48 c7 c7 c0 13 03 93 e8 ec 0d 05 00 <0f> 0b c3
> 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 83 ec 20
> RAX: 0000000000000200 RBX: ffff9be89db94b00 RCX: ffff9be89db94b00
> RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff9be89db94b00
> RSP: 0018:ffff9baebfe837f8 EFLAGS: 00010082
> RBP: ffff9be89db94e68 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
> RAX: 0000000000000000 RBX: ffff9ca1948da580 RCX: 0000000000000006
> RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff9baebfe955c0
> R13: 0000000000000003 R14: 0000000000000046 R15: 0000000000000003
> FS: 0000000000000000(0000) GS:ffff9baebfec0000(0000) knlGS:0000000000000000
> RBP: 0000000000020840 R08: 0000000000000001 R09: 0000000000000000
> R10: 00000000f4240000 R11: 000000000000002f R12: ffff9ca1948db0cc
> ------------[ cut here ]------------
> sched: Unexpected reschedule of offline CPU#15!
> WARNING: CPU: 35 PID: 2224815 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x34/0x40
> Modules linked in: act_police cls_basic sch_ingress sch_htb netconsole
> ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter
> ip6_tables xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype
> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
> ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set iptable_filter
> bpfilter 8021q garp bonding fuse ipmi_watchdog nfnetlink_log nfnetlink
> tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper dcdbas edac_mce_amd
> dell_smbios kvm_amd dell_wmi_descriptor drm kvm fb_sys_fops syscopyarea
> sysfillrect ipmi_si irqbypass sysimgblt ipmi_devintf crc32_pclmul
> i2c_algo_bit ghash_clmulni_intel sg wmi k10temp ipmi_msghandler button
> vhost_net tun vhost tap ip_tables x_tables btrfs zstd_decompress
> zstd_compress xxhash raid10 raid456 async_raid6_recov
> async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid0
> multipath linear raid1 mlx5_core md_mod devlink tg3 sd_mod xhci_pci ptp
> dm_mod xhci_hcd nvme i2c_piix4 libphy pps_core nvme_core usbcore ahci
> i2c_core usb_common libahci [last unloaded: cpuid]
> CPU: 35 PID: 2224815 Comm: kvm Tainted: G W L 4.19.115+53-ph #1
> Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
> RIP: 0010:native_smp_send_reschedule+0x34/0x40
> Code: 05 01 ed 3f 01 73 15 48 8b 05 f8 94 09 01 be fd 00 00 00 48 8b 40
> 30 e9 4a 29 bc 00 89 fe 48 c7 c7 c0 13 03 93 e8 ec 0d 05 00 <0f> 0b c3
> 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 83 ec 20
> RSP: 0018:ffff9baebfec3d78 EFLAGS: 00010086
> RAX: 0000000000000000 RBX: ffff9bae9a8fcb00 RCX: 0000000000000006
> RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9baebfed55c0
> RBP: 0000000000020840 R08: 0000000000000001 R09: 0000000000000000
> R10: ffff9baebfec3d28 R11: 0000000000000034 R12: ffff9bae9a8fd64c
> R13: 0000000000000004 R14: 0000000000000046 R15: 000000000000000f
> FS: 0000000000000000(0000) GS:ffff9baebfec0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007efdd36ffab8 CR3: 0000012c3faa2000 CR4: 0000000000340ee0
> Call
> Trace:<IRQ>try_to_wake_up+0x3fc/0x510autoremove_wake_function+0x11/0x50__wake_up_common+0x96/0x180__wake_up_common_lock+0x7c/0xc0irq_work_run_list+0x50/0x80
> ?
> tick_sched_do_timer+0x80/0x80update_process_times+0x3b/0x50tick_sched_handle+0x22/0x60tick_sched_timer+0x37/0x70__hrtimer_run_queues+0xfc/0x270hrtimer_interrupt+0x101/0x240smp_apic_timer_interrupt+0x6a/0x130apic_timer_interrupt+0xf/0x20</IRQ>
> RIP: 0010:svm_handle_external_intr+0xc/0x20 [kvm_amd]
> Code: 83 0c 03 00 00 20 75 02 5b c3 48 89 df 5b e9 cb e4 ff ff 90 66 2e
> 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 fb 66 0f 1f 44 00 00 <90> fa 66
> 0f 1f 44 00 00 c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> RSP: 0018:ffffbbc9f1117d18 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> RAX: ffffffffc0930790 RBX: ffff9c7f9d2c0000 RCX: 0000000100000000
> RDX: 000b133f62ad4458 RSI: fff6d37b61b9824a RDI: ffff9c7f9d2c0000
> RBP: ffffbbc9f1117da0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 000b133f62ad4458
> R13: 0000000000000000 R14: ffff9c7f9d2c4328 R15: ffff9be510f24000
> ? svm_sync_dirty_debug_regs+0xc0/0xc0 [kvm_amd]
> vcpu_enter_guest+0x3e9/0x1520 [kvm]
> ? ktime_get+0x3e/0xb0
> ? __apic_accept_irq+0x2ae/0x320 [kvm]
> ? kvm_arch_vcpu_ioctl_run+0x2f6/0x520 [kvm]
> kvm_arch_vcpu_ioctl_run+0x2f6/0x520 [kvm]
> kvm_vcpu_ioctl+0x388/0x5d0 [kvm]
> ? do_futex+0x12c/0x5d0do_vfs_ioctl+0xa2/0x640
> Kernel Offset: 0x11000000 from 0xffffffff81000000 (relocation range:
> 0xffffffff80000000-0xffffffffbfffffff)
> Rebooting in 20 seconds..
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: BUG: soft lockup while deleting tap interface from vlan aware bridge
  2020-04-29 21:23   ` [Bridge] " Nikolay Aleksandrov
  (?)
@ 2020-04-30  6:47   ` Stefan Priebe - Profihost AG
  -1 siblings, 0 replies; 15+ messages in thread
From: Stefan Priebe - Profihost AG @ 2020-04-30  6:47 UTC (permalink / raw)
  To: Nikolay Aleksandrov, roopa; +Cc: davem, bridge, netdev

Hello Nikolay,

Am 29.04.20 um 23:23 schrieb Nikolay Aleksandrov:
> On 4/29/20 11:52 PM, Stefan Priebe - Profihost AG wrote:
>> Hello,
>>
>> while running a stable vanilla kernel 4.19.115 i'm reproducably get this
>> one:
>>
>> watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [bridge:3570653]
>>
>> ...
>>
>> Call
>> Trace:nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360
>>
>>       ?
>> rtnl_calcit.isra.29+0x110/0x110netlink_rcv_skb+0xde/0x110netlink_unicast+0x1be/0x250netlink_sendmsg+0x2c1/0x3b0___sys_sendmsg+0x308/0x320
>>
>>       ?
>> __sys_sendmsg+0x63/0xa0__sys_sendmsg+0x63/0xa0do_syscall_64+0x5b/0x190entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>>
>> full dump below. This only happens if there is enough load on the
>> machine. I can't reproduce this on a test machine while running only 1-5
>> VMs with tap devices enslaved to a bridge. But i was able to shut down 3
>> qemu nodes in a row while just executing:
>> bridge vlan del dev tap123 vid 1-4094
>>
>> Is this a known bug in 4.19.X?
>>
>> Thanks for your help.
>> > Greets,
>> Stefan
>>
> 
> Hi Stefan,
> It seems something (your email client?) has mangled the email below and
> has made
> it harder to understand the traces.

No it is the netconsole which mangles the output. Sadly i've no idea how
to fix this.


> I can guess a few ways this might've
> happened
> (e.g. way too many fdb entries per-vlan with thousands of vlans,
> although I've tested
> with over 10 million and couldn't reach it). Could you please share more
> about the setup ?
> How many fdb entries does it have ? How many ports per bridge, how
> many vlans per port ?

The vlan aware bridge just has 3 vlans. But all tap devices attached to
the bridge were accidently created with
vlan 2-4094

There are 80-100 tap devices per bridge.

I can work around the issue by setting nomaster for the tap device before.

> Most importantly could you please get a trace on the latest kernel ?

I can only reproduce this on production servers to i can't manage to
switch the kernel.

> Also what does "enough load" mean ?

no real idea - i'm just not able to reproduce it on non production
machines having no real load (CPU, Network traffic...).

> 
> Thank you,
>  Nik
> 
>> watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [bridge:3570653]
>> Modules linked in: act_police cls_basic sch_ingress sch_htb netconsole
>> ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter
>> ip6_tables xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype
>> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
>> ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set iptable_filter
>> bpfilter 8021q garp bonding fuse ipmi_watchdog nfnetlink_log nfnetlink
>> tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper dcdbas edac_mce_amd
>> dell_smbios kvm_amd dell_wmi_descriptor drm kvm fb_sys_fops syscopyarea
>> sysfillrect ipmi_si irqbypass sysimgblt ipmi_devintf crc32_pclmul
>> i2c_algo_bit ghash_clmulni_intel sg wmi k10temp ipmi_msghandler button
>> vhost_net tun vhost tap ip_tables x_tables btrfs zstd_decompress
>> zstd_compress xxhash raid10 raid456 async_raid6_recov
>> async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid0
>> multipath linear raid1 mlx5_core md_mod devlink tg3 sd_mod xhci_pci ptp
>> dm_mod xhci_hcd nvme i2c_piix4 libphy pps_core nvme_core usbcore ahci
>> i2c_core usb_common libahci [last unloaded: cpuid]
>> CPU: 38 PID: 3570653 Comm: bridge Not tainted 4.19.115+53-ph #1
>> Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
>> RIP: 0010:br_fdb_delete_by_port+0x48/0xb0
>> Code: d5 48 89 ef e8 09 51 08 00 49 8b 84 24 98 04 00 00 48 85 c0 75 0a
>> eb 44 48 85 db 74 3f 48 89 d8 48 83 e8 18 74 36 48 8b 70 08 <48> 8b 58
>> 18 4c 39 fe 75 e5 45 85 f6 0f b6 50 28 75 0b f6 c2 02 75
>> RSP: 0018:ffffbbc9f21afa58 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
>> RAX: ffff9cabf1a74a80 RBX: ffff9cabf1a74a98 RCX: 0000000000000000
>> RDX: 0000000000000000 RSI: ffff9cae289f8a00 RDI: ffff9baebe6a9844
>> RBP: ffff9baebe6a9844 R08: 0000000000000000 R09: ffffffff933d3840
>> R10: fffff675235b9800 R11: 0000000000000001 R12: ffff9baebe6a9840
>> R13: 0000000000000e23 R14: 0000000000000000 R15: ffff9cad832f5c00
>> FS: 00007f446531a540(0000) GS:ffff9baebff80000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f44652c4443 CR3: 000000127b3f8000 CR4: 0000000000340ee0
>> Call
>> Trace:nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360
>>
>> ?
>> rtnl_calcit.isra.29+0x110/0x110netlink_rcv_skb+0xde/0x110netlink_unicast+0x1be/0x250netlink_sendmsg+0x2c1/0x3b0___sys_sendmsg+0x308/0x320
>>
>> ?
>> __sys_sendmsg+0x63/0xa0__sys_sendmsg+0x63/0xa0do_syscall_64+0x5b/0x190entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> RIP: 0033:0x7f4465252c74
>> Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00
>> 48 8d 05 89 5a 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00
>> f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53
>> RSP: 002b:00007fff564feba8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
>> RAX: ffffffffffffffda RBX: 000000005ea9d931 RCX: 00007f4465252c74
>> RDX: 0000000000000000 RSI: 00007fff564febf0 RDI: 0000000000000003
>> RBP: 00007fff564febf0 R08: 0000000000000004 R09: 0000000000000008
>> R10: fffffffffffffb8e R11: 0000000000000246 R12: 00007fff564fec30
>> R13: 000056355724f2c0 R14: 00007fff56506ca0 R15: 0000000000000000
>> Kernel panic - not syncing: softlockup: hung tasks
>> CPU: 38 PID: 3570653 Comm: bridge Tainted: G L 4.19.115+53-ph #1
>> Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
>> Call
>> Trace:<IRQ>dump_stack+0x66/0x8bpanic+0xe4/0x252watchdog_timer_fn+0x263/0x270
>>
>> ?
>> softlockup_fn+0x40/0x40__hrtimer_run_queues+0xfc/0x270hrtimer_interrupt+0x101/0x240smp_apic_timer_interrupt+0x6a/0x130apic_timer_interrupt+0xf/0x20</IRQ>
>>
>> RIP: 0010:br_fdb_delete_by_port+0x48/0xb0
>> Code: d5 48 89 ef e8 09 51 08 00 49 8b 84 24 98 04 00 00 48 85 c0 75 0a
>> eb 44 48 85 db 74 3f 48 89 d8 48 83 e8 18 74 36 48 8b 70 08 <48> 8b 58
>> 18 4c 39 fe 75 e5 45 85 f6 0f b6 50 28 75 0b f6 c2 02 75
>> RSP: 0018:ffffbbc9f21afa58 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
>> RAX: ffff9cabf1a74a80 RBX: ffff9cabf1a74a98 RCX: 0000000000000000
>> RDX: 0000000000000000 RSI: ffff9cae289f8a00 RDI: ffff9baebe6a9844
>> RBP: ffff9baebe6a9844 R08: 0000000000000000 R09: ffffffff933d3840
>> R10: fffff675235b9800 R11: 0000000000000001 R12: ffff9baebe6a9840
>> R13: 0000000000000e23 R14: 0000000000000000 R15: ffff9cad832f5c00
>> ?
>> br_fdb_delete_by_port+0x27/0xb0nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360
>>
>> ?
>> rtnl_calcit.isra.29+0x110/0x110netlink_rcv_skb+0xde/0x110netlink_unicast+0x1be/0x250netlink_sendmsg+0x2c1/0x3b0___sys_sendmsg+0x308/0x320
>>
>> ?
>> __sys_sendmsg+0x63/0xa0__sys_sendmsg+0x63/0xa0do_syscall_64+0x5b/0x190entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> RIP: 0033:0x7f4465252c74
>> Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00
>> 48 8d 05 89 5a 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00
>> f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53
>> RSP: 002b:00007fff564feba8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
>> RAX: ffffffffffffffda RBX: 000000005ea9d931 RCX: 00007f4465252c74
>> RDX: 0000000000000000 RSI: 00007fff564febf0 RDI: 0000000000000003
>> RBP: 00007fff564febf0 R08: 0000000000000004 R09: 0000000000000008
>> R10: fffffffffffffb8e R11: 0000000000000246 R12: 00007fff564fec30
>> R13: 000056355724f2c0 R14: 00007fff56506ca0 R15: 0000000000000000
>> ------------[ cut here ]------------
>> WARNING: CPU: 35 PID: 2224815 at kernel/sched/core.c:1168
>> set_task_cpu+0x173/0x180
>> ------------[ cut here ]------------
>> sched: Unexpected reschedule of offline CPU#13!
>> WARNING: CPU: 46 PID: 0 at arch/x86/kernel/smp.c:128
>> native_smp_send_reschedule+0x34/0x40
>> ------------[ cut here ]------------
>> Modules linked in:
>> sched: Unexpected reschedule of offline CPU#0!act_police
>> WARNING: CPU: 33 PID: 2699426 at arch/x86/kernel/smp.c:128
>> native_smp_send_reschedule+0x34/0x40cls_basic
>> Modules linked
>> in:sch_ingressact_policesch_htbcls_basicnetconsolesch_ingressebtable_filtersch_htbebtablesnetconsoleip6t_REJECTebtable_filternf_reject_ipv6ebtablesip6table_filterip6t_REJECT
>>
>> ------------[ cut here ]------------ip6_tablesnf_reject_ipv6xt_physdev
>> sched: Unexpected reschedule of offline CPU#37!ip6table_filterxt_comment
>> WARNING: CPU: 39 PID: 0 at arch/x86/kernel/smp.c:128
>> native_smp_send_reschedule+0x34/0x40ip6_tablesxt_tcpudpxt_physdev
>> Modules linked
>> in:xt_markxt_commentact_policext_addrtypext_tcpudpcls_basicxt_conntrackxt_marksch_ingressnf_conntrackxt_addrtypesch_htbnf_defrag_ipv6xt_conntracknetconsolenf_defrag_ipv4nf_conntrackebtable_filterxt_multiportnf_defrag_ipv6ebtablesipt_REJECTnf_defrag_ipv4ip6t_REJECTnf_reject_ipv4xt_multiportnf_reject_ipv6xt_setipt_REJECTip6table_filterip_set_hash_netnf_reject_ipv4ip6_tablesip_setxt_setxt_physdeviptable_filterip_set_hash_netxt_commentbpfilterip_setxt_tcpudp8021qiptable_filterxt_markgarpbpfilterxt_addrtypebonding8021qxt_conntrackfusegarpnf_conntrackipmi_watchdogbondingnf_defrag_ipv6nfnetlink_logfusenf_defrag_ipv4nfnetlinkipmi_watchdogxt_multiporttcp_htcpnfnetlink_logipt_REJECTsch_fq_codelnfnetlinknf_reject_ipv4mgag200tcp_htcpxt_setttmsch_fq_codelip_set_hash_netdrm_kms_helpermgag200ip_setdcdbasttmiptable_filteredac_mce_amddrm_kms_helperbpfilterdell_smbiosdcdbas8021qkvm_amdedac_mce_amdgarpdell_wmi_descriptordell_smbiosbondingdrmkvm_amdfusekvmdell_wmi_descriptoripmi_watchdogfb_sys_fopsdrmnfnetlink_logsyscopyareakvmnfnetlinksysfillrectfb_sys_fopstcp_htcpipmi_sisyscopyareasch_fq_codelirqbypasssysfillrectmgag200sysimgbltipmi_sittmipmi_devintfirqbypassdrm_kms_helpercrc32_pclmulsysimgbltdcdbasi2c_algo_bitipmi_devintfedac_mce_amdghash_clmulni_intelcrc32_pclmuldell_smbiossgi2c_algo_bitkvm_amdwmighash_clmulni_inteldell_wmi_descriptork10tempsgdrmipmi_msghandlerwmikvmbuttonk10temp
>>
>> ? __x64_sys_futex+0x88/0x180
>> RDX: 00047009e9d4a290 RSI: 00000000334bd47f RDI:
>> 0000000000000000act_policeksys_ioctl+0x70/0x80
>> RBP: ffffffff9337a340 R08: 0000000000000002 R09:
>> 0000000000020100cls_basic__x64_sys_ioctl+0x16/0x20
>> R10: 00000000ffffffff R11: 0000000000000081 R12:
>> ffff9b704145a580sch_ingressdo_syscall_64+0x5b/0x190
>> R13: 0000000000000001 R14: 0000000000000001 R15:
>> 00047009e9d4a290sch_htbentry_SYSCALL_64_after_hwframe+0x44/0xa9netconsoledo_idle+0x1de/0x240
>>
>> RIP: 0033:0x7f56000eb777ebtable_filtercpu_startup_entry+0x6f/0x80
>> Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff
>> ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01
>> f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01
>> 48ebtablesstart_secondary+0x1a7/0x200
>> RSP: 002b:00007f55db7fb628 EFLAGS: 00000246ip6t_REJECT
>> ORIG_RAX: 0000000000000010secondary_startup_64+0xa4/0xb0nf_reject_ipv6
>> RAX: ffffffffffffffda RBX: 000000000000ae80 RCX:
>> 00007f56000eb777ip6table_filter
>> ---[ end trace 0d5897b92aff0c87 ]---
>> RDX: 0000000000000000 RSI: 000000000000ae80 RDI:
>> 0000000000000025ip6_tables
>> RBP: 0000000000000000 R08: 00007f55ec0d2330 R09:
>> 00000000ffffffffxt_physdev
>> R10: 00005585bae5c8e0 R11: 0000000000000246 R12:
>> 00007f55e8d8b000xt_comment
>> R13: 00007f5604388000 R14: 0000000000000000 R15: 00007f55e8d8b000
>> xt_tcpudp xt_mark
>> ---[ end trace 0d5897b92aff0c88 ]---
>> xt_addrtype xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
>> xt_multiport ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set
>> iptable_filter bpfilter 8021q garp bonding fuse ipmi_watchdog
>> nfnetlink_log nfnetlink tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper
>> dcdbas edac_mce_amd dell_smbios kvm_amd dell_wmi_descriptor drm kvm
>> fb_sys_fops syscopyarea sysfillrect ipmi_si irqbypass sysimgblt
>> ipmi_devintf crc32_pclmul i2c_algo_bit ghash_clmulni_intel
>> ------------[ cut here ]------------sg
>> sched: Unexpected reschedule of offline CPU#1!wmi
>> WARNING: CPU: 33 PID: 2699426 at arch/x86/kernel/smp.c:128
>> native_smp_send_reschedule+0x34/0x40k10temp
>> Modules linked
>> in:ipmi_msghandleract_policebuttoncls_basicvhost_netsch_ingresstunsch_htbvhostnetconsoletapebtable_filterip_tableshrtimer_interrupt+0x101/0x240smp_apic_timer_interrupt+0x6a/0x130apic_timer_interrupt+0xf/0x20</IRQ>
>>
>> RIP: 0010:cpuidle_enter_state+0xb2/0x330
>> Code: 64 65 9a 6d e8 ef b3 a5 ff 49 89 c7 0f 1f 44 00 00 31 ff e8 30 c4
>> a5 ff 80 7c 24 17 00 0f 85 1c 02 00 00 fb 66 0f 1f 44 00 00 <4c> 2b 7c
>> 24 08 48 ba cf f7 53 e3 a5 9b c4 20 4c 89 f8 49 c1 ff 3f
>> RSP: 0018:ffffbbc9d8b03e70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
>> RAX: ffff9beebfba0840 RBX: ffff9baebe0ff800 RCX: 000000000000001f
>> RDX: 00047009e9d427d5 RSI: 00000000334bd47f RDI: 0000000000000000
>> RBP: ffffffff9337a340 R08: 0000000000000002 R09: 0000000000020100
>> R10: 00000000ffffffff R11: 000000000000012f R12: ffff9bafc1ec4b00
>> R13: 0000000000000001 R14: 0000000000000001 R15:
>> 00047009e9d427d5do_idle+0x1de/0x240cpu_startup_entry+0x6f/0x80start_secondary+0x1a7/0x200secondary_startup_64+0xa4/0xb0
>>
>> ---[ end trace 0d5897b92aff0c8b ]---
>> Modules linked in: act_police cls_basic sch_ingress sch_htb netconsole
>> ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter
>> ip6_tables xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype
>> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
>> ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set iptable_filter
>> bpfilter 8021q garp bonding fuse ipmi_watchdog nfnetlink_log nfnetlink
>> tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper dcdbas edac_mce_amd
>> dell_smbios kvm_amd dell_wmi_descriptor drm kvm fb_sys_fops syscopyarea
>> sysfillrect ipmi_si irqbypass sysimgblt ipmi_devintf crc32_pclmul
>> i2c_algo_bit ghash_clmulni_intel sg wmi k10temp ipmi_msghandler button
>> vhost_net tun vhost tap ip_tables x_tables btrfs zstd_decompress
>> zstd_compress xxhash raid10 raid456 async_raid6_recov
>> WARNING: CPU: 34 PID: 0 at arch/x86/kernel/smp.c:128
>> native_smp_send_reschedule+0x34/0x40
>> async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid0
>> multipath linear raid1 mlx5_core md_mod devlink tg3 sd_mod xhci_pci ptp
>> dm_mod xhci_hcd nvme i2c_piix4 libphy pps_core nvme_core usbcore ahci
>> i2c_core usb_common libahci [last unloaded: cpuid]
>> Modules linked in: act_police cls_basic sch_ingress sch_htb netconsole
>> ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter
>> ip6_tables xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype
>> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
>> ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set iptable_filter
>> bpfilter 8021q garp bonding fuse ipmi_watchdog nfnetlink_log nfnetlink
>> tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper dcdbas edac_mce_amd
>> dell_smbios kvm_amd dell_wmi_descriptor drm kvm fb_sys_fops syscopyarea
>> sysfillrect ipmi_si irqbypass sysimgblt ipmi_devintf crc32_pclmul
>> i2c_algo_bit ghash_clmulni_intel sg wmi k10temp ipmi_msghandler button
>> vhost_net tun vhost tap ip_tables x_tables btrfs zstd_decompress
>> zstd_compress xxhash raid10 raid456 async_raid6_recov
>> CPU: 35 PID: 2224815 Comm: kvm Tainted: G W L 4.19.115+53-ph #1
>> async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid0
>> multipath linear raid1 mlx5_core md_mod devlink tg3 sd_mod xhci_pci ptp
>> dm_mod xhci_hcd nvme i2c_piix4 libphy pps_core nvme_core usbcore ahci
>> i2c_core usb_common libahci [last unloaded: cpuid]
>> Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
>> RIP: 0010:set_task_cpu+0x173/0x180
>> CPU: 34 PID: 0 Comm: swapper/34 Tainted: G W L 4.19.115+53-ph #1
>> Code: e9 54 ff ff ff 0f 0b e9 dc fe ff ff 8b 43 60 85 c0 0f 84 e8 fe ff
>> ff 8b 43 60 83 f8 02 0f 84 dc fe ff ff 0f 0b e9 d5 fe ff ff <0f> 0b e9
>> df fe ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 41 55 41 54
>> Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
>> RIP: 0010:native_smp_send_reschedule+0x34/0x40
>> RSP: 0018:ffff9baebfec3e50 EFLAGS: 00010006
>> Code: 05 01 ed 3f 01 73 15 48 8b 05 f8 94 09 01 be fd 00 00 00 48 8b 40
>> 30 e9 4a 29 bc 00 89 fe 48 c7 c7 c0 13 03 93 e8 ec 0d 05 00 <0f> 0b c3
>> 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 83 ec 20
>> RAX: 0000000000000200 RBX: ffff9be89db94b00 RCX: ffff9be89db94b00
>> RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff9be89db94b00
>> RSP: 0018:ffff9baebfe837f8 EFLAGS: 00010082
>> RBP: ffff9be89db94e68 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
>> RAX: 0000000000000000 RBX: ffff9ca1948da580 RCX: 0000000000000006
>> RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff9baebfe955c0
>> R13: 0000000000000003 R14: 0000000000000046 R15: 0000000000000003
>> FS: 0000000000000000(0000) GS:ffff9baebfec0000(0000)
>> knlGS:0000000000000000
>> RBP: 0000000000020840 R08: 0000000000000001 R09: 0000000000000000
>> R10: 00000000f4240000 R11: 000000000000002f R12: ffff9ca1948db0cc
>> ------------[ cut here ]------------
>> sched: Unexpected reschedule of offline CPU#15!
>> WARNING: CPU: 35 PID: 2224815 at arch/x86/kernel/smp.c:128
>> native_smp_send_reschedule+0x34/0x40
>> Modules linked in: act_police cls_basic sch_ingress sch_htb netconsole
>> ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter
>> ip6_tables xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype
>> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
>> ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_net ip_set iptable_filter
>> bpfilter 8021q garp bonding fuse ipmi_watchdog nfnetlink_log nfnetlink
>> tcp_htcp sch_fq_codel mgag200 ttm drm_kms_helper dcdbas edac_mce_amd
>> dell_smbios kvm_amd dell_wmi_descriptor drm kvm fb_sys_fops syscopyarea
>> sysfillrect ipmi_si irqbypass sysimgblt ipmi_devintf crc32_pclmul
>> i2c_algo_bit ghash_clmulni_intel sg wmi k10temp ipmi_msghandler button
>> vhost_net tun vhost tap ip_tables x_tables btrfs zstd_decompress
>> zstd_compress xxhash raid10 raid456 async_raid6_recov
>> async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid0
>> multipath linear raid1 mlx5_core md_mod devlink tg3 sd_mod xhci_pci ptp
>> dm_mod xhci_hcd nvme i2c_piix4 libphy pps_core nvme_core usbcore ahci
>> i2c_core usb_common libahci [last unloaded: cpuid]
>> CPU: 35 PID: 2224815 Comm: kvm Tainted: G W L 4.19.115+53-ph #1
>> Hardware name: Dell Inc. PowerEdge R7515/04F3CJ, BIOS 1.2.14 01/28/2020
>> RIP: 0010:native_smp_send_reschedule+0x34/0x40
>> Code: 05 01 ed 3f 01 73 15 48 8b 05 f8 94 09 01 be fd 00 00 00 48 8b 40
>> 30 e9 4a 29 bc 00 89 fe 48 c7 c7 c0 13 03 93 e8 ec 0d 05 00 <0f> 0b c3
>> 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 83 ec 20
>> RSP: 0018:ffff9baebfec3d78 EFLAGS: 00010086
>> RAX: 0000000000000000 RBX: ffff9bae9a8fcb00 RCX: 0000000000000006
>> RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9baebfed55c0
>> RBP: 0000000000020840 R08: 0000000000000001 R09: 0000000000000000
>> R10: ffff9baebfec3d28 R11: 0000000000000034 R12: ffff9bae9a8fd64c
>> R13: 0000000000000004 R14: 0000000000000046 R15: 000000000000000f
>> FS: 0000000000000000(0000) GS:ffff9baebfec0000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007efdd36ffab8 CR3: 0000012c3faa2000 CR4: 0000000000340ee0
>> Call
>> Trace:<IRQ>try_to_wake_up+0x3fc/0x510autoremove_wake_function+0x11/0x50__wake_up_common+0x96/0x180__wake_up_common_lock+0x7c/0xc0irq_work_run_list+0x50/0x80
>>
>> ?
>> tick_sched_do_timer+0x80/0x80update_process_times+0x3b/0x50tick_sched_handle+0x22/0x60tick_sched_timer+0x37/0x70__hrtimer_run_queues+0xfc/0x270hrtimer_interrupt+0x101/0x240smp_apic_timer_interrupt+0x6a/0x130apic_timer_interrupt+0xf/0x20</IRQ>
>>
>> RIP: 0010:svm_handle_external_intr+0xc/0x20 [kvm_amd]
>> Code: 83 0c 03 00 00 20 75 02 5b c3 48 89 df 5b e9 cb e4 ff ff 90 66 2e
>> 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 fb 66 0f 1f 44 00 00 <90> fa 66
>> 0f 1f 44 00 00 c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
>> RSP: 0018:ffffbbc9f1117d18 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
>> RAX: ffffffffc0930790 RBX: ffff9c7f9d2c0000 RCX: 0000000100000000
>> RDX: 000b133f62ad4458 RSI: fff6d37b61b9824a RDI: ffff9c7f9d2c0000
>> RBP: ffffbbc9f1117da0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: 000b133f62ad4458
>> R13: 0000000000000000 R14: ffff9c7f9d2c4328 R15: ffff9be510f24000
>> ? svm_sync_dirty_debug_regs+0xc0/0xc0 [kvm_amd]
>> vcpu_enter_guest+0x3e9/0x1520 [kvm]
>> ? ktime_get+0x3e/0xb0
>> ? __apic_accept_irq+0x2ae/0x320 [kvm]
>> ? kvm_arch_vcpu_ioctl_run+0x2f6/0x520 [kvm]
>> kvm_arch_vcpu_ioctl_run+0x2f6/0x520 [kvm]
>> kvm_vcpu_ioctl+0x388/0x5d0 [kvm]
>> ? do_futex+0x12c/0x5d0do_vfs_ioctl+0xa2/0x640
>> Kernel Offset: 0x11000000 from 0xffffffff81000000 (relocation range:
>> 0xffffffff80000000-0xffffffffbfffffff)
>> Rebooting in 20 seconds..
>>
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: BUG: soft lockup while deleting tap interface from vlan aware bridge
  2020-04-29 20:52 BUG: soft lockup while deleting tap interface from vlan aware bridge Stefan Priebe - Profihost AG
@ 2020-04-30 10:55   ` Ido Schimmel
  2020-04-30 10:55   ` [Bridge] " Ido Schimmel
  1 sibling, 0 replies; 15+ messages in thread
From: Ido Schimmel @ 2020-04-30 10:55 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: roopa, nikolay, davem, bridge, netdev

On Wed, Apr 29, 2020 at 10:52:35PM +0200, Stefan Priebe - Profihost AG wrote:
> Hello,
> 
> while running a stable vanilla kernel 4.19.115 i'm reproducably get this
> one:
> 
> watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [bridge:3570653]
> 
> ...
> 
> Call
> Trace:nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360

Nik, Stefan,

My theory is that 4K VLANs are deleted in a batch and preemption is
disabled (please confirm). For each VLAN the kernel needs to go over the
entire FDB and delete affected entries. If the FDB is very large or the
FDB lock is contended this can cause the kernel to loop for more than 20
seconds without calling schedule().

To reproduce I added mdelay(100) in br_fdb_delete_by_port() and ran
this:

ip link add name br10 up type bridge vlan_filtering 1
ip link add name dummy10 up type dummy
ip link set dev dummy10 master br10
bridge vlan add vid 1-4094 dev dummy10 master
bridge vlan del vid 1-4094 dev dummy10 master

Got a similar trace to Stefan's. Seems to be fixed by attached:

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index a774e19c41bb..240e260e3461 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -615,6 +615,7 @@ int br_process_vlan_info(struct net_bridge *br,
                                               v - 1, rtm_cmd);
                                v_change_start = 0;
                        }
+                       cond_resched();
                }
                /* v_change_start is set only if the last/whole range changed */
                if (v_change_start)

WDYT?

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Bridge] BUG: soft lockup while deleting tap interface from vlan aware bridge
@ 2020-04-30 10:55   ` Ido Schimmel
  0 siblings, 0 replies; 15+ messages in thread
From: Ido Schimmel @ 2020-04-30 10:55 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: nikolay, netdev, roopa, bridge, davem

On Wed, Apr 29, 2020 at 10:52:35PM +0200, Stefan Priebe - Profihost AG wrote:
> Hello,
> 
> while running a stable vanilla kernel 4.19.115 i'm reproducably get this
> one:
> 
> watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [bridge:3570653]
> 
> ...
> 
> Call
> Trace:nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360

Nik, Stefan,

My theory is that 4K VLANs are deleted in a batch and preemption is
disabled (please confirm). For each VLAN the kernel needs to go over the
entire FDB and delete affected entries. If the FDB is very large or the
FDB lock is contended this can cause the kernel to loop for more than 20
seconds without calling schedule().

To reproduce I added mdelay(100) in br_fdb_delete_by_port() and ran
this:

ip link add name br10 up type bridge vlan_filtering 1
ip link add name dummy10 up type dummy
ip link set dev dummy10 master br10
bridge vlan add vid 1-4094 dev dummy10 master
bridge vlan del vid 1-4094 dev dummy10 master

Got a similar trace to Stefan's. Seems to be fixed by attached:

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index a774e19c41bb..240e260e3461 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -615,6 +615,7 @@ int br_process_vlan_info(struct net_bridge *br,
                                               v - 1, rtm_cmd);
                                v_change_start = 0;
                        }
+                       cond_resched();
                }
                /* v_change_start is set only if the last/whole range changed */
                if (v_change_start)

WDYT?

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: BUG: soft lockup while deleting tap interface from vlan aware bridge
  2020-04-30 10:55   ` [Bridge] " Ido Schimmel
@ 2020-04-30 11:20     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 15+ messages in thread
From: Nikolay Aleksandrov @ 2020-04-30 11:20 UTC (permalink / raw)
  To: Ido Schimmel, Stefan Priebe - Profihost AG; +Cc: roopa, davem, bridge, netdev

On 30/04/2020 13:55, Ido Schimmel wrote:
> On Wed, Apr 29, 2020 at 10:52:35PM +0200, Stefan Priebe - Profihost AG wrote:
>> Hello,
>>
>> while running a stable vanilla kernel 4.19.115 i'm reproducably get this
>> one:
>>
>> watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [bridge:3570653]
>>
>> ...
>>
>> Call
>> Trace:nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360
> 
> Nik, Stefan,
> 
> My theory is that 4K VLANs are deleted in a batch and preemption is
> disabled (please confirm). For each VLAN the kernel needs to go over the

Right, that's what I was expecting. :-)

> entire FDB and delete affected entries. If the FDB is very large or the
> FDB lock is contended this can cause the kernel to loop for more than 20
> seconds without calling schedule().

Indeed, we already have that issue also with expire which goes over all entries.
I have rough patches that improve the situation from way back, will have to go over and
polish them to submit when I got more time. Long ago I've tested it with expiring 10 million
entries but on a rather powerful CPU.

> 
> To reproduce I added mdelay(100) in br_fdb_delete_by_port() and ran
> this:
> 
> ip link add name br10 up type bridge vlan_filtering 1
> ip link add name dummy10 up type dummy
> ip link set dev dummy10 master br10
> bridge vlan add vid 1-4094 dev dummy10 master
> bridge vlan del vid 1-4094 dev dummy10 master
> 
> Got a similar trace to Stefan's. Seems to be fixed by attached:
> 
> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> index a774e19c41bb..240e260e3461 100644
> --- a/net/bridge/br_netlink.c
> +++ b/net/bridge/br_netlink.c
> @@ -615,6 +615,7 @@ int br_process_vlan_info(struct net_bridge *br,
>                                                v - 1, rtm_cmd);
>                                 v_change_start = 0;
>                         }
> +                       cond_resched();
>                 }
>                 /* v_change_start is set only if the last/whole range changed */
>                 if (v_change_start)
> 
> WDYT?
> 

Maybe we can batch the deletes at say 32 at a time?
Otherwise looks good to me, thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bridge] BUG: soft lockup while deleting tap interface from vlan aware bridge
@ 2020-04-30 11:20     ` Nikolay Aleksandrov
  0 siblings, 0 replies; 15+ messages in thread
From: Nikolay Aleksandrov @ 2020-04-30 11:20 UTC (permalink / raw)
  To: Ido Schimmel, Stefan Priebe - Profihost AG; +Cc: netdev, roopa, bridge, davem

On 30/04/2020 13:55, Ido Schimmel wrote:
> On Wed, Apr 29, 2020 at 10:52:35PM +0200, Stefan Priebe - Profihost AG wrote:
>> Hello,
>>
>> while running a stable vanilla kernel 4.19.115 i'm reproducably get this
>> one:
>>
>> watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [bridge:3570653]
>>
>> ...
>>
>> Call
>> Trace:nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360
> 
> Nik, Stefan,
> 
> My theory is that 4K VLANs are deleted in a batch and preemption is
> disabled (please confirm). For each VLAN the kernel needs to go over the

Right, that's what I was expecting. :-)

> entire FDB and delete affected entries. If the FDB is very large or the
> FDB lock is contended this can cause the kernel to loop for more than 20
> seconds without calling schedule().

Indeed, we already have that issue also with expire which goes over all entries.
I have rough patches that improve the situation from way back, will have to go over and
polish them to submit when I got more time. Long ago I've tested it with expiring 10 million
entries but on a rather powerful CPU.

> 
> To reproduce I added mdelay(100) in br_fdb_delete_by_port() and ran
> this:
> 
> ip link add name br10 up type bridge vlan_filtering 1
> ip link add name dummy10 up type dummy
> ip link set dev dummy10 master br10
> bridge vlan add vid 1-4094 dev dummy10 master
> bridge vlan del vid 1-4094 dev dummy10 master
> 
> Got a similar trace to Stefan's. Seems to be fixed by attached:
> 
> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> index a774e19c41bb..240e260e3461 100644
> --- a/net/bridge/br_netlink.c
> +++ b/net/bridge/br_netlink.c
> @@ -615,6 +615,7 @@ int br_process_vlan_info(struct net_bridge *br,
>                                                v - 1, rtm_cmd);
>                                 v_change_start = 0;
>                         }
> +                       cond_resched();
>                 }
>                 /* v_change_start is set only if the last/whole range changed */
>                 if (v_change_start)
> 
> WDYT?
> 

Maybe we can batch the deletes at say 32 at a time?
Otherwise looks good to me, thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: BUG: soft lockup while deleting tap interface from vlan aware bridge
  2020-04-30 10:55   ` [Bridge] " Ido Schimmel
  (?)
  (?)
@ 2020-04-30 13:18   ` Stefan Priebe - Profihost AG
  2020-04-30 15:57       ` [Bridge] " Ido Schimmel
  -1 siblings, 1 reply; 15+ messages in thread
From: Stefan Priebe - Profihost AG @ 2020-04-30 13:18 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: roopa, nikolay, davem, bridge, netdev

uck (Vorsitzender)

Am 30.04.20 um 12:55 schrieb Ido Schimmel:
> On Wed, Apr 29, 2020 at 10:52:35PM +0200, Stefan Priebe - Profihost AG wrote:
>> Hello,
>>
>> while running a stable vanilla kernel 4.19.115 i'm reproducably get this
>> one:
>>
>> watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [bridge:3570653]
>>
>> ...
>>
>> Call
>> Trace:nbp_vlan_delete+0x59/0xa0br_vlan_info+0x66/0xd0br_afspec+0x18c/0x1d0br_dellink+0x74/0xd0rtnl_bridge_dellink+0x110/0x220rtnetlink_rcv_msg+0x283/0x360
> 
> Nik, Stefan,
> 
> My theory is that 4K VLANs are deleted in a batch and preemption is
> disabled (please confirm). For each VLAN the kernel needs to go over the
> entire FDB and delete affected entries. If the FDB is very large or the
> FDB lock is contended this can cause the kernel to loop for more than 20
> seconds without calling schedule().
> 
> To reproduce I added mdelay(100) in br_fdb_delete_by_port() and ran
> this:
> 
> ip link add name br10 up type bridge vlan_filtering 1
> ip link add name dummy10 up type dummy
> ip link set dev dummy10 master br10
> bridge vlan add vid 1-4094 dev dummy10 master
> bridge vlan del vid 1-4094 dev dummy10 master
> 
> Got a similar trace to Stefan's. Seems to be fixed by attached:
> 
> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> index a774e19c41bb..240e260e3461 100644
> --- a/net/bridge/br_netlink.c
> +++ b/net/bridge/br_netlink.c
> @@ -615,6 +615,7 @@ int br_process_vlan_info(struct net_bridge *br,
>                                                v - 1, rtm_cmd);
>                                 v_change_start = 0;
>                         }
> +                       cond_resched();
>                 }
>                 /* v_change_start is set only if the last/whole range changed */
>                 if (v_change_start)
> 
> WDYT?

Great! This indeed solves the problem.

Stefan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: BUG: soft lockup while deleting tap interface from vlan aware bridge
  2020-04-30 11:20     ` [Bridge] " Nikolay Aleksandrov
@ 2020-04-30 15:56       ` Ido Schimmel
  -1 siblings, 0 replies; 15+ messages in thread
From: Ido Schimmel @ 2020-04-30 15:56 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Stefan Priebe - Profihost AG, roopa, davem, bridge, netdev

On Thu, Apr 30, 2020 at 02:20:23PM +0300, Nikolay Aleksandrov wrote:
> Maybe we can batch the deletes at say 32 at a time?

Hi Nik,

Thanks for looking into this!

I don't really feel comfortable hard coding an arbitrary number of
entries before calling cond_resched(). I didn't see a noticeable
difference in execution time with the previous patch versus an unpatched
kernel. Also, in the examples I saw in the networking code
cond_resched() is always called after each loop iteration.

Let me know how you prefer it and I will send a patch.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bridge] BUG: soft lockup while deleting tap interface from vlan aware bridge
@ 2020-04-30 15:56       ` Ido Schimmel
  0 siblings, 0 replies; 15+ messages in thread
From: Ido Schimmel @ 2020-04-30 15:56 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, roopa, bridge, davem, Stefan Priebe - Profihost AG

On Thu, Apr 30, 2020 at 02:20:23PM +0300, Nikolay Aleksandrov wrote:
> Maybe we can batch the deletes at say 32 at a time?

Hi Nik,

Thanks for looking into this!

I don't really feel comfortable hard coding an arbitrary number of
entries before calling cond_resched(). I didn't see a noticeable
difference in execution time with the previous patch versus an unpatched
kernel. Also, in the examples I saw in the networking code
cond_resched() is always called after each loop iteration.

Let me know how you prefer it and I will send a patch.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: BUG: soft lockup while deleting tap interface from vlan aware bridge
  2020-04-30 13:18   ` Stefan Priebe - Profihost AG
@ 2020-04-30 15:57       ` Ido Schimmel
  0 siblings, 0 replies; 15+ messages in thread
From: Ido Schimmel @ 2020-04-30 15:57 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: roopa, nikolay, davem, bridge, netdev

On Thu, Apr 30, 2020 at 03:18:18PM +0200, Stefan Priebe - Profihost AG wrote:
> Great! This indeed solves the problem.

Thanks for reporting and testing!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bridge] BUG: soft lockup while deleting tap interface from vlan aware bridge
@ 2020-04-30 15:57       ` Ido Schimmel
  0 siblings, 0 replies; 15+ messages in thread
From: Ido Schimmel @ 2020-04-30 15:57 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: nikolay, netdev, roopa, bridge, davem

On Thu, Apr 30, 2020 at 03:18:18PM +0200, Stefan Priebe - Profihost AG wrote:
> Great! This indeed solves the problem.

Thanks for reporting and testing!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: BUG: soft lockup while deleting tap interface from vlan aware bridge
  2020-04-30 15:56       ` [Bridge] " Ido Schimmel
@ 2020-04-30 15:57         ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 15+ messages in thread
From: Nikolay Aleksandrov @ 2020-04-30 15:57 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: Stefan Priebe - Profihost AG, roopa, davem, bridge, netdev

On 30/04/2020 18:56, Ido Schimmel wrote:
> On Thu, Apr 30, 2020 at 02:20:23PM +0300, Nikolay Aleksandrov wrote:
>> Maybe we can batch the deletes at say 32 at a time?
> 
> Hi Nik,
> 
> Thanks for looking into this!
> 
> I don't really feel comfortable hard coding an arbitrary number of
> entries before calling cond_resched(). I didn't see a noticeable
> difference in execution time with the previous patch versus an unpatched
> kernel. Also, in the examples I saw in the networking code
> cond_resched() is always called after each loop iteration.
> 
> Let me know how you prefer it and I will send a patch.
> 

that's ok, send it as it is


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bridge] BUG: soft lockup while deleting tap interface from vlan aware bridge
@ 2020-04-30 15:57         ` Nikolay Aleksandrov
  0 siblings, 0 replies; 15+ messages in thread
From: Nikolay Aleksandrov @ 2020-04-30 15:57 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, roopa, bridge, davem, Stefan Priebe - Profihost AG

On 30/04/2020 18:56, Ido Schimmel wrote:
> On Thu, Apr 30, 2020 at 02:20:23PM +0300, Nikolay Aleksandrov wrote:
>> Maybe we can batch the deletes at say 32 at a time?
> 
> Hi Nik,
> 
> Thanks for looking into this!
> 
> I don't really feel comfortable hard coding an arbitrary number of
> entries before calling cond_resched(). I didn't see a noticeable
> difference in execution time with the previous patch versus an unpatched
> kernel. Also, in the examples I saw in the networking code
> cond_resched() is always called after each loop iteration.
> 
> Let me know how you prefer it and I will send a patch.
> 

that's ok, send it as it is


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-04-30 15:58 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-29 20:52 BUG: soft lockup while deleting tap interface from vlan aware bridge Stefan Priebe - Profihost AG
2020-04-29 21:23 ` Nikolay Aleksandrov
2020-04-29 21:23   ` [Bridge] " Nikolay Aleksandrov
2020-04-30  6:47   ` Stefan Priebe - Profihost AG
2020-04-30 10:55 ` Ido Schimmel
2020-04-30 10:55   ` [Bridge] " Ido Schimmel
2020-04-30 11:20   ` Nikolay Aleksandrov
2020-04-30 11:20     ` [Bridge] " Nikolay Aleksandrov
2020-04-30 15:56     ` Ido Schimmel
2020-04-30 15:56       ` [Bridge] " Ido Schimmel
2020-04-30 15:57       ` Nikolay Aleksandrov
2020-04-30 15:57         ` [Bridge] " Nikolay Aleksandrov
2020-04-30 13:18   ` Stefan Priebe - Profihost AG
2020-04-30 15:57     ` Ido Schimmel
2020-04-30 15:57       ` [Bridge] " Ido Schimmel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.