* RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0 @ 2022-08-02 11:27 Bruno Goncalves 2022-08-02 19:23 ` Jakub Kicinski 0 siblings, 1 reply; 11+ messages in thread From: Bruno Goncalves @ 2022-08-02 11:27 UTC (permalink / raw) To: LKML, Networking; +Cc: CKI Project Hello, We've noticed the following panic when booting up kernel 5.19.0 on a specific machine. The panic seems to happen when we build the kernel with debug flags. Below is the first crash we noticed, more logs at [1] and the kernel config is at [2]. [ 59.207684] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 59.212949] CPU: 32 PID: 1967 Comm: NetworkManager Not tainted 5.19.0-rc3 #1 [ 59.220041] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant DL325 Gen10 Plus, BIOS A43 08/09/2021 [ 59.229490] RIP: 0010:qede_load.cold+0x5a1/0x819 [qede] [ 59.234757] Code: 41 88 84 24 b1 00 00 00 41 0f b7 84 24 b6 00 00 00 45 88 b4 24 b3 00 00 00 e9 b8 00 ff ff 48 c7 c1 09 66 46 c1 e9 6f ff ff ff <0f> 0b 49 8b 7c 24 08 e8 82 e2 fe ff 48 89 c1 48 85 c0 74 32 ba 2a [ 59.253639] RSP: 0018:ffffae1e04593688 EFLAGS: 00010206 [ 59.258897] RAX: 000000000000006b RBX: 0000000000000000 RCX: 0000000000000006 [ 59.266073] RDX: ffff8f8f35332be8 RSI: ffffffffaf96411f RDI: ffffffffaf8e4b1e [ 59.273250] RBP: ffff8f8f2a87acd0 R08: 0000000000000001 R09: 0000000000000001 [ 59.280426] R10: 0000000000000000 R11: 000000000f8c087f R12: ffff8f8f2a87ac00 [ 59.287602] R13: ffff8f8f34d7f928 R14: ffffae1e0c039000 R15: 0000000000000000 [ 59.294777] FS: 00007f164509f500(0000) GS:ffff8f9dfd800000(0000) knlGS:0000000000000000 [ 59.302917] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 59.308697] CR2: 00005575f29a5c08 CR3: 0000000163810000 CR4: 0000000000350ee0 [ 59.315875] Call Trace: [ 59.318335] <TASK> [ 59.320458] qede_open+0x3b/0x90 [qede] [ 59.324323] __dev_open+0xf1/0x1c0 [ 59.327748] __dev_change_flags+0x1f8/0x280 [ 59.331957] dev_change_flags+0x22/0x60 [ 59.335816] do_setlink+0x327/0x1140 [ 59.339413] ? 
lock_is_held_type+0xe3/0x140 [ 59.343625] ? lock_is_held_type+0xe3/0x140 [ 59.347833] ? __nla_validate_parse+0x5f/0xb70 [ 59.352307] ? mark_held_locks+0x49/0x70 [ 59.356256] ? _raw_spin_unlock_irqrestore+0x30/0x60 [ 59.361254] ? lockdep_hardirqs_on+0x7d/0x100 [ 59.365640] __rtnl_newlink+0x59c/0x950 [ 59.369502] ? rtnl_newlink+0x2a/0x60 [ 59.373185] ? rcu_read_lock_sched_held+0x3c/0x70 [ 59.377918] ? trace_kmalloc+0x30/0xf0 [ 59.381692] ? kmem_cache_alloc_trace+0x1ad/0x270 [ 59.386426] rtnl_newlink+0x43/0x60 [ 59.389936] rtnetlink_rcv_msg+0x184/0x540 [ 59.394057] ? lock_acquire+0xe2/0x2e0 [ 59.397830] ? rtnl_stats_set+0x190/0x190 [ 59.401863] netlink_rcv_skb+0x51/0xf0 [ 59.405639] netlink_unicast+0x189/0x260 [ 59.409586] netlink_sendmsg+0x25a/0x4c0 [ 59.413536] sock_sendmsg+0x5c/0x60 [ 59.417045] ____sys_sendmsg+0x22b/0x270 [ 59.420991] ? import_iovec+0x17/0x20 [ 59.424675] ? sendmsg_copy_msghdr+0x78/0xa0 [ 59.428972] ___sys_sendmsg+0x85/0xc0 [ 59.432658] ? lock_is_held_type+0xe3/0x140 [ 59.436867] ? find_held_lock+0x2b/0x80 [ 59.440727] ? lock_release+0x145/0x300 [ 59.444586] ? 
__fget_files+0xe5/0x170 [ 59.448360] __sys_sendmsg+0x5c/0xb0 [ 59.451961] do_syscall_64+0x5b/0x80 [ 59.455558] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 59.460641] RIP: 0033:0x7f164628539d [ 59.464251] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 0a b1 f7 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 5e b1 f7 ff 48 [ 59.483133] RSP: 002b:00007ffd9bf01520 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [ 59.490749] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f164628539d [ 59.497925] RDX: 0000000000000000 RSI: 00007ffd9bf01560 RDI: 000000000000000c [ 59.505100] RBP: 00005575f2915040 R08: 0000000000000000 R09: 0000000000000000 [ 59.512275] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 [ 59.519453] R13: 00007ffd9bf016c0 R14: 00007ffd9bf016bc R15: 0000000000000000 [ 59.526637] </TASK> [ 59.528834] Modules linked in: rfkill sunrpc intel_rapl_msr intel_rapl_common vfat fat qede qed edac_mce_amd i2c_piix4 crc8 rapl igb ipmi_ssif ptdma ses enclosure pcspkr dca hpilo k10temp acpi_ipmi acpi_tad ipmi_si acpi_power_meter fuse zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel mgag200 i2c_algo_bit drm_shmem_helper drm_kms_helper ghash_clmulni_intel drm hpwdt smartpqi ccp scsi_transport_sas sp5100_tco wmi ipmi_devintf ipmi_msghandler [ 59.568459] ---[ end trace 0000000000000000 ]--- [ 59.632952] RIP: 0010:qede_load.cold+0x5a1/0x819 [qede] [ 59.632967] Code: 41 88 84 24 b1 00 00 00 41 0f b7 84 24 b6 00 00 00 45 88 b4 24 b3 00 00 00 e9 b8 00 ff ff 48 c7 c1 09 66 46 c1 e9 6f ff ff ff <0f> 0b 49 8b 7c 24 08 e8 82 e2 fe ff 48 89 c1 48 85 c0 74 32 ba 2a [ 59.632970] RSP: 0018:ffffae1e04593688 EFLAGS: 00010206 [ 59.632972] RAX: 000000000000006b RBX: 0000000000000000 RCX: 0000000000000006 [ 59.632974] RDX: ffff8f8f35332be8 RSI: ffffffffaf96411f RDI: ffffffffaf8e4b1e [ 59.632977] RBP: ffff8f8f2a87acd0 R08: 0000000000000001 R09: 0000000000000001 [ 59.632978] R10: 
0000000000000000 R11: 000000000f8c087f R12: ffff8f8f2a87ac00 [ 59.632980] R13: ffff8f8f34d7f928 R14: ffffae1e0c039000 R15: 0000000000000000 [ 59.632982] FS: 00007f164509f500(0000) GS:ffff8f9dfd800000(0000) knlGS:0000000000000000 [ 59.632984] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 59.632986] CR2: 00005575f29a5c08 CR3: 0000000163810000 CR4: 0000000000350ee0 [ 59.632989] Kernel panic - not syncing: Fatal exception [ 59.732905] Kernel Offset: 0x2d000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 59.807803] ---[ end Kernel panic - not syncing: Fatal exception ]--- cki issue tracker: https://datawarehouse.cki-project.org/issue/1470 [1] https://datawarehouse.cki-project.org/kcidb/tests/4002370 [2] http://s3.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/06/20/568171088/redhat:568171088/redhat:568171088_x86_64_debug/.config Thanks, Bruno Goncalves ^ permalink raw reply [flat|nested] 11+ messages in thread
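A note on reading the Code: line in the report above. The byte wrapped in angle brackets marks where RIP was pointing when the trap hit. The sketch below is only an illustration: `parse_code_line` is a hypothetical helper, not part of any kernel tooling. It finds that marker and checks whether the trapping bytes are `0f 0b`, the x86 `ud2` instruction that kernel BUG()-style assertions typically compile to.

```python
def parse_code_line(line):
    """Return (offset_of_trap, bytes_from_trap_onward) for an oops 'Code:' line.

    Hypothetical helper for illustration; the kernel's own tool for this
    job is scripts/decodecode.
    """
    tokens = line.split("Code:", 1)[1].split()
    # The trapping byte is the one wrapped in <...>.
    trap_index = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    code = bytes(int(tok.strip("<>"), 16) for tok in tokens)
    return trap_index, code[trap_index:]


# Code: line copied from the first oops in this thread.
line = (
    "[ 59.234757] Code: 41 88 84 24 b1 00 00 00 41 0f b7 84 24 b6 00 00 00 "
    "45 88 b4 24 b3 00 00 00 e9 b8 00 ff ff 48 c7 c1 09 66 46 c1 e9 6f ff ff ff "
    "<0f> 0b 49 8b 7c 24 08 e8 82 e2 fe ff 48 89 c1 48 85 c0 74 32 ba 2a"
)

offset, tail = parse_code_line(line)
is_ud2 = tail[:2] == b"\x0f\x0b"
print(hex(offset), is_ud2)
```

The offset it reports, 0x2a, is the same position that the decoded listing later in the thread marks with `2a:*` as the trapping instruction.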
* Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
From: Jakub Kicinski @ 2022-08-02 19:23 UTC
To: Bruno Goncalves; +Cc: LKML, Networking, CKI Project

On Tue, 2 Aug 2022 13:27:32 +0200 Bruno Goncalves wrote:
> Hello,
>
> We've noticed the following panic when booting up kernel 5.19.0 on a
> specific machine.
> The panic seems to happen when we build the kernel with debug flags.
> Below is the first crash we noticed, more logs at [1] and the kernel
> config is at [2].
>
> [ 59.207684] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [ 59.212949] CPU: 32 PID: 1967 Comm: NetworkManager Not tainted 5.19.0-rc3 #1
> [ 59.220041] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant
> DL325 Gen10 Plus, BIOS A43 08/09/2021
> [ 59.229490] RIP: 0010:qede_load.cold+0x5a1/0x819 [qede]

Is it this warning?

	WARN_ON(xdp_rxq_info_reg(&fp->rxq->xdp_rxq, edev->ndev,

Would you be able to run the stacktrace thru
scripts/decode_stacktrace.sh ?

* Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
From: Bruno Goncalves @ 2022-08-03 12:13 UTC
To: Jakub Kicinski; +Cc: LKML, Networking, CKI Project

On Tue, 2 Aug 2022 at 21:24, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 2 Aug 2022 13:27:32 +0200 Bruno Goncalves wrote:
> > Hello,
> >
> > We've noticed the following panic when booting up kernel 5.19.0 on a
> > specific machine.
> > The panic seems to happen when we build the kernel with debug flags.
> > Below is the first crash we noticed, more logs at [1] and the kernel
> > config is at [2].
> >
> > [ 59.207684] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > [ 59.212949] CPU: 32 PID: 1967 Comm: NetworkManager Not tainted 5.19.0-rc3 #1
> > [ 59.220041] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant
> > DL325 Gen10 Plus, BIOS A43 08/09/2021
> > [ 59.229490] RIP: 0010:qede_load.cold+0x5a1/0x819 [qede]
>
> Is it this warning?
>
> WARN_ON(xdp_rxq_info_reg(&fp->rxq->xdp_rxq, edev->ndev,
>
> Would you be able to run the stacktrace thru
> scripts/decode_stacktrace.sh ?

Got this from the most recent failure (kernel built using commit 0805c6fb39f6): the tarball is https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/603714145/build%20x86_64%20debug/2807738987/artifacts/kernel-mainline.kernel.org-redhat_603714145_x86_64_debug.tar.gz and the call trace from https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/08/02/redhat:603123526/build_x86_64_redhat:603123526_x86_64_debug/tests/1/results_0001/console.log/console.log [ 69.876513] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 69.888521] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant DL325 Gen10 Plus, BIOS A43 08/09/2021 [ 69.897971] RIP: 0010:qede_load.cold (/builds/2807738987/workdir/./include/linux/spinlock.h:389 /builds/2807738987/workdir/./include/linux/netdevice.h:4294 /builds/2807738987/workdir/./include/linux/netdevice.h:4385 /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2594 /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2575) qede [ 69.903242] Code: 41 88 84 24 b1 00 00 00 41 0f b7 84 24 b6 00 00 00 45 88 b4 24 b3 00 00 00 e9 12 ff fe ff 48 c7 c1 09 c6 d7 c0 e9 6f ff ff ff <0f> 0b 49 8b 7c 24 08 e8 8c e0 fe ff 48 89 c1 48 85 c0 74 32 ba 2a All code ======== 0: 41 88 84 24 b1 00 00 mov %al,0xb1(%r12) 7: 00 8: 41 0f b7 84 24 b6 00 movzwl 0xb6(%r12),%eax f: 00 00 11: 45 88 b4 24 b3 00 00 mov %r14b,0xb3(%r12) 18: 00 19: e9 12 ff fe ff jmpq 0xfffffffffffeff30 1e: 48 c7 c1 09 c6 d7 c0 mov $0xffffffffc0d7c609,%rcx 25: e9 6f ff ff ff jmpq 0xffffffffffffff99 2a:* 0f 0b ud2 <-- trapping instruction 2c: 49 8b 7c 24 08 mov 0x8(%r12),%rdi 31: e8 8c e0 fe ff callq 0xfffffffffffee0c2 36: 48 89 c1 mov %rax,%rcx 39: 48 85 c0 test %rax,%rax 3c: 74 32 je 0x70 3e: ba .byte 0xba 3f: 2a .byte 0x2a Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: 49 8b 7c 24 08 mov 0x8(%r12),%rdi 7: e8 8c e0 fe ff callq 0xfffffffffffee098 c: 
48 89 c1 mov %rax,%rcx f: 48 85 c0 test %rax,%rax 12: 74 32 je 0x46 14: ba .byte 0xba 15: 2a .byte 0x2a [ 69.922125] RSP: 0018:ffffac3c848a3658 EFLAGS: 00010206 [ 69.927385] RAX: 000000000000006b RBX: 0000000000000000 RCX: 0000000000000006 [ 69.934562] RDX: ffff94a3f3eabba8 RSI: ffffffff9e9578a7 RDI: ffffffff9e8d8176 [ 69.941738] RBP: ffff94a3ee75acd0 R08: 0000000000000001 R09: 0000000000000001 [ 69.948914] R10: 0000000000000000 R11: 00000000f6665eaf R12: ffff94a3ee75ac00 [ 69.956089] R13: ffff94a3f31bb928 R14: ffffac3ca31dd000 R15: 0000000000000000 [ 69.963265] FS: 00007f623da2c500(0000) GS:ffff94b2bc240000(0000) knlGS:0000000000000000 [ 69.971405] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 69.977183] CR2: 000056491c907688 CR3: 000000015ebe2000 CR4: 0000000000350ee0 [ 69.984361] Call Trace: [ 69.986820] <TASK> [ 69.988950] qede_open (/builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2552) qede [ 69.992817] __dev_open (/builds/2807738987/workdir/net/core/dev.c:1434) [ 69.996247] __dev_change_flags (/builds/2807738987/workdir/net/core/dev.c:8537) [ 70.000459] dev_change_flags (/builds/2807738987/workdir/net/core/dev.c:8608) [ 70.004318] do_setlink (/builds/2807738987/workdir/net/core/rtnetlink.c:2780) [ 70.007916] ? lock_is_held_type (/builds/2807738987/workdir/kernel/locking/lockdep.c:466 /builds/2807738987/workdir/kernel/locking/lockdep.c:5710) [ 70.012131] ? lock_is_held_type (/builds/2807738987/workdir/kernel/locking/lockdep.c:466 /builds/2807738987/workdir/kernel/locking/lockdep.c:5710) [ 70.016342] ? __nla_validate_parse (/builds/2807738987/workdir/./include/net/netlink.h:1159 (discriminator 2) /builds/2807738987/workdir/lib/nlattr.c:576 (discriminator 2)) [ 70.020816] ? mark_held_locks (/builds/2807738987/workdir/kernel/locking/lockdep.c:4234) [ 70.024767] ? 
_raw_spin_unlock_irqrestore (/builds/2807738987/workdir/./arch/x86/include/asm/paravirt.h:704 /builds/2807738987/workdir/./arch/x86/include/asm/irqflags.h:138 /builds/2807738987/workdir/./include/linux/spinlock_api_smp.h:151 /builds/2807738987/workdir/kernel/locking/spinlock.c:194) [ 70.029763] ? lockdep_hardirqs_on (/builds/2807738987/workdir/kernel/locking/lockdep.c:4383) [ 70.034152] __rtnl_newlink (/builds/2807738987/workdir/net/core/rtnetlink.c:3546) [ 70.038020] ? rtnl_newlink (/builds/2807738987/workdir/net/core/rtnetlink.c:3590) [ 70.041702] ? rcu_read_lock_sched_held (/builds/2807738987/workdir/kernel/rcu/update.c:125 /builds/2807738987/workdir/kernel/rcu/update.c:119) [ 70.046437] ? trace_kmalloc (/builds/2807738987/workdir/./include/trace/events/kmem.h:52 /builds/2807738987/workdir/./include/trace/events/kmem.h:52) [ 70.050297] ? kmem_cache_alloc_trace (/builds/2807738987/workdir/mm/slub.c:3286) [ 70.055035] rtnl_newlink (/builds/2807738987/workdir/net/core/rtnetlink.c:3594) [ 70.058544] rtnetlink_rcv_msg (/builds/2807738987/workdir/net/core/rtnetlink.c:6089) [ 70.062667] ? lock_acquire (/builds/2807738987/workdir/kernel/locking/lockdep.c:466 /builds/2807738987/workdir/kernel/locking/lockdep.c:5668 /builds/2807738987/workdir/kernel/locking/lockdep.c:5631) [ 70.066442] ? rtnl_stats_set (/builds/2807738987/workdir/net/core/rtnetlink.c:5986) [ 70.070478] netlink_rcv_skb (/builds/2807738987/workdir/net/netlink/af_netlink.c:2501) [ 70.074345] netlink_unicast (/builds/2807738987/workdir/net/netlink/af_netlink.c:1320 /builds/2807738987/workdir/net/netlink/af_netlink.c:1345) [ 70.078295] netlink_sendmsg (/builds/2807738987/workdir/net/netlink/af_netlink.c:1921) [ 70.082249] sock_sendmsg (/builds/2807738987/workdir/net/socket.c:714 /builds/2807738987/workdir/net/socket.c:734) [ 70.085760] ____sys_sendmsg (/builds/2807738987/workdir/net/socket.c:2488) [ 70.089705] ? import_iovec (/builds/2807738987/workdir/lib/iov_iter.c:2001) [ 70.093389] ? 
sendmsg_copy_msghdr (/builds/2807738987/workdir/net/socket.c:2429 /builds/2807738987/workdir/net/socket.c:2519) [ 70.097689] ___sys_sendmsg (/builds/2807738987/workdir/net/socket.c:2544) [ 70.101378] ? lock_is_held_type (/builds/2807738987/workdir/kernel/locking/lockdep.c:466 /builds/2807738987/workdir/kernel/locking/lockdep.c:5710) [ 70.105588] ? find_held_lock (/builds/2807738987/workdir/kernel/locking/lockdep.c:5156) [ 70.109451] ? lock_release (/builds/2807738987/workdir/kernel/locking/lockdep.c:466 /builds/2807738987/workdir/kernel/locking/lockdep.c:5688) [ 70.113313] ? __fget_files (/builds/2807738987/workdir/fs/file.c:917) [ 70.117089] __sys_sendmsg (/builds/2807738987/workdir/net/socket.c:2571) [ 70.120692] do_syscall_64 (/builds/2807738987/workdir/arch/x86/entry/common.c:50 /builds/2807738987/workdir/arch/x86/entry/common.c:80) [ 70.124294] entry_SYSCALL_64_after_hwframe (/builds/2807206727/workdir/arch/x86/entry/entry_64.S:120) [ 70.129378] RIP: 0033:0x7f623e42e71d [ 70.132988] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 2a 9b f7 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 7e 9b f7 ff 48 All code ======== 0: 28 89 54 24 1c 48 sub %cl,0x481c2454(%rcx) 6: 89 74 24 10 mov %esi,0x10(%rsp) a: 89 7c 24 08 mov %edi,0x8(%rsp) e: e8 2a 9b f7 ff callq 0xfffffffffff79b3d 13: 8b 54 24 1c mov 0x1c(%rsp),%edx 17: 48 8b 74 24 10 mov 0x10(%rsp),%rsi 1c: 41 89 c0 mov %eax,%r8d 1f: 8b 7c 24 08 mov 0x8(%rsp),%edi 23: b8 2e 00 00 00 mov $0x2e,%eax 28: 0f 05 syscall 2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction 30: 77 33 ja 0x65 32: 44 89 c7 mov %r8d,%edi 35: 48 89 44 24 08 mov %rax,0x8(%rsp) 3a: e8 7e 9b f7 ff callq 0xfffffffffff79bbd 3f: 48 rex.W Code starting with the faulting instruction =========================================== 0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax 6: 77 33 ja 0x3b 8: 44 89 c7 mov %r8d,%edi b: 48 89 44 24 08 mov 
%rax,0x8(%rsp) 10: e8 7e 9b f7 ff callq 0xfffffffffff79b93 15: 48 rex.W [ 70.151871] RSP: 002b:00007fff0ccddd70 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [ 70.159488] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f623e42e71d [ 70.166665] RDX: 0000000000000000 RSI: 00007fff0ccdddb0 RDI: 000000000000000d [ 70.173842] RBP: 0000564bf49cb090 R08: 0000000000000000 R09: 0000000000000000 [ 70.181017] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000000b [ 70.188192] R13: 00007fff0ccddf20 R14: 00007fff0ccddf1c R15: 0000000000000000 [ 70.195379] </TASK> [ 70.197575] Modules linked in: acpi_cpufreq(-) rfkill sunrpc qede vfat fat intel_rapl_msr intel_rapl_common qed ipmi_ssif crc8 edac_mce_amd k10temp pcspkr rapl ptdma acpi_ipmi ses igb hpilo enclosure ipmi_si ipmi_devintf dca i2c_piix4 ipmi_msghandler acpi_tad acpi_power_meter fuse zram xfs mgag200 crct10dif_pclmul i2c_algo_bit crc32_pclmul drm_shmem_helper crc32c_intel drm_kms_helper ghash_clmulni_intel drm hpwdt ccp smartpqi scsi_transport_sas sp5100_tco wmi [ 70.238596] ---[ end trace 0000000000000000 ]--- [ 70.310657] RIP: 0010:qede_load.cold (/builds/2807738987/workdir/./include/linux/spinlock.h:389 /builds/2807738987/workdir/./include/linux/netdevice.h:4294 /builds/2807738987/workdir/./include/linux/netdevice.h:4385 /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2594 /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2575) qede [ 70.316130] Code: 41 88 84 24 b1 00 00 00 41 0f b7 84 24 b6 00 00 00 45 88 b4 24 b3 00 00 00 e9 12 ff fe ff 48 c7 c1 09 c6 d7 c0 e9 6f ff ff ff <0f> 0b 49 8b 7c 24 08 e8 8c e0 fe ff 48 89 c1 48 85 c0 74 32 ba 2a All code ======== 0: 41 88 84 24 b1 00 00 mov %al,0xb1(%r12) 7: 00 8: 41 0f b7 84 24 b6 00 movzwl 0xb6(%r12),%eax f: 00 00 11: 45 88 b4 24 b3 00 00 mov %r14b,0xb3(%r12) 18: 00 19: e9 12 ff fe ff jmpq 0xfffffffffffeff30 1e: 48 c7 c1 09 c6 d7 c0 mov $0xffffffffc0d7c609,%rcx 25: e9 6f ff ff ff jmpq 0xffffffffffffff99 2a:* 
0f 0b ud2 <-- trapping instruction 2c: 49 8b 7c 24 08 mov 0x8(%r12),%rdi 31: e8 8c e0 fe ff callq 0xfffffffffffee0c2 36: 48 89 c1 mov %rax,%rcx 39: 48 85 c0 test %rax,%rax 3c: 74 32 je 0x70 3e: ba .byte 0xba 3f: 2a .byte 0x2a Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: 49 8b 7c 24 08 mov 0x8(%r12),%rdi 7: e8 8c e0 fe ff callq 0xfffffffffffee098 c: 48 89 c1 mov %rax,%rcx f: 48 85 c0 test %rax,%rax 12: 74 32 je 0x46 14: ba .byte 0xba 15: 2a .byte 0x2a [ 70.335057] RSP: 0018:ffffac3c848a3658 EFLAGS: 00010206 [ 70.340332] RAX: 000000000000006b RBX: 0000000000000000 RCX: 0000000000000006 [ 70.347554] RDX: ffff94a3f3eabba8 RSI: ffffffff9e9578a7 RDI: ffffffff9e8d8176 [ 70.354747] RBP: ffff94a3ee75acd0 R08: 0000000000000001 R09: 0000000000000001 [ 70.361968] R10: 0000000000000000 R11: 00000000f6665eaf R12: ffff94a3ee75ac00 [ 70.369160] R13: ffff94a3f31bb928 R14: ffffac3ca31dd000 R15: 0000000000000000 [ 70.376385] FS: 00007f623da2c500(0000) GS:ffff94b2bc240000(0000) knlGS:0000000000000000 [ 70.384543] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 70.390336] CR2: 000056491c907688 CR3: 000000015ebe2000 CR4: 0000000000350ee0 [ 70.397531] Kernel panic - not syncing: Fatal exception [ 70.406430] Kernel Offset: 0x1c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 70.484036] ---[ end Kernel panic - not syncing: Fatal exception ]--- Thanks, Bruno > > > [ 59.234757] Code: 41 88 84 24 b1 00 00 00 41 0f b7 84 24 b6 00 00 > > 00 45 88 b4 24 b3 00 00 00 e9 b8 00 ff ff 48 c7 c1 09 66 46 c1 e9 6f > > ff ff ff <0f> 0b 49 8b 7c 24 08 e8 82 e2 fe ff 48 89 c1 48 85 c0 74 32 > > ba 2a > > [ 59.253639] RSP: 0018:ffffae1e04593688 EFLAGS: 00010206 > > [ 59.258897] RAX: 000000000000006b RBX: 0000000000000000 RCX: 0000000000000006 > > [ 59.266073] RDX: ffff8f8f35332be8 RSI: ffffffffaf96411f RDI: ffffffffaf8e4b1e > > [ 59.273250] RBP: ffff8f8f2a87acd0 R08: 0000000000000001 R09: 0000000000000001 
* Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
From: Jakub Kicinski @ 2022-08-03 15:37 UTC
To: Bruno Goncalves; +Cc: LKML, Networking, CKI Project

On Wed, 3 Aug 2022 14:13:00 +0200 Bruno Goncalves wrote:
> Got this from the most recent failure (kernel built using commit 0805c6fb39f6):
>
> the tarball is https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/603714145/build%20x86_64%20debug/2807738987/artifacts/kernel-mainline.kernel.org-redhat_603714145_x86_64_debug.tar.gz
> and the call trace from
> https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/08/02/redhat:603123526/build_x86_64_redhat:603123526_x86_64_debug/tests/1/results_0001/console.log/console.log
>
> [ 69.876513] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [ 69.888521] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant
> DL325 Gen10 Plus, BIOS A43 08/09/2021
> [ 69.897971] RIP: 0010:qede_load.cold
> (/builds/2807738987/workdir/./include/linux/spinlock.h:389
> /builds/2807738987/workdir/./include/linux/netdevice.h:4294
> /builds/2807738987/workdir/./include/linux/netdevice.h:4385
> /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2594
> /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2575)

Thanks a lot! That seems to point the finger at commit 3aa6bce9af0e
("net: watchdog: hold device global xmit lock during tx disable") but
frankly IDK why... The driver must be fully initialized to get to
ndo_open() so how is the tx_global_lock busted?!

Would you be able to re-run with CONFIG_KASAN=y ?
Perhaps KASAN can tell us what's messing up the lock.
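For anyone following along, the decoded RIP line quoted above lists several file:line pairs for one address, because the trapping instruction sits inside inlined helpers. A rough sketch of splitting it up follows. `inline_frames` is a made-up helper, and the innermost-first ordering is an assumption based on how addr2line usually reports inlined frames.

```python
import re


def inline_frames(decoded_line):
    """Split the parenthesised part of a decoded trace line into (file, line) pairs.

    Hypothetical helper; assumes a single, non-nested pair of parentheses,
    as on the RIP line in this thread.
    """
    inside = decoded_line[decoded_line.index("(") + 1 : decoded_line.rindex(")")]
    frames = []
    for match in re.finditer(r"(\S+):(\d+)", inside):
        # Drop the CI build prefix so only the in-tree path remains.
        path = match.group(1).split("/workdir/", 1)[-1]
        frames.append((path, int(match.group(2))))
    return frames


# RIP line from the decoded trace in this thread.
rip = (
    "[ 69.897971] RIP: 0010:qede_load.cold "
    "(/builds/2807738987/workdir/./include/linux/spinlock.h:389 "
    "/builds/2807738987/workdir/./include/linux/netdevice.h:4294 "
    "/builds/2807738987/workdir/./include/linux/netdevice.h:4385 "
    "/builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2594 "
    "/builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2575) qede"
)

frames = inline_frames(rip)
for path, lineno in frames:
    print(f"{path}:{lineno}")
```

Read under that assumption, the chain starts in spinlock.h, passes through two netdevice.h helpers, and ends in qede_main.c, which is consistent with the suspicion that a netdevice lock helper called from the driver is involved.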
* Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0 2022-08-03 15:37 ` Jakub Kicinski @ 2022-08-18 7:22 ` Bruno Goncalves 2022-08-18 15:51 ` Jakub Kicinski 0 siblings, 1 reply; 11+ messages in thread From: Bruno Goncalves @ 2022-08-18 7:22 UTC (permalink / raw) To: Jakub Kicinski; +Cc: LKML, Networking, CKI Project On Wed, 3 Aug 2022 at 17:37, Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 3 Aug 2022 14:13:00 +0200 Bruno Goncalves wrote: > > Got this from the most recent failure (kernel built using commit 0805c6fb39f6): > > > > the tarball is https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/603714145/build%20x86_64%20debug/2807738987/artifacts/kernel-mainline.kernel.org-redhat_603714145_x86_64_debug.tar.gz > > and the call trace from > > https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/08/02/redhat:603123526/build_x86_64_redhat:603123526_x86_64_debug/tests/1/results_0001/console.log/console.log > > > > [ 69.876513] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI > > [ 69.888521] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant > > DL325 Gen10 Plus, BIOS A43 08/09/2021 > > [ 69.897971] RIP: 0010:qede_load.cold > > (/builds/2807738987/workdir/./include/linux/spinlock.h:389 > > /builds/2807738987/workdir/./include/linux/netdevice.h:4294 > > /builds/2807738987/workdir/./include/linux/netdevice.h:4385 > > /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2594 > > /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2575) > > Thanks a lot! That seems to point the finger at commit 3aa6bce9af0e > ("net: watchdog: hold device global xmit lock during tx disable") but > frankly IDK why... The driver must be fully initialized to get to > ndo_open() so how is the tx_global_lock busted?! > > Would you be able to re-run with CONFIG_KASAN=y ? > Perhaps KASAN can tell us what's messing up the lock. Sorry for taking a long time to provide the info. 
Below is the call trace, note it is on a different machine. It might take me a few days in case I need to try on the original machine. [ 110.329039] [0000:c1:00.2]:[qedf_link_update:613]:9: LINK DOWN. [ 110.330183] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 110.340728] CPU: 56 PID: 1810 Comm: NetworkManager Not tainted 5.19.0 #1 [ 110.347435] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.18.0 01/17/2022 [ 110.355088] RIP: 0010:qede_load.cold+0x14c/0xa08 [qede] [ 110.360348] Code: c6 60 fb 40 c0 48 c7 c7 40 e1 40 c0 e8 b7 21 28 c8 48 8b 3c 24 e8 fa 06 2d c7 41 0f b7 9f b6 00 00 00 41 89 dc e9 c2 3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0 eb c1 49 8d 7f 08 e8 36 09 2d c7 49 8b [ 110.379101] RSP: 0018:ffff888162ab6e00 EFLAGS: 00010206 [ 110.384338] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc03ed524 [ 110.391479] RDX: 000000000000006b RSI: 0000000000000007 RDI: ffff88810401a758 [ 110.398621] RBP: ffff8888a20f2cd0 R08: 0000000000000001 R09: ffffffff8bba9e0f [ 110.405761] R10: fffffbfff17753c1 R11: 0000000000000001 R12: ffff88810401a758 [ 110.412895] R13: ffff8888a20f2c08 R14: ffff8888a20f2cb6 R15: ffff8888a20f2c00 [ 110.420036] FS: 00007fac3a412500(0000) GS:ffff888810d00000(0000) knlGS:0000000000000000 [ 110.428129] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 110.433875] CR2: 00007fac38ffca88 CR3: 0000000123528000 CR4: 00000000003506e0 [ 110.441009] Call Trace: [ 110.443464] <TASK> [ 110.445585] ? qed_eth_rxq_start_ramrod+0x320/0x320 [qed] [ 110.451110] ? qede_alloc_mem_txq+0x240/0x240 [qede] [ 110.456106] ? lock_release+0x233/0x470 [ 110.459958] ? rwsem_wake.isra.0+0xf1/0x100 [ 110.464163] ? lock_chain_count+0x20/0x20 [ 110.468179] ? find_held_lock+0x83/0xa0 [ 110.472032] ? lock_is_held_type+0xe3/0x140 [ 110.476245] ? lockdep_hardirqs_on_prepare+0x132/0x230 [ 110.481397] ? queue_delayed_work_on+0x57/0x90 [ 110.485852] ? lockdep_hardirqs_on+0x7d/0x100 [ 110.490221] ? 
qed_get_int_fp+0xe0/0xe0 [qed] [ 110.494703] qede_open+0x6d/0x100 [qede] [ 110.498664] __dev_open+0x1c3/0x2c0 [ 110.502171] ? dev_set_rx_mode+0x60/0x60 [ 110.506105] ? lockdep_hardirqs_on_prepare+0x132/0x230 [ 110.511254] ? __local_bh_enable_ip+0x8f/0x110 [ 110.515711] __dev_change_flags+0x31b/0x3b0 [ 110.519906] ? dev_set_allmulti+0x10/0x10 [ 110.523935] dev_change_flags+0x58/0xb0 [ 110.527783] do_setlink+0xb38/0x19e0 [ 110.531370] ? reacquire_held_locks+0x270/0x270 [ 110.535910] ? rtnetlink_put_metrics+0x2e0/0x2e0 [ 110.540538] ? entry_SYSCALL_64+0x1/0x29 [ 110.544478] ? is_bpf_text_address+0x83/0xf0 [ 110.548762] ? kernel_text_address+0x125/0x130 [ 110.553218] ? __kernel_text_address+0xe/0x40 [ 110.557585] ? unwind_get_return_address+0x33/0x50 [ 110.562386] ? create_prof_cpu_mask+0x20/0x20 [ 110.566755] ? arch_stack_walk+0xa3/0x100 [ 110.570781] ? memset+0x1f/0x40 [ 110.573939] ? __nla_validate_parse+0xb4/0x1040 [ 110.578481] ? stack_trace_save+0x96/0xd0 [ 110.582504] ? nla_get_range_signed+0x180/0x180 [ 110.587042] ? __stack_depot_save+0x35/0x4a0 [ 110.591335] __rtnl_newlink+0x715/0xc90 [ 110.595182] ? mark_lock+0xd51/0xd90 [ 110.598773] ? rtnl_link_unregister+0x1e0/0x1e0 [ 110.603309] ? _raw_spin_unlock_irqrestore+0x40/0x60 [ 110.608285] ? ___slab_alloc+0x919/0xf80 [ 110.612222] ? rtnl_newlink+0x36/0x70 [ 110.615896] ? reacquire_held_locks+0x270/0x270 [ 110.620440] ? lock_is_held_type+0xe3/0x140 [ 110.624634] ? rcu_read_lock_sched_held+0x3f/0x80 [ 110.629353] ? trace_kmalloc+0x33/0x100 [ 110.633207] rtnl_newlink+0x4f/0x70 [ 110.636704] rtnetlink_rcv_msg+0x242/0x6b0 [ 110.640815] ? rtnl_stats_set+0x260/0x260 [ 110.644836] ? lock_acquire+0x16f/0x410 [ 110.648682] ? lock_acquire+0x17f/0x410 [ 110.652533] netlink_rcv_skb+0xce/0x200 [ 110.656385] ? rtnl_stats_set+0x260/0x260 [ 110.660408] ? netlink_ack+0x520/0x520 [ 110.664166] ? netlink_deliver_tap+0x13c/0x5c0 [ 110.668626] ? netlink_deliver_tap+0x141/0x5c0 [ 110.673083] netlink_unicast+0x2cb/0x460 [ 110.677015] ? 
netlink_attachskb+0x440/0x440 [ 110.681294] ? __build_skb_around+0x12a/0x150 [ 110.685667] netlink_sendmsg+0x3d2/0x710 [ 110.689609] ? netlink_unicast+0x460/0x460 [ 110.693710] ? iovec_from_user.part.0+0x95/0x200 [ 110.698348] ? netlink_unicast+0x460/0x460 [ 110.702456] sock_sendmsg+0x99/0xa0 [ 110.705963] ____sys_sendmsg+0x3d4/0x410 [ 110.709895] ? kernel_sendmsg+0x30/0x30 [ 110.713740] ? __ia32_sys_recvmmsg+0x160/0x160 [ 110.718200] ? lockdep_hardirqs_on_prepare+0x230/0x230 [ 110.723358] ___sys_sendmsg+0xe2/0x150 [ 110.727124] ? sendmsg_copy_msghdr+0x110/0x110 [ 110.731576] ? find_held_lock+0x83/0xa0 [ 110.735425] ? lock_release+0x233/0x470 [ 110.739271] ? __fget_files+0x14a/0x200 [ 110.743120] ? reacquire_held_locks+0x270/0x270 [ 110.747674] ? __fget_files+0x162/0x200 [ 110.751524] ? __fget_light+0x66/0x100 [ 110.755286] __sys_sendmsg+0xc3/0x140 [ 110.758964] ? __sys_sendmsg_sock+0x20/0x20 [ 110.763158] ? mark_held_locks+0x24/0x90 [ 110.767099] ? ktime_get_coarse_real_ts64+0x19/0x80 [ 110.771990] ? ktime_get_coarse_real_ts64+0x65/0x80 [ 110.776879] ? 
syscall_trace_enter.constprop.0+0x16f/0x230 [ 110.782375] do_syscall_64+0x5b/0x80 [ 110.785963] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 110.791021] RIP: 0033:0x7fac3b54f71d [ 110.794609] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ea c4 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 3e c5 f4 ff 48 [ 110.813362] RSP: 002b:00007ffd3b5c7da0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [ 110.820938] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fac3b54f71d [ 110.828081] RDX: 0000000000000000 RSI: 00007ffd3b5c7de0 RDI: 000000000000000d [ 110.835221] RBP: 0000563d7ac60090 R08: 0000000000000000 R09: 0000000000000000 [ 110.842361] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffd3b5c7f4c [ 110.849494] R13: 00007ffd3b5c7f50 R14: 0000000000000000 R15: 00007ffd3b5c7f58 [ 110.856639] </TASK> [ 110.858837] Modules linked in: pcc_cpufreq(-) rfkill intel_rapl_msr dcdbas intel_rapl_common amd64_edac edac_mce_amd rapl pcspkr qedi mgag200 i2c_algo_bit iscsi_boot_sysfs libiscsi drm_shmem_helper cdc_ether scsi_transport_iscsi usbnet drm_kms_helper mii uio ipmi_ssif k10temp i2c_piix4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_cpufreq vfat fat drm fuse xfs qedf qede qed crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel libfcoe libfc scsi_transport_fc crc8 ccp tg3 sp5100_tco [ 110.904398] ---[ end trace 0000000000000000 ]--- [ 110.909039] RIP: 0010:qede_load.cold+0x14c/0xa08 [qede] [ 110.914306] Code: c6 60 fb 40 c0 48 c7 c7 40 e1 40 c0 e8 b7 21 28 c8 48 8b 3c 24 e8 fa 06 2d c7 41 0f b7 9f b6 00 00 00 41 89 dc e9 c2 3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0 eb c1 49 8d 7f 08 e8 36 09 2d c7 49 8b [ 110.933068] RSP: 0018:ffff888162ab6e00 EFLAGS: 00010206 [ 110.938314] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc03ed524 [ 110.945466] RDX: 000000000000006b RSI: 0000000000000007 RDI: ffff88810401a758 [ 110.952616] RBP: ffff8888a20f2cd0 R08: 
0000000000000001 R09: ffffffff8bba9e0f [ 110.959772] R10: fffffbfff17753c1 R11: 0000000000000001 R12: ffff88810401a758 [ 110.966925] R13: ffff8888a20f2c08 R14: ffff8888a20f2cb6 R15: ffff8888a20f2c00 [ 110.974092] FS: 00007fac3a412500(0000) GS:ffff888810d00000(0000) knlGS:0000000000000000 [ 110.982198] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 110.987971] CR2: 00007fac38ffca88 CR3: 0000000123528000 CR4: 00000000003506e0 [ 110.995131] Kernel panic - not syncing: Fatal exception [ 111.001311] Kernel Offset: 0x6000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 111.012016] ---[ end Kernel panic - not syncing: Fatal exception ]--- kernel tarball: https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/604654489/publish%20x86_64%20debug/2813007034/artifacts/kernel-mainline.kernel.org-redhat_604654489_x86_64_debug.tar.gz kernel config: https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/604654489/build%20x86_64%20debug/2813006987/artifacts/kernel-mainline.kernel.org-redhat_604654489_x86_64_debug.config Bruno > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0 2022-08-18 7:22 ` Bruno Goncalves @ 2022-08-18 15:51 ` Jakub Kicinski 2022-08-18 17:55 ` [EXT] " Manish Chopra 0 siblings, 1 reply; 11+ messages in thread From: Jakub Kicinski @ 2022-08-18 15:51 UTC (permalink / raw) To: Bruno Goncalves, Ariel Elior Cc: LKML, Networking, CKI Project, Saurav Kashyap, Javed Hasan, Manish Chopra On Thu, 18 Aug 2022 09:22:17 +0200 Bruno Goncalves wrote: > On Wed, 3 Aug 2022 at 17:37, Jakub Kicinski <kuba@kernel.org> wrote: > > > > On Wed, 3 Aug 2022 14:13:00 +0200 Bruno Goncalves wrote: > > > Got this from the most recent failure (kernel built using commit 0805c6fb39f6): > > > > > > the tarball is https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/603714145/build%20x86_64%20debug/2807738987/artifacts/kernel-mainline.kernel.org-redhat_603714145_x86_64_debug.tar.gz > > > and the call trace from > > > https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/08/02/redhat:603123526/build_x86_64_redhat:603123526_x86_64_debug/tests/1/results_0001/console.log/console.log > > > > > > [ 69.876513] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI > > > [ 69.888521] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant > > > DL325 Gen10 Plus, BIOS A43 08/09/2021 > > > [ 69.897971] RIP: 0010:qede_load.cold > > > (/builds/2807738987/workdir/./include/linux/spinlock.h:389 > > > /builds/2807738987/workdir/./include/linux/netdevice.h:4294 > > > /builds/2807738987/workdir/./include/linux/netdevice.h:4385 > > > /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2594 > > > /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2575) > > > > Thanks a lot! That seems to point the finger at commit 3aa6bce9af0e > > ("net: watchdog: hold device global xmit lock during tx disable") but > > frankly IDK why... The driver must be fully initialized to get to > > ndo_open() so how is the tx_global_lock busted?! 
> > > > Would you be able to re-run with CONFIG_KASAN=y ? > > Perhaps KASAN can tell us what's messing up the lock. > > Sorry for taking a long time to provide the info. > Below is the call trace, note it is on a different machine. It might > take me a few days in case I need to try on the original machine. Thanks, looks like KASAN didn't catch anything, it's the same crash :( Let's CC all the Qlogic people, Qlogic PTAL. > [ 110.329039] [0000:c1:00.2]:[qedf_link_update:613]:9: LINK DOWN. > [ 110.330183] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI > [ 110.340728] CPU: 56 PID: 1810 Comm: NetworkManager Not tainted 5.19.0 #1 > [ 110.347435] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS > 1.18.0 01/17/2022 > [ 110.355088] RIP: 0010:qede_load.cold+0x14c/0xa08 [qede] > [ 110.360348] Code: c6 60 fb 40 c0 48 c7 c7 40 e1 40 c0 e8 b7 21 28 > c8 48 8b 3c 24 e8 fa 06 2d c7 41 0f b7 9f b6 00 00 00 41 89 dc e9 c2 > 3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0 eb c1 49 8d 7f 08 e8 36 09 2d c7 > 49 8b > [ 110.379101] RSP: 0018:ffff888162ab6e00 EFLAGS: 00010206 > [ 110.384338] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc03ed524 > [ 110.391479] RDX: 000000000000006b RSI: 0000000000000007 RDI: ffff88810401a758 > [ 110.398621] RBP: ffff8888a20f2cd0 R08: 0000000000000001 R09: ffffffff8bba9e0f > [ 110.405761] R10: fffffbfff17753c1 R11: 0000000000000001 R12: ffff88810401a758 > [ 110.412895] R13: ffff8888a20f2c08 R14: ffff8888a20f2cb6 R15: ffff8888a20f2c00 > [ 110.420036] FS: 00007fac3a412500(0000) GS:ffff888810d00000(0000) > knlGS:0000000000000000 > [ 110.428129] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 110.433875] CR2: 00007fac38ffca88 CR3: 0000000123528000 CR4: 00000000003506e0 > [ 110.441009] Call Trace: > [ 110.443464] <TASK> > [ 110.445585] ? qed_eth_rxq_start_ramrod+0x320/0x320 [qed] > [ 110.451110] ? qede_alloc_mem_txq+0x240/0x240 [qede] > [ 110.456106] ? lock_release+0x233/0x470 > [ 110.459958] ? rwsem_wake.isra.0+0xf1/0x100 > [ 110.464163] ? 
lock_chain_count+0x20/0x20 > [ 110.468179] ? find_held_lock+0x83/0xa0 > [ 110.472032] ? lock_is_held_type+0xe3/0x140 > [ 110.476245] ? lockdep_hardirqs_on_prepare+0x132/0x230 > [ 110.481397] ? queue_delayed_work_on+0x57/0x90 > [ 110.485852] ? lockdep_hardirqs_on+0x7d/0x100 > [ 110.490221] ? qed_get_int_fp+0xe0/0xe0 [qed] > [ 110.494703] qede_open+0x6d/0x100 [qede] > [ 110.498664] __dev_open+0x1c3/0x2c0 > [ 110.502171] ? dev_set_rx_mode+0x60/0x60 > [ 110.506105] ? lockdep_hardirqs_on_prepare+0x132/0x230 > [ 110.511254] ? __local_bh_enable_ip+0x8f/0x110 > [ 110.515711] __dev_change_flags+0x31b/0x3b0 > [ 110.519906] ? dev_set_allmulti+0x10/0x10 > [ 110.523935] dev_change_flags+0x58/0xb0 > [ 110.527783] do_setlink+0xb38/0x19e0 > [ 110.531370] ? reacquire_held_locks+0x270/0x270 > [ 110.535910] ? rtnetlink_put_metrics+0x2e0/0x2e0 > [ 110.540538] ? entry_SYSCALL_64+0x1/0x29 > [ 110.544478] ? is_bpf_text_address+0x83/0xf0 > [ 110.548762] ? kernel_text_address+0x125/0x130 > [ 110.553218] ? __kernel_text_address+0xe/0x40 > [ 110.557585] ? unwind_get_return_address+0x33/0x50 > [ 110.562386] ? create_prof_cpu_mask+0x20/0x20 > [ 110.566755] ? arch_stack_walk+0xa3/0x100 > [ 110.570781] ? memset+0x1f/0x40 > [ 110.573939] ? __nla_validate_parse+0xb4/0x1040 > [ 110.578481] ? stack_trace_save+0x96/0xd0 > [ 110.582504] ? nla_get_range_signed+0x180/0x180 > [ 110.587042] ? __stack_depot_save+0x35/0x4a0 > [ 110.591335] __rtnl_newlink+0x715/0xc90 > [ 110.595182] ? mark_lock+0xd51/0xd90 > [ 110.598773] ? rtnl_link_unregister+0x1e0/0x1e0 > [ 110.603309] ? _raw_spin_unlock_irqrestore+0x40/0x60 > [ 110.608285] ? ___slab_alloc+0x919/0xf80 > [ 110.612222] ? rtnl_newlink+0x36/0x70 > [ 110.615896] ? reacquire_held_locks+0x270/0x270 > [ 110.620440] ? lock_is_held_type+0xe3/0x140 > [ 110.624634] ? rcu_read_lock_sched_held+0x3f/0x80 > [ 110.629353] ? trace_kmalloc+0x33/0x100 > [ 110.633207] rtnl_newlink+0x4f/0x70 > [ 110.636704] rtnetlink_rcv_msg+0x242/0x6b0 > [ 110.640815] ? 
rtnl_stats_set+0x260/0x260 > [ 110.644836] ? lock_acquire+0x16f/0x410 > [ 110.648682] ? lock_acquire+0x17f/0x410 > [ 110.652533] netlink_rcv_skb+0xce/0x200 > [ 110.656385] ? rtnl_stats_set+0x260/0x260 > [ 110.660408] ? netlink_ack+0x520/0x520 > [ 110.664166] ? netlink_deliver_tap+0x13c/0x5c0 > [ 110.668626] ? netlink_deliver_tap+0x141/0x5c0 > [ 110.673083] netlink_unicast+0x2cb/0x460 > [ 110.677015] ? netlink_attachskb+0x440/0x440 > [ 110.681294] ? __build_skb_around+0x12a/0x150 > [ 110.685667] netlink_sendmsg+0x3d2/0x710 > [ 110.689609] ? netlink_unicast+0x460/0x460 > [ 110.693710] ? iovec_from_user.part.0+0x95/0x200 > [ 110.698348] ? netlink_unicast+0x460/0x460 > [ 110.702456] sock_sendmsg+0x99/0xa0 > [ 110.705963] ____sys_sendmsg+0x3d4/0x410 > [ 110.709895] ? kernel_sendmsg+0x30/0x30 > [ 110.713740] ? __ia32_sys_recvmmsg+0x160/0x160 > [ 110.718200] ? lockdep_hardirqs_on_prepare+0x230/0x230 > [ 110.723358] ___sys_sendmsg+0xe2/0x150 > [ 110.727124] ? sendmsg_copy_msghdr+0x110/0x110 > [ 110.731576] ? find_held_lock+0x83/0xa0 > [ 110.735425] ? lock_release+0x233/0x470 > [ 110.739271] ? __fget_files+0x14a/0x200 > [ 110.743120] ? reacquire_held_locks+0x270/0x270 > [ 110.747674] ? __fget_files+0x162/0x200 > [ 110.751524] ? __fget_light+0x66/0x100 > [ 110.755286] __sys_sendmsg+0xc3/0x140 > [ 110.758964] ? __sys_sendmsg_sock+0x20/0x20 > [ 110.763158] ? mark_held_locks+0x24/0x90 > [ 110.767099] ? ktime_get_coarse_real_ts64+0x19/0x80 > [ 110.771990] ? ktime_get_coarse_real_ts64+0x65/0x80 > [ 110.776879] ? 
syscall_trace_enter.constprop.0+0x16f/0x230 > [ 110.782375] do_syscall_64+0x5b/0x80 > [ 110.785963] entry_SYSCALL_64_after_hwframe+0x63/0xcd > [ 110.791021] RIP: 0033:0x7fac3b54f71d > [ 110.794609] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ea > c4 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 > 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 3e c5 f4 > ff 48 > [ 110.813362] RSP: 002b:00007ffd3b5c7da0 EFLAGS: 00000293 ORIG_RAX: > 000000000000002e > [ 110.820938] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fac3b54f71d > [ 110.828081] RDX: 0000000000000000 RSI: 00007ffd3b5c7de0 RDI: 000000000000000d > [ 110.835221] RBP: 0000563d7ac60090 R08: 0000000000000000 R09: 0000000000000000 > [ 110.842361] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffd3b5c7f4c > [ 110.849494] R13: 00007ffd3b5c7f50 R14: 0000000000000000 R15: 00007ffd3b5c7f58 > [ 110.856639] </TASK> > [ 110.858837] Modules linked in: pcc_cpufreq(-) rfkill intel_rapl_msr > dcdbas intel_rapl_common amd64_edac edac_mce_amd rapl pcspkr qedi > mgag200 i2c_algo_bit iscsi_boot_sysfs libiscsi drm_shmem_helper > cdc_ether scsi_transport_iscsi usbnet drm_kms_helper mii uio ipmi_ssif > k10temp i2c_piix4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler > acpi_power_meter acpi_cpufreq vfat fat drm fuse xfs qedf qede qed > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel libfcoe > libfc scsi_transport_fc crc8 ccp tg3 sp5100_tco > [ 110.904398] ---[ end trace 0000000000000000 ]--- > [ 110.909039] RIP: 0010:qede_load.cold+0x14c/0xa08 [qede] > [ 110.914306] Code: c6 60 fb 40 c0 48 c7 c7 40 e1 40 c0 e8 b7 21 28 > c8 48 8b 3c 24 e8 fa 06 2d c7 41 0f b7 9f b6 00 00 00 41 89 dc e9 c2 > 3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0 eb c1 49 8d 7f 08 e8 36 09 2d c7 > 49 8b > [ 110.933068] RSP: 0018:ffff888162ab6e00 EFLAGS: 00010206 > [ 110.938314] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc03ed524 > [ 110.945466] RDX: 000000000000006b RSI: 
0000000000000007 RDI: ffff88810401a758 > [ 110.952616] RBP: ffff8888a20f2cd0 R08: 0000000000000001 R09: ffffffff8bba9e0f > [ 110.959772] R10: fffffbfff17753c1 R11: 0000000000000001 R12: ffff88810401a758 > [ 110.966925] R13: ffff8888a20f2c08 R14: ffff8888a20f2cb6 R15: ffff8888a20f2c00 > [ 110.974092] FS: 00007fac3a412500(0000) GS:ffff888810d00000(0000) > knlGS:0000000000000000 > [ 110.982198] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 110.987971] CR2: 00007fac38ffca88 CR3: 0000000123528000 CR4: 00000000003506e0 > [ 110.995131] Kernel panic - not syncing: Fatal exception > [ 111.001311] Kernel Offset: 0x6000000 from 0xffffffff81000000 > (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > [ 111.012016] ---[ end Kernel panic - not syncing: Fatal exception ]--- > > kernel tarball: > https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/604654489/publish%20x86_64%20debug/2813007034/artifacts/kernel-mainline.kernel.org-redhat_604654489_x86_64_debug.tar.gz > kernel config: https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/604654489/build%20x86_64%20debug/2813006987/artifacts/kernel-mainline.kernel.org-redhat_604654489_x86_64_debug.config > > > Bruno > > > > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: [EXT] Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0 2022-08-18 15:51 ` Jakub Kicinski @ 2022-08-18 17:55 ` Manish Chopra 2022-08-18 18:26 ` Jakub Kicinski 2022-08-19 7:36 ` Bruno Goncalves 0 siblings, 2 replies; 11+ messages in thread From: Manish Chopra @ 2022-08-18 17:55 UTC (permalink / raw) To: Jakub Kicinski, Bruno Goncalves, Ariel Elior Cc: LKML, Networking, CKI Project, Saurav Kashyap, Javed Hasan, Alok Prasad > -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Thursday, August 18, 2022 9:21 PM > To: Bruno Goncalves <bgoncalv@redhat.com>; Ariel Elior > <aelior@marvell.com> > Cc: LKML <linux-kernel@vger.kernel.org>; Networking > <netdev@vger.kernel.org>; CKI Project <cki-project@redhat.com>; Saurav > Kashyap <skashyap@marvell.com>; Javed Hasan <jhasan@marvell.com>; > Manish Chopra <manishc@marvell.com> > Subject: [EXT] Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0 > > External Email > > ---------------------------------------------------------------------- > On Thu, 18 Aug 2022 09:22:17 +0200 Bruno Goncalves wrote: > > On Wed, 3 Aug 2022 at 17:37, Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > On Wed, 3 Aug 2022 14:13:00 +0200 Bruno Goncalves wrote: > > > > Got this from the most recent failure (kernel built using commit > 0805c6fb39f6): > > > > > > > > the tarball is > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__s3.amazonaws. 
> > > > com_arr-2Dcki-2Dprod-2Dtrusted-2Dartifacts_trusted-2Dartifacts_603 > > > > 714145_build-2520x86-5F64-2520debug_2807738987_artifacts_kernel- > 2D > > > > mainline.kernel.org-2Dredhat-5F603714145-5Fx86-5F64-5Fdebug.tar.gz > > > > > &d=DwICAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=bMTgx2X48QVXyXOEL8ALyI4d > sWoR- > > > > m74c5n3d- > ruJI8&m=zBBoyokuEgJ25hD586tidMPozXvZjlserj2qf3Iqn2o5V-ds8 > > > > > Jb7IkFIggvHpm4H&s=sjyeF4V5YfoiaDBRrtfGEXdVs3el3AdmvUNVQbteSu4&e= > > > > and the call trace from > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__s3.us-2Deast- > > > > 2D1.amazonaws.com_arr-2Dcki-2Dprod-2Ddatawarehouse- > 2Dpublic_datawa > > > > rehouse-2Dpublic_2022_08_02_redhat-3A603123526_build-5Fx86- > 5F64-5F > > > > redhat-3A603123526-5Fx86-5F64-5Fdebug_tests_1_results- > 5F0001_conso > > > > > le.log_console.log&d=DwICAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=bMTgx2X48 > QV > > > > XyXOEL8ALyI4dsWoR-m74c5n3d- > ruJI8&m=zBBoyokuEgJ25hD586tidMPozXvZjls > > > > erj2qf3Iqn2o5V- > ds8Jb7IkFIggvHpm4H&s=wV1Vq1lhXX02fbTXIWy_NRHxb9LgDz > > > > Enst11oy-RTpM&e= > > > > > > > > [ 69.876513] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI > > > > [ 69.888521] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant > > > > DL325 Gen10 Plus, BIOS A43 08/09/2021 > > > > [ 69.897971] RIP: 0010:qede_load.cold > > > > (/builds/2807738987/workdir/./include/linux/spinlock.h:389 > > > > /builds/2807738987/workdir/./include/linux/netdevice.h:4294 > > > > /builds/2807738987/workdir/./include/linux/netdevice.h:4385 > > > > /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_m > > > > ain.c:2594 > > > > /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_m > > > > ain.c:2575) > > > > > > Thanks a lot! That seems to point the finger at commit 3aa6bce9af0e > > > ("net: watchdog: hold device global xmit lock during tx disable") > > > but frankly IDK why... The driver must be fully initialized to get > > > to > > > ndo_open() so how is the tx_global_lock busted?! 
> > > > > > Would you be able to re-run with CONFIG_KASAN=y ? > > > Perhaps KASAN can tell us what's messing up the lock. > > > > Sorry for taking a long time to provide the info. > > Below is the call trace, note it is on a different machine. It might > > take me a few days in case I need to try on the original machine. > > Thanks, looks like KASAN didn't catch anything, it's the same crash :( Let's CC > all the Qlogic people, Qlogic PTAL. > > > [ 110.329039] [0000:c1:00.2]:[qedf_link_update:613]:9: LINK DOWN. > > [ 110.330183] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI [ > > 110.340728] CPU: 56 PID: 1810 Comm: NetworkManager Not tainted 5.19.0 > > #1 [ 110.347435] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, > > BIOS > > 1.18.0 01/17/2022 > > [ 110.355088] RIP: 0010:qede_load.cold+0x14c/0xa08 [qede] [ > > 110.360348] Code: c6 60 fb 40 c0 48 c7 c7 40 e1 40 c0 e8 b7 21 28 > > c8 48 8b 3c 24 e8 fa 06 2d c7 41 0f b7 9f b6 00 00 00 41 89 dc e9 c2 > > 3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0 eb c1 49 8d 7f 08 e8 36 09 2d c7 > > 49 8b > > [ 110.379101] RSP: 0018:ffff888162ab6e00 EFLAGS: 00010206 [ > > 110.384338] RAX: 0000000000000000 RBX: 0000000000000000 RCX: > > ffffffffc03ed524 [ 110.391479] RDX: 000000000000006b RSI: > > 0000000000000007 RDI: ffff88810401a758 [ 110.398621] RBP: > > ffff8888a20f2cd0 R08: 0000000000000001 R09: ffffffff8bba9e0f [ > > 110.405761] R10: fffffbfff17753c1 R11: 0000000000000001 R12: > > ffff88810401a758 [ 110.412895] R13: ffff8888a20f2c08 R14: > > ffff8888a20f2cb6 R15: ffff8888a20f2c00 [ 110.420036] FS: > > 00007fac3a412500(0000) GS:ffff888810d00000(0000) > > knlGS:0000000000000000 > > [ 110.428129] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ > > 110.433875] CR2: 00007fac38ffca88 CR3: 0000000123528000 CR4: > > 00000000003506e0 [ 110.441009] Call Trace: > > [ 110.443464] <TASK> > > [ 110.445585] ? qed_eth_rxq_start_ramrod+0x320/0x320 [qed] [ > > 110.451110] ? qede_alloc_mem_txq+0x240/0x240 [qede] [ 110.456106] ? 
> > lock_release+0x233/0x470 [ 110.459958] ? > > rwsem_wake.isra.0+0xf1/0x100 [ 110.464163] ? > > lock_chain_count+0x20/0x20 [ 110.468179] ? find_held_lock+0x83/0xa0 > > [ 110.472032] ? lock_is_held_type+0xe3/0x140 [ 110.476245] ? > > lockdep_hardirqs_on_prepare+0x132/0x230 > > [ 110.481397] ? queue_delayed_work_on+0x57/0x90 [ 110.485852] ? > > lockdep_hardirqs_on+0x7d/0x100 [ 110.490221] ? > > qed_get_int_fp+0xe0/0xe0 [qed] [ 110.494703] qede_open+0x6d/0x100 > > [qede] [ 110.498664] __dev_open+0x1c3/0x2c0 [ 110.502171] ? > > dev_set_rx_mode+0x60/0x60 [ 110.506105] ? > > lockdep_hardirqs_on_prepare+0x132/0x230 > > [ 110.511254] ? __local_bh_enable_ip+0x8f/0x110 [ 110.515711] > > __dev_change_flags+0x31b/0x3b0 [ 110.519906] ? > > dev_set_allmulti+0x10/0x10 [ 110.523935] dev_change_flags+0x58/0xb0 > > [ 110.527783] do_setlink+0xb38/0x19e0 [ 110.531370] ? > > reacquire_held_locks+0x270/0x270 [ 110.535910] ? > > rtnetlink_put_metrics+0x2e0/0x2e0 [ 110.540538] ? > > entry_SYSCALL_64+0x1/0x29 [ 110.544478] ? > > is_bpf_text_address+0x83/0xf0 [ 110.548762] ? > > kernel_text_address+0x125/0x130 [ 110.553218] ? > > __kernel_text_address+0xe/0x40 [ 110.557585] ? > > unwind_get_return_address+0x33/0x50 > > [ 110.562386] ? create_prof_cpu_mask+0x20/0x20 [ 110.566755] ? > > arch_stack_walk+0xa3/0x100 [ 110.570781] ? memset+0x1f/0x40 [ > > 110.573939] ? __nla_validate_parse+0xb4/0x1040 [ 110.578481] ? > > stack_trace_save+0x96/0xd0 [ 110.582504] ? > > nla_get_range_signed+0x180/0x180 [ 110.587042] ? > > __stack_depot_save+0x35/0x4a0 [ 110.591335] > > __rtnl_newlink+0x715/0xc90 [ 110.595182] ? mark_lock+0xd51/0xd90 [ > > 110.598773] ? rtnl_link_unregister+0x1e0/0x1e0 [ 110.603309] ? > > _raw_spin_unlock_irqrestore+0x40/0x60 > > [ 110.608285] ? ___slab_alloc+0x919/0xf80 [ 110.612222] ? > > rtnl_newlink+0x36/0x70 [ 110.615896] ? > > reacquire_held_locks+0x270/0x270 [ 110.620440] ? > > lock_is_held_type+0xe3/0x140 [ 110.624634] ? 
> > rcu_read_lock_sched_held+0x3f/0x80 > > [ 110.629353] ? trace_kmalloc+0x33/0x100 [ 110.633207] > > rtnl_newlink+0x4f/0x70 [ 110.636704] rtnetlink_rcv_msg+0x242/0x6b0 [ > > 110.640815] ? rtnl_stats_set+0x260/0x260 [ 110.644836] ? > > lock_acquire+0x16f/0x410 [ 110.648682] ? lock_acquire+0x17f/0x410 [ > > 110.652533] netlink_rcv_skb+0xce/0x200 [ 110.656385] ? > > rtnl_stats_set+0x260/0x260 [ 110.660408] ? netlink_ack+0x520/0x520 [ > > 110.664166] ? netlink_deliver_tap+0x13c/0x5c0 [ 110.668626] ? > > netlink_deliver_tap+0x141/0x5c0 [ 110.673083] > > netlink_unicast+0x2cb/0x460 [ 110.677015] ? > > netlink_attachskb+0x440/0x440 [ 110.681294] ? > > __build_skb_around+0x12a/0x150 [ 110.685667] > > netlink_sendmsg+0x3d2/0x710 [ 110.689609] ? > > netlink_unicast+0x460/0x460 [ 110.693710] ? > > iovec_from_user.part.0+0x95/0x200 [ 110.698348] ? > > netlink_unicast+0x460/0x460 [ 110.702456] sock_sendmsg+0x99/0xa0 [ > > 110.705963] ____sys_sendmsg+0x3d4/0x410 [ 110.709895] ? > > kernel_sendmsg+0x30/0x30 [ 110.713740] ? > > __ia32_sys_recvmmsg+0x160/0x160 [ 110.718200] ? > > lockdep_hardirqs_on_prepare+0x230/0x230 > > [ 110.723358] ___sys_sendmsg+0xe2/0x150 [ 110.727124] ? > > sendmsg_copy_msghdr+0x110/0x110 [ 110.731576] ? > > find_held_lock+0x83/0xa0 [ 110.735425] ? lock_release+0x233/0x470 [ > > 110.739271] ? __fget_files+0x14a/0x200 [ 110.743120] ? > > reacquire_held_locks+0x270/0x270 [ 110.747674] ? > > __fget_files+0x162/0x200 [ 110.751524] ? __fget_light+0x66/0x100 [ > > 110.755286] __sys_sendmsg+0xc3/0x140 [ 110.758964] ? > > __sys_sendmsg_sock+0x20/0x20 [ 110.763158] ? > > mark_held_locks+0x24/0x90 [ 110.767099] ? > > ktime_get_coarse_real_ts64+0x19/0x80 > > [ 110.771990] ? ktime_get_coarse_real_ts64+0x65/0x80 > > [ 110.776879] ? 
syscall_trace_enter.constprop.0+0x16f/0x230 > > [ 110.782375] do_syscall_64+0x5b/0x80 [ 110.785963] > > entry_SYSCALL_64_after_hwframe+0x63/0xcd > > [ 110.791021] RIP: 0033:0x7fac3b54f71d [ 110.794609] Code: 28 89 54 > > 24 1c 48 89 74 24 10 89 7c 24 08 e8 ea > > c4 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 > > 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 3e c5 f4 > > ff 48 [ 110.813362] RSP: 002b:00007ffd3b5c7da0 EFLAGS: 00000293 > > ORIG_RAX: > > 000000000000002e > > [ 110.820938] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: > > 00007fac3b54f71d [ 110.828081] RDX: 0000000000000000 RSI: > > 00007ffd3b5c7de0 RDI: 000000000000000d [ 110.835221] RBP: > > 0000563d7ac60090 R08: 0000000000000000 R09: 0000000000000000 [ > > 110.842361] R10: 0000000000000000 R11: 0000000000000293 R12: > > 00007ffd3b5c7f4c [ 110.849494] R13: 00007ffd3b5c7f50 R14: > > 0000000000000000 R15: 00007ffd3b5c7f58 [ 110.856639] </TASK> [ > > 110.858837] Modules linked in: pcc_cpufreq(-) rfkill intel_rapl_msr > > dcdbas intel_rapl_common amd64_edac edac_mce_amd rapl pcspkr qedi > > mgag200 i2c_algo_bit iscsi_boot_sysfs libiscsi drm_shmem_helper > > cdc_ether scsi_transport_iscsi usbnet drm_kms_helper mii uio ipmi_ssif > > k10temp i2c_piix4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler > > acpi_power_meter acpi_cpufreq vfat fat drm fuse xfs qedf qede qed > > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel libfcoe > > libfc scsi_transport_fc crc8 ccp tg3 sp5100_tco [ 110.904398] ---[ > > end trace 0000000000000000 ]--- [ 110.909039] RIP: > > 0010:qede_load.cold+0x14c/0xa08 [qede] [ 110.914306] Code: c6 60 fb > > 40 c0 48 c7 c7 40 e1 40 c0 e8 b7 21 28 > > c8 48 8b 3c 24 e8 fa 06 2d c7 41 0f b7 9f b6 00 00 00 41 89 dc e9 c2 > > 3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0 eb c1 49 8d 7f 08 e8 36 09 2d c7 > > 49 8b > > [ 110.933068] RSP: 0018:ffff888162ab6e00 EFLAGS: 00010206 [ > > 110.938314] RAX: 0000000000000000 RBX: 0000000000000000 RCX: > > 
ffffffffc03ed524 [ 110.945466] RDX: 000000000000006b RSI: > > 0000000000000007 RDI: ffff88810401a758 [ 110.952616] RBP: > > ffff8888a20f2cd0 R08: 0000000000000001 R09: ffffffff8bba9e0f [ > > 110.959772] R10: fffffbfff17753c1 R11: 0000000000000001 R12: > > ffff88810401a758 [ 110.966925] R13: ffff8888a20f2c08 R14: > > ffff8888a20f2cb6 R15: ffff8888a20f2c00 [ 110.974092] FS: > > 00007fac3a412500(0000) GS:ffff888810d00000(0000) > > knlGS:0000000000000000 > > [ 110.982198] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ > > 110.987971] CR2: 00007fac38ffca88 CR3: 0000000123528000 CR4: > > 00000000003506e0 [ 110.995131] Kernel panic - not syncing: Fatal > > exception [ 111.001311] Kernel Offset: 0x6000000 from > > 0xffffffff81000000 (relocation range: > > 0xffffffff80000000-0xffffffffbfffffff) > > [ 111.012016] ---[ end Kernel panic - not syncing: Fatal exception > > ]--- > > > > kernel tarball: > > https://urldefense.proofpoint.com/v2/url?u=https- > 3A__s3.amazonaws.com_ > > arr-2Dcki-2Dprod-2Dtrusted-2Dartifacts_trusted-2Dartifacts_604654489_p > > ublish-2520x86-5F64-2520debug_2813007034_artifacts_kernel- > 2Dmainline.k > > ernel.org-2Dredhat-5F604654489-5Fx86-5F64- > 5Fdebug.tar.gz&d=DwICAg&c=nK > > jWec2b6R0mOyPaz7xtfQ&r=bMTgx2X48QVXyXOEL8ALyI4dsWoR- > m74c5n3d-ruJI8&m=z > > BBoyokuEgJ25hD586tidMPozXvZjlserj2qf3Iqn2o5V- > ds8Jb7IkFIggvHpm4H&s=WXbt > > GecipcXSY_rwTu6JrCEI7VFKToDZ3UfZ4ciloWk&e= > > kernel config: > > https://urldefense.proofpoint.com/v2/url?u=https- > 3A__s3.amazonaws.com_ > > arr-2Dcki-2Dprod-2Dtrusted-2Dartifacts_trusted-2Dartifacts_604654489_b > > uild-2520x86-5F64-2520debug_2813006987_artifacts_kernel- > 2Dmainline.ker > > nel.org-2Dredhat-5F604654489-5Fx86-5F64- > 5Fdebug.config&d=DwICAg&c=nKjW > > ec2b6R0mOyPaz7xtfQ&r=bMTgx2X48QVXyXOEL8ALyI4dsWoR-m74c5n3d- > ruJI8&m=zBB > > oyokuEgJ25hD586tidMPozXvZjlserj2qf3Iqn2o5V- > ds8Jb7IkFIggvHpm4H&s=edaLwi > > kEZyvLAk8hrsZNE-Esjsn9HZ5luaW_FARAlCw&e= > > > > > > Bruno Hi Bruno, 1. 
How do you reproduce this issue exactly ? Any specific instructions or any special kernel CONFIG with which issue reproduces ?

2. Is there any Bugzilla opened for this already ? Can you please provide the complete crash logs ? (vmcore-dmesg.txt ?)

3. You mentioned about commit 3aa6bce9af0e ("net: watchdog: hold device global xmit lock during tx disable")
Do you mean issue started surfacing only after this commit ? Driver calls netif_tx_disable() from these two relevant contexts -

a. One in ndo_stop() flow

        /* Close OS Tx */
        netif_tx_disable(edev->ndev);
        netif_carrier_off(edev->ndev);

b. Other in LINK events handling from the hard IRQ context

        DP_NOTICE(edev, "Link is down\n");
        netif_tx_disable(edev->ndev);
        netif_carrier_off(edev->ndev);

Thanks,
Manish

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [EXT] Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
  2022-08-18 17:55 ` [EXT] " Manish Chopra
@ 2022-08-18 18:26   ` Jakub Kicinski
  2022-08-19  7:36   ` Bruno Goncalves
  1 sibling, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2022-08-18 18:26 UTC (permalink / raw)
  To: Manish Chopra
  Cc: Bruno Goncalves, Ariel Elior, LKML, Networking, CKI Project,
	Saurav Kashyap, Javed Hasan, Alok Prasad

On Thu, 18 Aug 2022 17:55:28 +0000 Manish Chopra wrote:
> 3. You mentioned about commit 3aa6bce9af0e ("net: watchdog: hold device global xmit lock during tx disable")

FWIW that was just my guess based on the stack trace, Bruno posted the
stacktraces with line numbers decoded here:

https://lore.kernel.org/all/CA+QYu4ob4cbh3Vnh9DWgaPpyw8nTLFG__TbBpBsYg1tWJPxygg@mail.gmail.com/

> Do you mean issue started surfacing only after this commit ? Driver calls netif_tx_disable() from these two relevant contexts -
>
> a. One in ndo_stop() flow
>
>         /* Close OS Tx */
>         netif_tx_disable(edev->ndev);
>         netif_carrier_off(edev->ndev);
>
> b. Other in LINK events handling from the hard IRQ context
>
>         DP_NOTICE(edev, "Link is down\n");
>         netif_tx_disable(edev->ndev);
>         netif_carrier_off(edev->ndev);

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [EXT] Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0 2022-08-18 17:55 ` [EXT] " Manish Chopra 2022-08-18 18:26 ` Jakub Kicinski @ 2022-08-19 7:36 ` Bruno Goncalves 2023-02-22 21:34 ` Jakub Kicinski 1 sibling, 1 reply; 11+ messages in thread From: Bruno Goncalves @ 2022-08-19 7:36 UTC (permalink / raw) To: Manish Chopra Cc: Jakub Kicinski, Ariel Elior, LKML, Networking, CKI Project, Saurav Kashyap, Javed Hasan, Alok Prasad On Thu, 18 Aug 2022 at 19:55, Manish Chopra <manishc@marvell.com> wrote: > > > -----Original Message----- > > From: Jakub Kicinski <kuba@kernel.org> > > Sent: Thursday, August 18, 2022 9:21 PM > > To: Bruno Goncalves <bgoncalv@redhat.com>; Ariel Elior > > <aelior@marvell.com> > > Cc: LKML <linux-kernel@vger.kernel.org>; Networking > > <netdev@vger.kernel.org>; CKI Project <cki-project@redhat.com>; Saurav > > Kashyap <skashyap@marvell.com>; Javed Hasan <jhasan@marvell.com>; > > Manish Chopra <manishc@marvell.com> > > Subject: [EXT] Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0 > > > > External Email > > > > ---------------------------------------------------------------------- > > On Thu, 18 Aug 2022 09:22:17 +0200 Bruno Goncalves wrote: > > > On Wed, 3 Aug 2022 at 17:37, Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > > > On Wed, 3 Aug 2022 14:13:00 +0200 Bruno Goncalves wrote: > > > > > Got this from the most recent failure (kernel built using commit > > 0805c6fb39f6): > > > > > > > > > > the tarball is > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__s3.amazonaws. 
> > > > > com_arr-2Dcki-2Dprod-2Dtrusted-2Dartifacts_trusted-2Dartifacts_603 > > > > > 714145_build-2520x86-5F64-2520debug_2807738987_artifacts_kernel- > > 2D > > > > > mainline.kernel.org-2Dredhat-5F603714145-5Fx86-5F64-5Fdebug.tar.gz > > > > > > > &d=DwICAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=bMTgx2X48QVXyXOEL8ALyI4d > > sWoR- > > > > > m74c5n3d- > > ruJI8&m=zBBoyokuEgJ25hD586tidMPozXvZjlserj2qf3Iqn2o5V-ds8 > > > > > > > Jb7IkFIggvHpm4H&s=sjyeF4V5YfoiaDBRrtfGEXdVs3el3AdmvUNVQbteSu4&e= > > > > > and the call trace from > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__s3.us-2Deast- > > > > > 2D1.amazonaws.com_arr-2Dcki-2Dprod-2Ddatawarehouse- > > 2Dpublic_datawa > > > > > rehouse-2Dpublic_2022_08_02_redhat-3A603123526_build-5Fx86- > > 5F64-5F > > > > > redhat-3A603123526-5Fx86-5F64-5Fdebug_tests_1_results- > > 5F0001_conso > > > > > > > le.log_console.log&d=DwICAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=bMTgx2X48 > > QV > > > > > XyXOEL8ALyI4dsWoR-m74c5n3d- > > ruJI8&m=zBBoyokuEgJ25hD586tidMPozXvZjls > > > > > erj2qf3Iqn2o5V- > > ds8Jb7IkFIggvHpm4H&s=wV1Vq1lhXX02fbTXIWy_NRHxb9LgDz > > > > > Enst11oy-RTpM&e= > > > > > > > > > > [ 69.876513] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI > > > > > [ 69.888521] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant > > > > > DL325 Gen10 Plus, BIOS A43 08/09/2021 > > > > > [ 69.897971] RIP: 0010:qede_load.cold > > > > > (/builds/2807738987/workdir/./include/linux/spinlock.h:389 > > > > > /builds/2807738987/workdir/./include/linux/netdevice.h:4294 > > > > > /builds/2807738987/workdir/./include/linux/netdevice.h:4385 > > > > > /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_m > > > > > ain.c:2594 > > > > > /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_m > > > > > ain.c:2575) > > > > > > > > Thanks a lot! That seems to point the finger at commit 3aa6bce9af0e > > > > ("net: watchdog: hold device global xmit lock during tx disable") > > > > but frankly IDK why... 
The driver must be fully initialized to get > > > > to > > > > ndo_open() so how is the tx_global_lock busted?! > > > > > > > > Would you be able to re-run with CONFIG_KASAN=y ? > > > > Perhaps KASAN can tell us what's messing up the lock. > > > > > > Sorry for taking a long time to provide the info. > > > Below is the call trace, note it is on a different machine. It might > > > take me a few days in case I need to try on the original machine. > > > > Thanks, looks like KASAN didn't catch anything, it's the same crash :( Let's CC > > all the Qlogic people, Qlogic PTAL. > > > > > [ 110.329039] [0000:c1:00.2]:[qedf_link_update:613]:9: LINK DOWN. > > > [ 110.330183] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI [ > > > 110.340728] CPU: 56 PID: 1810 Comm: NetworkManager Not tainted 5.19.0 > > > #1 [ 110.347435] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, > > > BIOS > > > 1.18.0 01/17/2022 > > > [ 110.355088] RIP: 0010:qede_load.cold+0x14c/0xa08 [qede] [ > > > 110.360348] Code: c6 60 fb 40 c0 48 c7 c7 40 e1 40 c0 e8 b7 21 28 > > > c8 48 8b 3c 24 e8 fa 06 2d c7 41 0f b7 9f b6 00 00 00 41 89 dc e9 c2 > > > 3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0 eb c1 49 8d 7f 08 e8 36 09 2d c7 > > > 49 8b > > > [ 110.379101] RSP: 0018:ffff888162ab6e00 EFLAGS: 00010206 [ > > > 110.384338] RAX: 0000000000000000 RBX: 0000000000000000 RCX: > > > ffffffffc03ed524 [ 110.391479] RDX: 000000000000006b RSI: > > > 0000000000000007 RDI: ffff88810401a758 [ 110.398621] RBP: > > > ffff8888a20f2cd0 R08: 0000000000000001 R09: ffffffff8bba9e0f [ > > > 110.405761] R10: fffffbfff17753c1 R11: 0000000000000001 R12: > > > ffff88810401a758 [ 110.412895] R13: ffff8888a20f2c08 R14: > > > ffff8888a20f2cb6 R15: ffff8888a20f2c00 [ 110.420036] FS: > > > 00007fac3a412500(0000) GS:ffff888810d00000(0000) > > > knlGS:0000000000000000 > > > [ 110.428129] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ > > > 110.433875] CR2: 00007fac38ffca88 CR3: 0000000123528000 CR4: > > > 00000000003506e0 [ 110.441009] Call 
Trace: > > > [ 110.443464] <TASK> > > > [ 110.445585] ? qed_eth_rxq_start_ramrod+0x320/0x320 [qed] [ > > > 110.451110] ? qede_alloc_mem_txq+0x240/0x240 [qede] [ 110.456106] ? > > > lock_release+0x233/0x470 [ 110.459958] ? > > > rwsem_wake.isra.0+0xf1/0x100 [ 110.464163] ? > > > lock_chain_count+0x20/0x20 [ 110.468179] ? find_held_lock+0x83/0xa0 > > > [ 110.472032] ? lock_is_held_type+0xe3/0x140 [ 110.476245] ? > > > lockdep_hardirqs_on_prepare+0x132/0x230 > > > [ 110.481397] ? queue_delayed_work_on+0x57/0x90 [ 110.485852] ? > > > lockdep_hardirqs_on+0x7d/0x100 [ 110.490221] ? > > > qed_get_int_fp+0xe0/0xe0 [qed] [ 110.494703] qede_open+0x6d/0x100 > > > [qede] [ 110.498664] __dev_open+0x1c3/0x2c0 [ 110.502171] ? > > > dev_set_rx_mode+0x60/0x60 [ 110.506105] ? > > > lockdep_hardirqs_on_prepare+0x132/0x230 > > > [ 110.511254] ? __local_bh_enable_ip+0x8f/0x110 [ 110.515711] > > > __dev_change_flags+0x31b/0x3b0 [ 110.519906] ? > > > dev_set_allmulti+0x10/0x10 [ 110.523935] dev_change_flags+0x58/0xb0 > > > [ 110.527783] do_setlink+0xb38/0x19e0 [ 110.531370] ? > > > reacquire_held_locks+0x270/0x270 [ 110.535910] ? > > > rtnetlink_put_metrics+0x2e0/0x2e0 [ 110.540538] ? > > > entry_SYSCALL_64+0x1/0x29 [ 110.544478] ? > > > is_bpf_text_address+0x83/0xf0 [ 110.548762] ? > > > kernel_text_address+0x125/0x130 [ 110.553218] ? > > > __kernel_text_address+0xe/0x40 [ 110.557585] ? > > > unwind_get_return_address+0x33/0x50 > > > [ 110.562386] ? create_prof_cpu_mask+0x20/0x20 [ 110.566755] ? > > > arch_stack_walk+0xa3/0x100 [ 110.570781] ? memset+0x1f/0x40 [ > > > 110.573939] ? __nla_validate_parse+0xb4/0x1040 [ 110.578481] ? > > > stack_trace_save+0x96/0xd0 [ 110.582504] ? > > > nla_get_range_signed+0x180/0x180 [ 110.587042] ? > > > __stack_depot_save+0x35/0x4a0 [ 110.591335] > > > __rtnl_newlink+0x715/0xc90 [ 110.595182] ? mark_lock+0xd51/0xd90 [ > > > 110.598773] ? rtnl_link_unregister+0x1e0/0x1e0 [ 110.603309] ? > > > _raw_spin_unlock_irqrestore+0x40/0x60 > > > [ 110.608285] ? 
___slab_alloc+0x919/0xf80 [ 110.612222] ? > > > rtnl_newlink+0x36/0x70 [ 110.615896] ? > > > reacquire_held_locks+0x270/0x270 [ 110.620440] ? > > > lock_is_held_type+0xe3/0x140 [ 110.624634] ? > > > rcu_read_lock_sched_held+0x3f/0x80 > > > [ 110.629353] ? trace_kmalloc+0x33/0x100 [ 110.633207] > > > rtnl_newlink+0x4f/0x70 [ 110.636704] rtnetlink_rcv_msg+0x242/0x6b0 [ > > > 110.640815] ? rtnl_stats_set+0x260/0x260 [ 110.644836] ? > > > lock_acquire+0x16f/0x410 [ 110.648682] ? lock_acquire+0x17f/0x410 [ > > > 110.652533] netlink_rcv_skb+0xce/0x200 [ 110.656385] ? > > > rtnl_stats_set+0x260/0x260 [ 110.660408] ? netlink_ack+0x520/0x520 [ > > > 110.664166] ? netlink_deliver_tap+0x13c/0x5c0 [ 110.668626] ? > > > netlink_deliver_tap+0x141/0x5c0 [ 110.673083] > > > netlink_unicast+0x2cb/0x460 [ 110.677015] ? > > > netlink_attachskb+0x440/0x440 [ 110.681294] ? > > > __build_skb_around+0x12a/0x150 [ 110.685667] > > > netlink_sendmsg+0x3d2/0x710 [ 110.689609] ? > > > netlink_unicast+0x460/0x460 [ 110.693710] ? > > > iovec_from_user.part.0+0x95/0x200 [ 110.698348] ? > > > netlink_unicast+0x460/0x460 [ 110.702456] sock_sendmsg+0x99/0xa0 [ > > > 110.705963] ____sys_sendmsg+0x3d4/0x410 [ 110.709895] ? > > > kernel_sendmsg+0x30/0x30 [ 110.713740] ? > > > __ia32_sys_recvmmsg+0x160/0x160 [ 110.718200] ? > > > lockdep_hardirqs_on_prepare+0x230/0x230 > > > [ 110.723358] ___sys_sendmsg+0xe2/0x150 [ 110.727124] ? > > > sendmsg_copy_msghdr+0x110/0x110 [ 110.731576] ? > > > find_held_lock+0x83/0xa0 [ 110.735425] ? lock_release+0x233/0x470 [ > > > 110.739271] ? __fget_files+0x14a/0x200 [ 110.743120] ? > > > reacquire_held_locks+0x270/0x270 [ 110.747674] ? > > > __fget_files+0x162/0x200 [ 110.751524] ? __fget_light+0x66/0x100 [ > > > 110.755286] __sys_sendmsg+0xc3/0x140 [ 110.758964] ? > > > __sys_sendmsg_sock+0x20/0x20 [ 110.763158] ? > > > mark_held_locks+0x24/0x90 [ 110.767099] ? > > > ktime_get_coarse_real_ts64+0x19/0x80 > > > [ 110.771990] ? 
ktime_get_coarse_real_ts64+0x65/0x80 > > > [ 110.776879] ? syscall_trace_enter.constprop.0+0x16f/0x230 > > > [ 110.782375] do_syscall_64+0x5b/0x80 [ 110.785963] > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd > > > [ 110.791021] RIP: 0033:0x7fac3b54f71d [ 110.794609] Code: 28 89 54 > > > 24 1c 48 89 74 24 10 89 7c 24 08 e8 ea > > > c4 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 > > > 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 3e c5 f4 > > > ff 48 [ 110.813362] RSP: 002b:00007ffd3b5c7da0 EFLAGS: 00000293 > > > ORIG_RAX: > > > 000000000000002e > > > [ 110.820938] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: > > > 00007fac3b54f71d [ 110.828081] RDX: 0000000000000000 RSI: > > > 00007ffd3b5c7de0 RDI: 000000000000000d [ 110.835221] RBP: > > > 0000563d7ac60090 R08: 0000000000000000 R09: 0000000000000000 [ > > > 110.842361] R10: 0000000000000000 R11: 0000000000000293 R12: > > > 00007ffd3b5c7f4c [ 110.849494] R13: 00007ffd3b5c7f50 R14: > > > 0000000000000000 R15: 00007ffd3b5c7f58 [ 110.856639] </TASK> [ > > > 110.858837] Modules linked in: pcc_cpufreq(-) rfkill intel_rapl_msr > > > dcdbas intel_rapl_common amd64_edac edac_mce_amd rapl pcspkr qedi > > > mgag200 i2c_algo_bit iscsi_boot_sysfs libiscsi drm_shmem_helper > > > cdc_ether scsi_transport_iscsi usbnet drm_kms_helper mii uio ipmi_ssif > > > k10temp i2c_piix4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler > > > acpi_power_meter acpi_cpufreq vfat fat drm fuse xfs qedf qede qed > > > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel libfcoe > > > libfc scsi_transport_fc crc8 ccp tg3 sp5100_tco [ 110.904398] ---[ > > > end trace 0000000000000000 ]--- [ 110.909039] RIP: > > > 0010:qede_load.cold+0x14c/0xa08 [qede] [ 110.914306] Code: c6 60 fb > > > 40 c0 48 c7 c7 40 e1 40 c0 e8 b7 21 28 > > > c8 48 8b 3c 24 e8 fa 06 2d c7 41 0f b7 9f b6 00 00 00 41 89 dc e9 c2 > > > 3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0 eb c1 49 8d 7f 08 e8 36 09 2d c7 > > > 49 8b > > > [ 110.933068] 
RSP: 0018:ffff888162ab6e00 EFLAGS: 00010206 [ > > > 110.938314] RAX: 0000000000000000 RBX: 0000000000000000 RCX: > > > ffffffffc03ed524 [ 110.945466] RDX: 000000000000006b RSI: > > > 0000000000000007 RDI: ffff88810401a758 [ 110.952616] RBP: > > > ffff8888a20f2cd0 R08: 0000000000000001 R09: ffffffff8bba9e0f [ > > > 110.959772] R10: fffffbfff17753c1 R11: 0000000000000001 R12: > > > ffff88810401a758 [ 110.966925] R13: ffff8888a20f2c08 R14: > > > ffff8888a20f2cb6 R15: ffff8888a20f2c00 [ 110.974092] FS: > > > 00007fac3a412500(0000) GS:ffff888810d00000(0000) > > > knlGS:0000000000000000 > > > [ 110.982198] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ > > > 110.987971] CR2: 00007fac38ffca88 CR3: 0000000123528000 CR4: > > > 00000000003506e0 [ 110.995131] Kernel panic - not syncing: Fatal > > > exception [ 111.001311] Kernel Offset: 0x6000000 from > > > 0xffffffff81000000 (relocation range: > > > 0xffffffff80000000-0xffffffffbfffffff) > > > [ 111.012016] ---[ end Kernel panic - not syncing: Fatal exception > > > ]--- > > > > > > kernel tarball: > > > https://urldefense.proofpoint.com/v2/url?u=https- > > 3A__s3.amazonaws.com_ > > > arr-2Dcki-2Dprod-2Dtrusted-2Dartifacts_trusted-2Dartifacts_604654489_p > > > ublish-2520x86-5F64-2520debug_2813007034_artifacts_kernel- > > 2Dmainline.k > > > ernel.org-2Dredhat-5F604654489-5Fx86-5F64- > > 5Fdebug.tar.gz&d=DwICAg&c=nK > > > jWec2b6R0mOyPaz7xtfQ&r=bMTgx2X48QVXyXOEL8ALyI4dsWoR- > > m74c5n3d-ruJI8&m=z > > > BBoyokuEgJ25hD586tidMPozXvZjlserj2qf3Iqn2o5V- > > ds8Jb7IkFIggvHpm4H&s=WXbt > > > GecipcXSY_rwTu6JrCEI7VFKToDZ3UfZ4ciloWk&e= > > > kernel config: > > > https://urldefense.proofpoint.com/v2/url?u=https- > > 3A__s3.amazonaws.com_ > > > arr-2Dcki-2Dprod-2Dtrusted-2Dartifacts_trusted-2Dartifacts_604654489_b > > > uild-2520x86-5F64-2520debug_2813006987_artifacts_kernel- > > 2Dmainline.ker > > > nel.org-2Dredhat-5F604654489-5Fx86-5F64- > > 5Fdebug.config&d=DwICAg&c=nKjW > > > 
ec2b6R0mOyPaz7xtfQ&r=bMTgx2X48QVXyXOEL8ALyI4dsWoR-m74c5n3d-
> > ruJI8&m=zBB
> > > oyokuEgJ25hD586tidMPozXvZjlserj2qf3Iqn2o5V-
> > ds8Jb7IkFIggvHpm4H&s=edaLwi
> > > kEZyvLAk8hrsZNE-Esjsn9HZ5luaW_FARAlCw&e=
> > >
> > >
> > > Bruno
>
> Hi Bruno,
>
> 1. How do you reproduce this issue exactly ? Any specific instructions or any special kernel CONFIG with which issue reproduces ?

We hit the panic by booting up the machine with a kernel 5.19.0 with
debug flags enabled.

kernel tarball: https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/604654489/publish%20x86_64%20debug/2813007034/artifacts/kernel-mainline.kernel.org-redhat_604654489_x86_64_debug.tar.gz
kernel config is: https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/604654489/build%20x86_64%20debug/2813006987/artifacts/kernel-mainline.kernel.org-redhat_604654489_x86_64_debug.config

The machine has FastLinQ QL41000 Series 10/25/40/50GbE Controller (mbi
8.52.21 [mfw 8.52.9.0])

> 2. Is there any Bugzilla opened for this already ? Can you please provide the complete crash logs ? (vmcore-dmesg.txt ?)

No, there is no bugzilla; I haven't seen this problem on the rhel-9 kernel (5.14).
I don't have a vmcore, but I'll try to get one.
Below is a link to the console log from a CKI pipeline execution. It is not
from the same run as the one above, which I ran manually.
https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/06/20/568171088/redhat:568171088_x86_64_debug/tests/12172126_x86_64_5_console.log

For this console.log the kernel config is
http://s3.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/06/20/568171088/redhat:568171088/redhat:568171088_x86_64_debug/.config
and the tarball is
http://s3.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/06/20/568171088/redhat:568171088/redhat:568171088_x86_64_debug/kernel-mainline.kernel.org-x86_64-78ca55889a549a9a194c6ec666836329b774ab6d.tar.gz

Bruno

> 3.
You mentioned about commit 3aa6bce9af0e ("net: watchdog: hold device global xmit lock during tx disable")
> Do you mean issue started surfacing only after this commit ? Driver calls netif_tx_disable() from these two relevant contexts -
>
> a. One in ndo_stop() flow
>
>         /* Close OS Tx */
>         netif_tx_disable(edev->ndev);
>         netif_carrier_off(edev->ndev);
>
> b. Other in LINK events handling from the hard IRQ context
>
>         DP_NOTICE(edev, "Link is down\n");
>         netif_tx_disable(edev->ndev);
>         netif_carrier_off(edev->ndev);
>
> Thanks,
> Manish
>

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [EXT] Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
  2022-08-19  7:36 ` Bruno Goncalves
@ 2023-02-22 21:34   ` Jakub Kicinski
  2023-02-23 15:16     ` Bruno Goncalves
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2023-02-22 21:34 UTC (permalink / raw)
  To: Bruno Goncalves
  Cc: Manish Chopra, Ariel Elior, LKML, Networking, CKI Project,
	Saurav Kashyap, Javed Hasan, Alok Prasad

On Fri, 19 Aug 2022 09:36:54 +0200 Bruno Goncalves wrote:
> We hit the panic by booting up the machine with a kernel 5.19.0 with
> debug flags enabled.

Hi Bruno,

Was this ever fixed?

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [EXT] Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
  2023-02-22 21:34 ` Jakub Kicinski
@ 2023-02-23 15:16   ` Bruno Goncalves
  0 siblings, 0 replies; 11+ messages in thread
From: Bruno Goncalves @ 2023-02-23 15:16 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Manish Chopra, Ariel Elior, LKML, Networking, CKI Project,
	Saurav Kashyap, Javed Hasan, Alok Prasad

On Wed, 22 Feb 2023 at 22:34, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 19 Aug 2022 09:36:54 +0200 Bruno Goncalves wrote:
> > We hit the panic by booting up the machine with a kernel 5.19.0 with
> > debug flags enabled.
>
> Hi Bruno,
>
> Was this ever fixed?

It looks like it got fixed, I haven't seen this failure on 6.2.0 kernels.

Bruno

^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-02-23 15:17 UTC | newest]

Thread overview: 11+ messages
2022-08-02 11:27 RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0 Bruno Goncalves
2022-08-02 19:23 ` Jakub Kicinski
2022-08-03 12:13   ` Bruno Goncalves
2022-08-03 15:37     ` Jakub Kicinski
2022-08-18  7:22       ` Bruno Goncalves
2022-08-18 15:51         ` Jakub Kicinski
2022-08-18 17:55           ` [EXT] " Manish Chopra
2022-08-18 18:26             ` Jakub Kicinski
2022-08-19  7:36             ` Bruno Goncalves
2023-02-22 21:34               ` Jakub Kicinski
2023-02-23 15:16                 ` Bruno Goncalves