linux-kernel.vger.kernel.org archive mirror
* RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
@ 2022-08-02 11:27 Bruno Goncalves
  2022-08-02 19:23 ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Bruno Goncalves @ 2022-08-02 11:27 UTC
  To: LKML, Networking; +Cc: CKI Project

Hello,

We've noticed the following panic when booting up kernel 5.19.0 on a
specific machine.
The panic seems to happen when we build the kernel with debug flags.
Below is the first crash we noticed; more logs are at [1] and the kernel
config is at [2].

[   59.207684] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   59.212949] CPU: 32 PID: 1967 Comm: NetworkManager Not tainted 5.19.0-rc3 #1
[   59.220041] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant DL325 Gen10 Plus, BIOS A43 08/09/2021
[   59.229490] RIP: 0010:qede_load.cold+0x5a1/0x819 [qede]
[   59.234757] Code: 41 88 84 24 b1 00 00 00 41 0f b7 84 24 b6 00 00 00 45 88 b4 24 b3 00 00 00 e9 b8 00 ff ff 48 c7 c1 09 66 46 c1 e9 6f ff ff ff <0f> 0b 49 8b 7c 24 08 e8 82 e2 fe ff 48 89 c1 48 85 c0 74 32 ba 2a
[   59.253639] RSP: 0018:ffffae1e04593688 EFLAGS: 00010206
[   59.258897] RAX: 000000000000006b RBX: 0000000000000000 RCX: 0000000000000006
[   59.266073] RDX: ffff8f8f35332be8 RSI: ffffffffaf96411f RDI: ffffffffaf8e4b1e
[   59.273250] RBP: ffff8f8f2a87acd0 R08: 0000000000000001 R09: 0000000000000001
[   59.280426] R10: 0000000000000000 R11: 000000000f8c087f R12: ffff8f8f2a87ac00
[   59.287602] R13: ffff8f8f34d7f928 R14: ffffae1e0c039000 R15: 0000000000000000
[   59.294777] FS:  00007f164509f500(0000) GS:ffff8f9dfd800000(0000) knlGS:0000000000000000
[   59.302917] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   59.308697] CR2: 00005575f29a5c08 CR3: 0000000163810000 CR4: 0000000000350ee0
[   59.315875] Call Trace:
[   59.318335]  <TASK>
[   59.320458]  qede_open+0x3b/0x90 [qede]
[   59.324323]  __dev_open+0xf1/0x1c0
[   59.327748]  __dev_change_flags+0x1f8/0x280
[   59.331957]  dev_change_flags+0x22/0x60
[   59.335816]  do_setlink+0x327/0x1140
[   59.339413]  ? lock_is_held_type+0xe3/0x140
[   59.343625]  ? lock_is_held_type+0xe3/0x140
[   59.347833]  ? __nla_validate_parse+0x5f/0xb70
[   59.352307]  ? mark_held_locks+0x49/0x70
[   59.356256]  ? _raw_spin_unlock_irqrestore+0x30/0x60
[   59.361254]  ? lockdep_hardirqs_on+0x7d/0x100
[   59.365640]  __rtnl_newlink+0x59c/0x950
[   59.369502]  ? rtnl_newlink+0x2a/0x60
[   59.373185]  ? rcu_read_lock_sched_held+0x3c/0x70
[   59.377918]  ? trace_kmalloc+0x30/0xf0
[   59.381692]  ? kmem_cache_alloc_trace+0x1ad/0x270
[   59.386426]  rtnl_newlink+0x43/0x60
[   59.389936]  rtnetlink_rcv_msg+0x184/0x540
[   59.394057]  ? lock_acquire+0xe2/0x2e0
[   59.397830]  ? rtnl_stats_set+0x190/0x190
[   59.401863]  netlink_rcv_skb+0x51/0xf0
[   59.405639]  netlink_unicast+0x189/0x260
[   59.409586]  netlink_sendmsg+0x25a/0x4c0
[   59.413536]  sock_sendmsg+0x5c/0x60
[   59.417045]  ____sys_sendmsg+0x22b/0x270
[   59.420991]  ? import_iovec+0x17/0x20
[   59.424675]  ? sendmsg_copy_msghdr+0x78/0xa0
[   59.428972]  ___sys_sendmsg+0x85/0xc0
[   59.432658]  ? lock_is_held_type+0xe3/0x140
[   59.436867]  ? find_held_lock+0x2b/0x80
[   59.440727]  ? lock_release+0x145/0x300
[   59.444586]  ? __fget_files+0xe5/0x170
[   59.448360]  __sys_sendmsg+0x5c/0xb0
[   59.451961]  do_syscall_64+0x5b/0x80
[   59.455558]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[   59.460641] RIP: 0033:0x7f164628539d
[   59.464251] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 0a b1 f7 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 5e b1 f7 ff 48
[   59.483133] RSP: 002b:00007ffd9bf01520 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[   59.490749] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f164628539d
[   59.497925] RDX: 0000000000000000 RSI: 00007ffd9bf01560 RDI: 000000000000000c
[   59.505100] RBP: 00005575f2915040 R08: 0000000000000000 R09: 0000000000000000
[   59.512275] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[   59.519453] R13: 00007ffd9bf016c0 R14: 00007ffd9bf016bc R15: 0000000000000000
[   59.526637]  </TASK>
[   59.528834] Modules linked in: rfkill sunrpc intel_rapl_msr intel_rapl_common vfat fat qede qed edac_mce_amd i2c_piix4 crc8 rapl igb ipmi_ssif ptdma ses enclosure pcspkr dca hpilo k10temp acpi_ipmi acpi_tad ipmi_si acpi_power_meter fuse zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel mgag200 i2c_algo_bit drm_shmem_helper drm_kms_helper ghash_clmulni_intel drm hpwdt smartpqi ccp scsi_transport_sas sp5100_tco wmi ipmi_devintf ipmi_msghandler
[   59.568459] ---[ end trace 0000000000000000 ]---
[   59.632952] RIP: 0010:qede_load.cold+0x5a1/0x819 [qede]
[   59.632967] Code: 41 88 84 24 b1 00 00 00 41 0f b7 84 24 b6 00 00 00 45 88 b4 24 b3 00 00 00 e9 b8 00 ff ff 48 c7 c1 09 66 46 c1 e9 6f ff ff ff <0f> 0b 49 8b 7c 24 08 e8 82 e2 fe ff 48 89 c1 48 85 c0 74 32 ba 2a
[   59.632970] RSP: 0018:ffffae1e04593688 EFLAGS: 00010206
[   59.632972] RAX: 000000000000006b RBX: 0000000000000000 RCX: 0000000000000006
[   59.632974] RDX: ffff8f8f35332be8 RSI: ffffffffaf96411f RDI: ffffffffaf8e4b1e
[   59.632977] RBP: ffff8f8f2a87acd0 R08: 0000000000000001 R09: 0000000000000001
[   59.632978] R10: 0000000000000000 R11: 000000000f8c087f R12: ffff8f8f2a87ac00
[   59.632980] R13: ffff8f8f34d7f928 R14: ffffae1e0c039000 R15: 0000000000000000
[   59.632982] FS:  00007f164509f500(0000) GS:ffff8f9dfd800000(0000) knlGS:0000000000000000
[   59.632984] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   59.632986] CR2: 00005575f29a5c08 CR3: 0000000163810000 CR4: 0000000000350ee0
[   59.632989] Kernel panic - not syncing: Fatal exception
[   59.732905] Kernel Offset: 0x2d000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   59.807803] ---[ end Kernel panic - not syncing: Fatal exception ]---
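
For readers following the log: on a "Code:" line, the byte wrapped in <...> marks the instruction at RIP. The following is a minimal Python sketch, not part of any kernel tooling (the helper name trapping_bytes is made up for illustration), that extracts the bytes starting at that marker and compares them with the two-byte x86 encoding of ud2 (0f 0b). A ud2 at RIP would be consistent with the "invalid opcode" exception reported above.

```python
# "Code:" line copied from the oops above (timestamp dropped).
CODE_LINE = (
    "Code: 41 88 84 24 b1 00 00 00 41 0f b7 84 24 b6 00 00 "
    "00 45 88 b4 24 b3 00 00 00 e9 b8 00 ff ff 48 c7 c1 09 66 46 c1 e9 6f "
    "ff ff ff <0f> 0b 49 8b 7c 24 08 e8 82 e2 fe ff 48 89 c1 48 85 c0 74 32 "
    "ba 2a"
)

UD2 = bytes([0x0F, 0x0B])  # x86 encoding of the ud2 instruction


def trapping_bytes(code_line):
    """Return (index of the marked byte, bytes from the marker onwards)."""
    tokens = code_line.split()[1:]  # drop the leading "Code:" label
    # The kernel wraps the byte at RIP in angle brackets, e.g. "<0f>".
    marked = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    raw = bytes(int(tok.strip("<>"), 16) for tok in tokens)
    return marked, raw[marked:]


offset, tail = trapping_bytes(CODE_LINE)
print(f"marked byte at offset {offset:#x}")
print("starts with ud2:", tail.startswith(UD2))
```

The sketch only inspects raw bytes; it does not say which source line produced the trap, which still needs the symbol and debug information from the matching build.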


cki issue tracker: https://datawarehouse.cki-project.org/issue/1470

[1] https://datawarehouse.cki-project.org/kcidb/tests/4002370
[2] http://s3.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/06/20/568171088/redhat:568171088/redhat:568171088_x86_64_debug/.config

Thanks,
Bruno Goncalves



* Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
  2022-08-02 11:27 RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0 Bruno Goncalves
@ 2022-08-02 19:23 ` Jakub Kicinski
  2022-08-03 12:13   ` Bruno Goncalves
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2022-08-02 19:23 UTC
  To: Bruno Goncalves; +Cc: LKML, Networking, CKI Project

On Tue, 2 Aug 2022 13:27:32 +0200 Bruno Goncalves wrote:
> Hello,
> 
> We've noticed the following panic when booting up kernel 5.19.0 on a
> specific machine.
> The panic seems to happen when we build the kernel with debug flags.
> Below is the first crash we noticed, more logs at [1] and the kernel
> config is at [2].
> 
> [   59.207684] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [   59.212949] CPU: 32 PID: 1967 Comm: NetworkManager Not tainted 5.19.0-rc3 #1
> [   59.220041] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant
> DL325 Gen10 Plus, BIOS A43 08/09/2021
> [   59.229490] RIP: 0010:qede_load.cold+0x5a1/0x819 [qede]

Is it this warning?

   WARN_ON(xdp_rxq_info_reg(&fp->rxq->xdp_rxq, edev->ndev,

Would you be able to run the stacktrace through
scripts/decode_stacktrace.sh?
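
For context, each raw call-trace entry in the report has the form symbol+0xoffset/0xsize, optionally followed by a [module] tag, and decoding turns those into source file and line references like the ones posted later in this thread. Below is a small Python sketch of the parsing step only. The regular expression and the TraceEntry type are illustrative assumptions, not how scripts/decode_stacktrace.sh is actually implemented, and the "reliable" field reflects the usual reading of the "?" prefix as an address the unwinder could not confirm.

```python
import re
from typing import NamedTuple, Optional


class TraceEntry(NamedTuple):
    symbol: str
    offset: int            # byte offset into the function
    size: int              # total size of the function in bytes
    module: Optional[str]  # None for symbols without a [module] tag
    reliable: bool         # False when the entry carries the "?" marker


ENTRY_RE = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s*)?"         # optional "[   59.320458]" timestamp
    r"(?P<guess>\?\s+)?"                # optional "? " marker
    r"(?P<symbol>[\w.]+)"               # function name, e.g. qede_load.cold
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"  # optional "[qede]" module tag
)


def parse_entry(line: str) -> Optional[TraceEntry]:
    """Parse one call-trace line; return None for lines that are not entries."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    return TraceEntry(
        symbol=m["symbol"],
        offset=int(m["offset"], 16),
        size=int(m["size"], 16),
        module=m["module"],
        reliable=m["guess"] is None,
    )


for raw in (
    "[   59.320458]  qede_open+0x3b/0x90 [qede]",
    "[   59.339413]  ? lock_is_held_type+0xe3/0x140",
    "[   59.315875] Call Trace:",
):
    print(parse_entry(raw))
```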




* Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
  2022-08-02 19:23 ` Jakub Kicinski
@ 2022-08-03 12:13   ` Bruno Goncalves
  2022-08-03 15:37     ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Bruno Goncalves @ 2022-08-03 12:13 UTC
  To: Jakub Kicinski; +Cc: LKML, Networking, CKI Project

On Tue, 2 Aug 2022 at 21:24, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 2 Aug 2022 13:27:32 +0200 Bruno Goncalves wrote:
> > Hello,
> >
> > We've noticed the following panic when booting up kernel 5.19.0 on a
> > specific machine.
> > The panic seems to happen when we build the kernel with debug flags.
> > Below is the first crash we noticed, more logs at [1] and the kernel
> > config is at [2].
> >
> > [   59.207684] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > [   59.212949] CPU: 32 PID: 1967 Comm: NetworkManager Not tainted 5.19.0-rc3 #1
> > [   59.220041] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant
> > DL325 Gen10 Plus, BIOS A43 08/09/2021
> > [   59.229490] RIP: 0010:qede_load.cold+0x5a1/0x819 [qede]
>
> Is it this warning?
>
>    WARN_ON(xdp_rxq_info_reg(&fp->rxq->xdp_rxq, edev->ndev,
>
> Would you be able to run the stacktrace thru
> scripts/decode_stacktrace.sh ?

Got this from the most recent failure (kernel built from commit 0805c6fb39f6).

The tarball is https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/603714145/build%20x86_64%20debug/2807738987/artifacts/kernel-mainline.kernel.org-redhat_603714145_x86_64_debug.tar.gz
and the call trace is from
https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/08/02/redhat:603123526/build_x86_64_redhat:603123526_x86_64_debug/tests/1/results_0001/console.log/console.log

[   69.876513] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   69.888521] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant
DL325 Gen10 Plus, BIOS A43 08/09/2021
[   69.897971] RIP: 0010:qede_load.cold
(/builds/2807738987/workdir/./include/linux/spinlock.h:389
/builds/2807738987/workdir/./include/linux/netdevice.h:4294
/builds/2807738987/workdir/./include/linux/netdevice.h:4385
/builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2594
/builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2575)
qede
[ 69.903242] Code: 41 88 84 24 b1 00 00 00 41 0f b7 84 24 b6 00 00 00
45 88 b4 24 b3 00 00 00 e9 12 ff fe ff 48 c7 c1 09 c6 d7 c0 e9 6f ff
ff ff <0f> 0b 49 8b 7c 24 08 e8 8c e0 fe ff 48 89 c1 48 85 c0 74 32 ba
2a
All code
========
   0:    41 88 84 24 b1 00 00     mov    %al,0xb1(%r12)
   7:    00
   8:    41 0f b7 84 24 b6 00     movzwl 0xb6(%r12),%eax
   f:    00 00
  11:    45 88 b4 24 b3 00 00     mov    %r14b,0xb3(%r12)
  18:    00
  19:    e9 12 ff fe ff           jmpq   0xfffffffffffeff30
  1e:    48 c7 c1 09 c6 d7 c0     mov    $0xffffffffc0d7c609,%rcx
  25:    e9 6f ff ff ff           jmpq   0xffffffffffffff99
  2a:*    0f 0b                    ud2            <-- trapping instruction
  2c:    49 8b 7c 24 08           mov    0x8(%r12),%rdi
  31:    e8 8c e0 fe ff           callq  0xfffffffffffee0c2
  36:    48 89 c1                 mov    %rax,%rcx
  39:    48 85 c0                 test   %rax,%rax
  3c:    74 32                    je     0x70
  3e:    ba                       .byte 0xba
  3f:    2a                       .byte 0x2a

Code starting with the faulting instruction
===========================================
   0:    0f 0b                    ud2
   2:    49 8b 7c 24 08           mov    0x8(%r12),%rdi
   7:    e8 8c e0 fe ff           callq  0xfffffffffffee098
   c:    48 89 c1                 mov    %rax,%rcx
   f:    48 85 c0                 test   %rax,%rax
  12:    74 32                    je     0x46
  14:    ba                       .byte 0xba
  15:    2a                       .byte 0x2a
[   69.922125] RSP: 0018:ffffac3c848a3658 EFLAGS: 00010206
[   69.927385] RAX: 000000000000006b RBX: 0000000000000000 RCX: 0000000000000006
[   69.934562] RDX: ffff94a3f3eabba8 RSI: ffffffff9e9578a7 RDI: ffffffff9e8d8176
[   69.941738] RBP: ffff94a3ee75acd0 R08: 0000000000000001 R09: 0000000000000001
[   69.948914] R10: 0000000000000000 R11: 00000000f6665eaf R12: ffff94a3ee75ac00
[   69.956089] R13: ffff94a3f31bb928 R14: ffffac3ca31dd000 R15: 0000000000000000
[   69.963265] FS:  00007f623da2c500(0000) GS:ffff94b2bc240000(0000)
knlGS:0000000000000000
[   69.971405] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   69.977183] CR2: 000056491c907688 CR3: 000000015ebe2000 CR4: 0000000000350ee0
[   69.984361] Call Trace:
[   69.986820]  <TASK>
[   69.988950] qede_open
(/builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2552)
qede
[   69.992817] __dev_open (/builds/2807738987/workdir/net/core/dev.c:1434)
[   69.996247] __dev_change_flags
(/builds/2807738987/workdir/net/core/dev.c:8537)
[   70.000459] dev_change_flags (/builds/2807738987/workdir/net/core/dev.c:8608)
[   70.004318] do_setlink (/builds/2807738987/workdir/net/core/rtnetlink.c:2780)
[   70.007916] ? lock_is_held_type
(/builds/2807738987/workdir/kernel/locking/lockdep.c:466
/builds/2807738987/workdir/kernel/locking/lockdep.c:5710)
[   70.012131] ? lock_is_held_type
(/builds/2807738987/workdir/kernel/locking/lockdep.c:466
/builds/2807738987/workdir/kernel/locking/lockdep.c:5710)
[   70.016342] ? __nla_validate_parse
(/builds/2807738987/workdir/./include/net/netlink.h:1159
(discriminator 2) /builds/2807738987/workdir/lib/nlattr.c:576
(discriminator 2))
[   70.020816] ? mark_held_locks
(/builds/2807738987/workdir/kernel/locking/lockdep.c:4234)
[   70.024767] ? _raw_spin_unlock_irqrestore
(/builds/2807738987/workdir/./arch/x86/include/asm/paravirt.h:704
/builds/2807738987/workdir/./arch/x86/include/asm/irqflags.h:138
/builds/2807738987/workdir/./include/linux/spinlock_api_smp.h:151
/builds/2807738987/workdir/kernel/locking/spinlock.c:194)
[   70.029763] ? lockdep_hardirqs_on
(/builds/2807738987/workdir/kernel/locking/lockdep.c:4383)
[   70.034152] __rtnl_newlink
(/builds/2807738987/workdir/net/core/rtnetlink.c:3546)
[   70.038020] ? rtnl_newlink
(/builds/2807738987/workdir/net/core/rtnetlink.c:3590)
[   70.041702] ? rcu_read_lock_sched_held
(/builds/2807738987/workdir/kernel/rcu/update.c:125
/builds/2807738987/workdir/kernel/rcu/update.c:119)
[   70.046437] ? trace_kmalloc
(/builds/2807738987/workdir/./include/trace/events/kmem.h:52
/builds/2807738987/workdir/./include/trace/events/kmem.h:52)
[   70.050297] ? kmem_cache_alloc_trace
(/builds/2807738987/workdir/mm/slub.c:3286)
[   70.055035] rtnl_newlink
(/builds/2807738987/workdir/net/core/rtnetlink.c:3594)
[   70.058544] rtnetlink_rcv_msg
(/builds/2807738987/workdir/net/core/rtnetlink.c:6089)
[   70.062667] ? lock_acquire
(/builds/2807738987/workdir/kernel/locking/lockdep.c:466
/builds/2807738987/workdir/kernel/locking/lockdep.c:5668
/builds/2807738987/workdir/kernel/locking/lockdep.c:5631)
[   70.066442] ? rtnl_stats_set
(/builds/2807738987/workdir/net/core/rtnetlink.c:5986)
[   70.070478] netlink_rcv_skb
(/builds/2807738987/workdir/net/netlink/af_netlink.c:2501)
[   70.074345] netlink_unicast
(/builds/2807738987/workdir/net/netlink/af_netlink.c:1320
/builds/2807738987/workdir/net/netlink/af_netlink.c:1345)
[   70.078295] netlink_sendmsg
(/builds/2807738987/workdir/net/netlink/af_netlink.c:1921)
[   70.082249] sock_sendmsg
(/builds/2807738987/workdir/net/socket.c:714
/builds/2807738987/workdir/net/socket.c:734)
[   70.085760] ____sys_sendmsg (/builds/2807738987/workdir/net/socket.c:2488)
[   70.089705] ? import_iovec (/builds/2807738987/workdir/lib/iov_iter.c:2001)
[   70.093389] ? sendmsg_copy_msghdr
(/builds/2807738987/workdir/net/socket.c:2429
/builds/2807738987/workdir/net/socket.c:2519)
[   70.097689] ___sys_sendmsg (/builds/2807738987/workdir/net/socket.c:2544)
[   70.101378] ? lock_is_held_type
(/builds/2807738987/workdir/kernel/locking/lockdep.c:466
/builds/2807738987/workdir/kernel/locking/lockdep.c:5710)
[   70.105588] ? find_held_lock
(/builds/2807738987/workdir/kernel/locking/lockdep.c:5156)
[   70.109451] ? lock_release
(/builds/2807738987/workdir/kernel/locking/lockdep.c:466
/builds/2807738987/workdir/kernel/locking/lockdep.c:5688)
[   70.113313] ? __fget_files (/builds/2807738987/workdir/fs/file.c:917)
[   70.117089] __sys_sendmsg (/builds/2807738987/workdir/net/socket.c:2571)
[   70.120692] do_syscall_64
(/builds/2807738987/workdir/arch/x86/entry/common.c:50
/builds/2807738987/workdir/arch/x86/entry/common.c:80)
[   70.124294] entry_SYSCALL_64_after_hwframe
(/builds/2807206727/workdir/arch/x86/entry/entry_64.S:120)
[   70.129378] RIP: 0033:0x7f623e42e71d
[ 70.132988] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 2a 9b
f7 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00
0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 7e 9b f7 ff
48
All code
========
   0:    28 89 54 24 1c 48        sub    %cl,0x481c2454(%rcx)
   6:    89 74 24 10              mov    %esi,0x10(%rsp)
   a:    89 7c 24 08              mov    %edi,0x8(%rsp)
   e:    e8 2a 9b f7 ff           callq  0xfffffffffff79b3d
  13:    8b 54 24 1c              mov    0x1c(%rsp),%edx
  17:    48 8b 74 24 10           mov    0x10(%rsp),%rsi
  1c:    41 89 c0                 mov    %eax,%r8d
  1f:    8b 7c 24 08              mov    0x8(%rsp),%edi
  23:    b8 2e 00 00 00           mov    $0x2e,%eax
  28:    0f 05                    syscall
  2a:*    48 3d 00 f0 ff ff        cmp    $0xfffffffffffff000,%rax        <-- trapping instruction
  30:    77 33                    ja     0x65
  32:    44 89 c7                 mov    %r8d,%edi
  35:    48 89 44 24 08           mov    %rax,0x8(%rsp)
  3a:    e8 7e 9b f7 ff           callq  0xfffffffffff79bbd
  3f:    48                       rex.W

Code starting with the faulting instruction
===========================================
   0:    48 3d 00 f0 ff ff        cmp    $0xfffffffffffff000,%rax
   6:    77 33                    ja     0x3b
   8:    44 89 c7                 mov    %r8d,%edi
   b:    48 89 44 24 08           mov    %rax,0x8(%rsp)
  10:    e8 7e 9b f7 ff           callq  0xfffffffffff79b93
  15:    48                       rex.W
[   70.151871] RSP: 002b:00007fff0ccddd70 EFLAGS: 00000293 ORIG_RAX:
000000000000002e
[   70.159488] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f623e42e71d
[   70.166665] RDX: 0000000000000000 RSI: 00007fff0ccdddb0 RDI: 000000000000000d
[   70.173842] RBP: 0000564bf49cb090 R08: 0000000000000000 R09: 0000000000000000
[   70.181017] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000000b
[   70.188192] R13: 00007fff0ccddf20 R14: 00007fff0ccddf1c R15: 0000000000000000
[   70.195379]  </TASK>
[   70.197575] Modules linked in: acpi_cpufreq(-) rfkill sunrpc qede
vfat fat intel_rapl_msr intel_rapl_common qed ipmi_ssif crc8
edac_mce_amd k10temp pcspkr rapl ptdma acpi_ipmi ses igb hpilo
enclosure ipmi_si ipmi_devintf dca i2c_piix4 ipmi_msghandler acpi_tad
acpi_power_meter fuse zram xfs mgag200 crct10dif_pclmul i2c_algo_bit
crc32_pclmul drm_shmem_helper crc32c_intel drm_kms_helper
ghash_clmulni_intel drm hpwdt ccp smartpqi scsi_transport_sas
sp5100_tco wmi
[   70.238596] ---[ end trace 0000000000000000 ]---
[   70.310657] RIP: 0010:qede_load.cold
(/builds/2807738987/workdir/./include/linux/spinlock.h:389
/builds/2807738987/workdir/./include/linux/netdevice.h:4294
/builds/2807738987/workdir/./include/linux/netdevice.h:4385
/builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2594
/builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2575)
qede
[ 70.316130] Code: 41 88 84 24 b1 00 00 00 41 0f b7 84 24 b6 00 00 00
45 88 b4 24 b3 00 00 00 e9 12 ff fe ff 48 c7 c1 09 c6 d7 c0 e9 6f ff
ff ff <0f> 0b 49 8b 7c 24 08 e8 8c e0 fe ff 48 89 c1 48 85 c0 74 32 ba
2a
All code
========
   0:    41 88 84 24 b1 00 00     mov    %al,0xb1(%r12)
   7:    00
   8:    41 0f b7 84 24 b6 00     movzwl 0xb6(%r12),%eax
   f:    00 00
  11:    45 88 b4 24 b3 00 00     mov    %r14b,0xb3(%r12)
  18:    00
  19:    e9 12 ff fe ff           jmpq   0xfffffffffffeff30
  1e:    48 c7 c1 09 c6 d7 c0     mov    $0xffffffffc0d7c609,%rcx
  25:    e9 6f ff ff ff           jmpq   0xffffffffffffff99
  2a:*    0f 0b                    ud2            <-- trapping instruction
  2c:    49 8b 7c 24 08           mov    0x8(%r12),%rdi
  31:    e8 8c e0 fe ff           callq  0xfffffffffffee0c2
  36:    48 89 c1                 mov    %rax,%rcx
  39:    48 85 c0                 test   %rax,%rax
  3c:    74 32                    je     0x70
  3e:    ba                       .byte 0xba
  3f:    2a                       .byte 0x2a

Code starting with the faulting instruction
===========================================
   0:    0f 0b                    ud2
   2:    49 8b 7c 24 08           mov    0x8(%r12),%rdi
   7:    e8 8c e0 fe ff           callq  0xfffffffffffee098
   c:    48 89 c1                 mov    %rax,%rcx
   f:    48 85 c0                 test   %rax,%rax
  12:    74 32                    je     0x46
  14:    ba                       .byte 0xba
  15:    2a                       .byte 0x2a
[   70.335057] RSP: 0018:ffffac3c848a3658 EFLAGS: 00010206
[   70.340332] RAX: 000000000000006b RBX: 0000000000000000 RCX: 0000000000000006
[   70.347554] RDX: ffff94a3f3eabba8 RSI: ffffffff9e9578a7 RDI: ffffffff9e8d8176
[   70.354747] RBP: ffff94a3ee75acd0 R08: 0000000000000001 R09: 0000000000000001
[   70.361968] R10: 0000000000000000 R11: 00000000f6665eaf R12: ffff94a3ee75ac00
[   70.369160] R13: ffff94a3f31bb928 R14: ffffac3ca31dd000 R15: 0000000000000000
[   70.376385] FS:  00007f623da2c500(0000) GS:ffff94b2bc240000(0000)
knlGS:0000000000000000
[   70.384543] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   70.390336] CR2: 000056491c907688 CR3: 000000015ebe2000 CR4: 0000000000350ee0
[   70.397531] Kernel panic - not syncing: Fatal exception
[   70.406430] Kernel Offset: 0x1c000000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   70.484036] ---[ end Kernel panic - not syncing: Fatal exception ]---
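
The "Kernel Offset" line reports the KASLR slide for this boot. Subtracting it from a kernel-image address in the register dump gives the link-time address, which is what a lookup in the matching vmlinux or System.map would need. The snippet below is a minimal sketch under that assumption. The helper name unslide is hypothetical, and the calculation applies only to addresses inside the printed relocation range, not to module or heap addresses such as R12.

```python
KERNEL_OFFSET = 0x1C000000        # from the "Kernel Offset:" line above
RELOC_START = 0xFFFFFFFF80000000  # relocation range printed on the same line
RELOC_END = 0xFFFFFFFFBFFFFFFF


def unslide(addr, offset=KERNEL_OFFSET):
    """Map a runtime kernel-image address back to its link-time address."""
    if not RELOC_START <= addr <= RELOC_END:
        raise ValueError(f"{addr:#x} is outside the kernel relocation range")
    return addr - offset


# RSI and RDI from the register dump above fall inside the relocation range.
for name, value in (("RSI", 0xFFFFFFFF9E9578A7), ("RDI", 0xFFFFFFFF9E8D8176)):
    print(f"{name}: {value:#018x} -> {unslide(value):#018x}")
```

The slide differs between boots (0x2d000000 in the first report, 0x1c000000 here), so raw addresses from two different oopses cannot be compared directly.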

Thanks,
Bruno


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
  2022-08-03 12:13   ` Bruno Goncalves
@ 2022-08-03 15:37     ` Jakub Kicinski
  2022-08-18  7:22       ` Bruno Goncalves
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2022-08-03 15:37 UTC (permalink / raw)
  To: Bruno Goncalves; +Cc: LKML, Networking, CKI Project

On Wed, 3 Aug 2022 14:13:00 +0200 Bruno Goncalves wrote:
> Got this from the most recent failure (kernel built using commit 0805c6fb39f6):
> 
> the tarball is https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/603714145/build%20x86_64%20debug/2807738987/artifacts/kernel-mainline.kernel.org-redhat_603714145_x86_64_debug.tar.gz
> and the call trace from
> https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/08/02/redhat:603123526/build_x86_64_redhat:603123526_x86_64_debug/tests/1/results_0001/console.log/console.log
> 
> [   69.876513] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [   69.888521] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant
> DL325 Gen10 Plus, BIOS A43 08/09/2021
> [   69.897971] RIP: 0010:qede_load.cold
> (/builds/2807738987/workdir/./include/linux/spinlock.h:389
> /builds/2807738987/workdir/./include/linux/netdevice.h:4294
> /builds/2807738987/workdir/./include/linux/netdevice.h:4385
> /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2594
> /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2575)

Thanks a lot! That seems to point the finger at commit 3aa6bce9af0e
("net: watchdog: hold device global xmit lock during tx disable") but
frankly IDK why... The driver must be fully initialized to get to
ndo_open() so how is the tx_global_lock busted?!

Would you be able to re-run with CONFIG_KASAN=y ?
Perhaps KASAN can tell us what's messing up the lock.
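
As an aside on reading these oopses: the "invalid opcode" corresponds to
the byte the Code: line marks with <..>, i.e. the byte at RIP. In the
original report those bytes are <0f> 0b, which is the x86 UD2 encoding.
A minimal Python sketch of pulling them out of a Code: line follows; the
function name faulting_bytes is made up for illustration, and the sample
string is the Code: line from the first report with the line wraps joined.

```python
import re

def faulting_bytes(code_line, count=2):
    """Return `count` hex bytes starting at the byte the oops marks with <..>."""
    # Drop the "Code:" label (if present) and split into byte tokens.
    tokens = code_line.split("Code:", 1)[-1].split()
    for i, tok in enumerate(tokens):
        m = re.fullmatch(r"<([0-9a-f]{2})>", tok)
        if m:
            # The marked byte is the one RIP pointed at; append what follows.
            return [m.group(1)] + tokens[i + 1 : i + count]
    return []

code = ("Code: 41 88 84 24 b1 00 00 00 41 0f b7 84 24 b6 00 00 "
        "00 45 88 b4 24 b3 00 00 00 e9 b8 00 ff ff 48 c7 c1 09 66 46 c1 e9 6f "
        "ff ff ff <0f> 0b 49 8b 7c 24 08 e8 82 e2 fe ff 48 89 c1 48 85 c0 74 32 "
        "ba 2a")

print(faulting_bytes(code))  # the two bytes at RIP
```

This only extracts the bytes; a full disassembly needs the kernel tree's
scripts/decodecode or objdump against the matching qede.ko.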

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
  2022-08-03 15:37     ` Jakub Kicinski
@ 2022-08-18  7:22       ` Bruno Goncalves
  2022-08-18 15:51         ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Bruno Goncalves @ 2022-08-18  7:22 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: LKML, Networking, CKI Project

On Wed, 3 Aug 2022 at 17:37, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 3 Aug 2022 14:13:00 +0200 Bruno Goncalves wrote:
> > Got this from the most recent failure (kernel built using commit 0805c6fb39f6):
> >
> > the tarball is https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/603714145/build%20x86_64%20debug/2807738987/artifacts/kernel-mainline.kernel.org-redhat_603714145_x86_64_debug.tar.gz
> > and the call trace from
> > https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/08/02/redhat:603123526/build_x86_64_redhat:603123526_x86_64_debug/tests/1/results_0001/console.log/console.log
> >
> > [   69.876513] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > [   69.888521] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant
> > DL325 Gen10 Plus, BIOS A43 08/09/2021
> > [   69.897971] RIP: 0010:qede_load.cold
> > (/builds/2807738987/workdir/./include/linux/spinlock.h:389
> > /builds/2807738987/workdir/./include/linux/netdevice.h:4294
> > /builds/2807738987/workdir/./include/linux/netdevice.h:4385
> > /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2594
> > /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2575)
>
> Thanks a lot! That seems to point the finger at commit 3aa6bce9af0e
> ("net: watchdog: hold device global xmit lock during tx disable") but
> frankly IDK why... The driver must be fully initialized to get to
> ndo_open() so how is the tx_global_lock busted?!
>
> Would you be able to re-run with CONFIG_KASAN=y ?
> Perhaps KASAN can tell us what's messing up the lock.

Sorry for taking so long to provide the info.
Below is the call trace; note that it is from a different machine. It might
take me a few days if I need to try on the original machine.

[  110.329039] [0000:c1:00.2]:[qedf_link_update:613]:9: LINK DOWN.
[  110.330183] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[  110.340728] CPU: 56 PID: 1810 Comm: NetworkManager Not tainted 5.19.0 #1
[  110.347435] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS
1.18.0 01/17/2022
[  110.355088] RIP: 0010:qede_load.cold+0x14c/0xa08 [qede]
[  110.360348] Code: c6 60 fb 40 c0 48 c7 c7 40 e1 40 c0 e8 b7 21 28
c8 48 8b 3c 24 e8 fa 06 2d c7 41 0f b7 9f b6 00 00 00 41 89 dc e9 c2
3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0 eb c1 49 8d 7f 08 e8 36 09 2d c7
49 8b
[  110.379101] RSP: 0018:ffff888162ab6e00 EFLAGS: 00010206
[  110.384338] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc03ed524
[  110.391479] RDX: 000000000000006b RSI: 0000000000000007 RDI: ffff88810401a758
[  110.398621] RBP: ffff8888a20f2cd0 R08: 0000000000000001 R09: ffffffff8bba9e0f
[  110.405761] R10: fffffbfff17753c1 R11: 0000000000000001 R12: ffff88810401a758
[  110.412895] R13: ffff8888a20f2c08 R14: ffff8888a20f2cb6 R15: ffff8888a20f2c00
[  110.420036] FS:  00007fac3a412500(0000) GS:ffff888810d00000(0000)
knlGS:0000000000000000
[  110.428129] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  110.433875] CR2: 00007fac38ffca88 CR3: 0000000123528000 CR4: 00000000003506e0
[  110.441009] Call Trace:
[  110.443464]  <TASK>
[  110.445585]  ? qed_eth_rxq_start_ramrod+0x320/0x320 [qed]
[  110.451110]  ? qede_alloc_mem_txq+0x240/0x240 [qede]
[  110.456106]  ? lock_release+0x233/0x470
[  110.459958]  ? rwsem_wake.isra.0+0xf1/0x100
[  110.464163]  ? lock_chain_count+0x20/0x20
[  110.468179]  ? find_held_lock+0x83/0xa0
[  110.472032]  ? lock_is_held_type+0xe3/0x140
[  110.476245]  ? lockdep_hardirqs_on_prepare+0x132/0x230
[  110.481397]  ? queue_delayed_work_on+0x57/0x90
[  110.485852]  ? lockdep_hardirqs_on+0x7d/0x100
[  110.490221]  ? qed_get_int_fp+0xe0/0xe0 [qed]
[  110.494703]  qede_open+0x6d/0x100 [qede]
[  110.498664]  __dev_open+0x1c3/0x2c0
[  110.502171]  ? dev_set_rx_mode+0x60/0x60
[  110.506105]  ? lockdep_hardirqs_on_prepare+0x132/0x230
[  110.511254]  ? __local_bh_enable_ip+0x8f/0x110
[  110.515711]  __dev_change_flags+0x31b/0x3b0
[  110.519906]  ? dev_set_allmulti+0x10/0x10
[  110.523935]  dev_change_flags+0x58/0xb0
[  110.527783]  do_setlink+0xb38/0x19e0
[  110.531370]  ? reacquire_held_locks+0x270/0x270
[  110.535910]  ? rtnetlink_put_metrics+0x2e0/0x2e0
[  110.540538]  ? entry_SYSCALL_64+0x1/0x29
[  110.544478]  ? is_bpf_text_address+0x83/0xf0
[  110.548762]  ? kernel_text_address+0x125/0x130
[  110.553218]  ? __kernel_text_address+0xe/0x40
[  110.557585]  ? unwind_get_return_address+0x33/0x50
[  110.562386]  ? create_prof_cpu_mask+0x20/0x20
[  110.566755]  ? arch_stack_walk+0xa3/0x100
[  110.570781]  ? memset+0x1f/0x40
[  110.573939]  ? __nla_validate_parse+0xb4/0x1040
[  110.578481]  ? stack_trace_save+0x96/0xd0
[  110.582504]  ? nla_get_range_signed+0x180/0x180
[  110.587042]  ? __stack_depot_save+0x35/0x4a0
[  110.591335]  __rtnl_newlink+0x715/0xc90
[  110.595182]  ? mark_lock+0xd51/0xd90
[  110.598773]  ? rtnl_link_unregister+0x1e0/0x1e0
[  110.603309]  ? _raw_spin_unlock_irqrestore+0x40/0x60
[  110.608285]  ? ___slab_alloc+0x919/0xf80
[  110.612222]  ? rtnl_newlink+0x36/0x70
[  110.615896]  ? reacquire_held_locks+0x270/0x270
[  110.620440]  ? lock_is_held_type+0xe3/0x140
[  110.624634]  ? rcu_read_lock_sched_held+0x3f/0x80
[  110.629353]  ? trace_kmalloc+0x33/0x100
[  110.633207]  rtnl_newlink+0x4f/0x70
[  110.636704]  rtnetlink_rcv_msg+0x242/0x6b0
[  110.640815]  ? rtnl_stats_set+0x260/0x260
[  110.644836]  ? lock_acquire+0x16f/0x410
[  110.648682]  ? lock_acquire+0x17f/0x410
[  110.652533]  netlink_rcv_skb+0xce/0x200
[  110.656385]  ? rtnl_stats_set+0x260/0x260
[  110.660408]  ? netlink_ack+0x520/0x520
[  110.664166]  ? netlink_deliver_tap+0x13c/0x5c0
[  110.668626]  ? netlink_deliver_tap+0x141/0x5c0
[  110.673083]  netlink_unicast+0x2cb/0x460
[  110.677015]  ? netlink_attachskb+0x440/0x440
[  110.681294]  ? __build_skb_around+0x12a/0x150
[  110.685667]  netlink_sendmsg+0x3d2/0x710
[  110.689609]  ? netlink_unicast+0x460/0x460
[  110.693710]  ? iovec_from_user.part.0+0x95/0x200
[  110.698348]  ? netlink_unicast+0x460/0x460
[  110.702456]  sock_sendmsg+0x99/0xa0
[  110.705963]  ____sys_sendmsg+0x3d4/0x410
[  110.709895]  ? kernel_sendmsg+0x30/0x30
[  110.713740]  ? __ia32_sys_recvmmsg+0x160/0x160
[  110.718200]  ? lockdep_hardirqs_on_prepare+0x230/0x230
[  110.723358]  ___sys_sendmsg+0xe2/0x150
[  110.727124]  ? sendmsg_copy_msghdr+0x110/0x110
[  110.731576]  ? find_held_lock+0x83/0xa0
[  110.735425]  ? lock_release+0x233/0x470
[  110.739271]  ? __fget_files+0x14a/0x200
[  110.743120]  ? reacquire_held_locks+0x270/0x270
[  110.747674]  ? __fget_files+0x162/0x200
[  110.751524]  ? __fget_light+0x66/0x100
[  110.755286]  __sys_sendmsg+0xc3/0x140
[  110.758964]  ? __sys_sendmsg_sock+0x20/0x20
[  110.763158]  ? mark_held_locks+0x24/0x90
[  110.767099]  ? ktime_get_coarse_real_ts64+0x19/0x80
[  110.771990]  ? ktime_get_coarse_real_ts64+0x65/0x80
[  110.776879]  ? syscall_trace_enter.constprop.0+0x16f/0x230
[  110.782375]  do_syscall_64+0x5b/0x80
[  110.785963]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  110.791021] RIP: 0033:0x7fac3b54f71d
[  110.794609] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ea
c4 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 3e c5 f4
ff 48
[  110.813362] RSP: 002b:00007ffd3b5c7da0 EFLAGS: 00000293 ORIG_RAX:
000000000000002e
[  110.820938] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fac3b54f71d
[  110.828081] RDX: 0000000000000000 RSI: 00007ffd3b5c7de0 RDI: 000000000000000d
[  110.835221] RBP: 0000563d7ac60090 R08: 0000000000000000 R09: 0000000000000000
[  110.842361] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffd3b5c7f4c
[  110.849494] R13: 00007ffd3b5c7f50 R14: 0000000000000000 R15: 00007ffd3b5c7f58
[  110.856639]  </TASK>
[  110.858837] Modules linked in: pcc_cpufreq(-) rfkill intel_rapl_msr
dcdbas intel_rapl_common amd64_edac edac_mce_amd rapl pcspkr qedi
mgag200 i2c_algo_bit iscsi_boot_sysfs libiscsi drm_shmem_helper
cdc_ether scsi_transport_iscsi usbnet drm_kms_helper mii uio ipmi_ssif
k10temp i2c_piix4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler
acpi_power_meter acpi_cpufreq vfat fat drm fuse xfs qedf qede qed
crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel libfcoe
libfc scsi_transport_fc crc8 ccp tg3 sp5100_tco
[  110.904398] ---[ end trace 0000000000000000 ]---
[  110.909039] RIP: 0010:qede_load.cold+0x14c/0xa08 [qede]
[  110.914306] Code: c6 60 fb 40 c0 48 c7 c7 40 e1 40 c0 e8 b7 21 28
c8 48 8b 3c 24 e8 fa 06 2d c7 41 0f b7 9f b6 00 00 00 41 89 dc e9 c2
3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0 eb c1 49 8d 7f 08 e8 36 09 2d c7
49 8b
[  110.933068] RSP: 0018:ffff888162ab6e00 EFLAGS: 00010206
[  110.938314] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc03ed524
[  110.945466] RDX: 000000000000006b RSI: 0000000000000007 RDI: ffff88810401a758
[  110.952616] RBP: ffff8888a20f2cd0 R08: 0000000000000001 R09: ffffffff8bba9e0f
[  110.959772] R10: fffffbfff17753c1 R11: 0000000000000001 R12: ffff88810401a758
[  110.966925] R13: ffff8888a20f2c08 R14: ffff8888a20f2cb6 R15: ffff8888a20f2c00
[  110.974092] FS:  00007fac3a412500(0000) GS:ffff888810d00000(0000)
knlGS:0000000000000000
[  110.982198] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  110.987971] CR2: 00007fac38ffca88 CR3: 0000000123528000 CR4: 00000000003506e0
[  110.995131] Kernel panic - not syncing: Fatal exception
[  111.001311] Kernel Offset: 0x6000000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  111.012016] ---[ end Kernel panic - not syncing: Fatal exception ]---
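
A note on the "Kernel Offset" line above: with KASLR, core-kernel
addresses in the oops are shifted by that offset relative to the
addresses in the matching vmlinux. A rough Python sketch of undoing the
shift is below. The function name unrelocate is hypothetical, and the
range check is only a heuristic: module addresses such as the [qede]
ones lie outside the relocation range and are not handled here.

```python
# Both values are taken from the log line
# "Kernel Offset: 0x6000000 from 0xffffffff81000000
#  (relocation range: 0xffffffff80000000-0xffffffffbfffffff)".
KERNEL_OFFSET = 0x6000000
RELOC_RANGE = (0xffffffff80000000, 0xffffffffbfffffff)

def unrelocate(addr, offset=KERNEL_OFFSET, reloc_range=RELOC_RANGE):
    """Return the link-time address for a core-kernel runtime address,
    or None if addr lies outside the relocation range (e.g. module text)."""
    lo, hi = reloc_range
    if not lo <= addr <= hi:
        return None
    return addr - offset

# R09 from the trace falls inside the relocation range.
print(hex(unrelocate(0xffffffff8bba9e0f)))
# RCX from the trace is above the range (module area), so this prints None.
print(unrelocate(0xffffffffc03ed524))
```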

kernel tarball:
https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/604654489/publish%20x86_64%20debug/2813007034/artifacts/kernel-mainline.kernel.org-redhat_604654489_x86_64_debug.tar.gz
kernel config: https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/604654489/build%20x86_64%20debug/2813006987/artifacts/kernel-mainline.kernel.org-redhat_604654489_x86_64_debug.config


Bruno



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
  2022-08-18  7:22       ` Bruno Goncalves
@ 2022-08-18 15:51         ` Jakub Kicinski
  2022-08-18 17:55           ` [EXT] " Manish Chopra
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2022-08-18 15:51 UTC (permalink / raw)
  To: Bruno Goncalves, Ariel Elior
  Cc: LKML, Networking, CKI Project, Saurav Kashyap, Javed Hasan,
	Manish Chopra

On Thu, 18 Aug 2022 09:22:17 +0200 Bruno Goncalves wrote:
> On Wed, 3 Aug 2022 at 17:37, Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Wed, 3 Aug 2022 14:13:00 +0200 Bruno Goncalves wrote:  
> > > Got this from the most recent failure (kernel built using commit 0805c6fb39f6):
> > >
> > > the tarball is https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/603714145/build%20x86_64%20debug/2807738987/artifacts/kernel-mainline.kernel.org-redhat_603714145_x86_64_debug.tar.gz
> > > and the call trace from
> > > https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/08/02/redhat:603123526/build_x86_64_redhat:603123526_x86_64_debug/tests/1/results_0001/console.log/console.log
> > >
> > > [   69.876513] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > > [   69.888521] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant
> > > DL325 Gen10 Plus, BIOS A43 08/09/2021
> > > [   69.897971] RIP: 0010:qede_load.cold
> > > (/builds/2807738987/workdir/./include/linux/spinlock.h:389
> > > /builds/2807738987/workdir/./include/linux/netdevice.h:4294
> > > /builds/2807738987/workdir/./include/linux/netdevice.h:4385
> > > /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2594
> > > /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2575)  
> >
> > Thanks a lot! That seems to point the finger at commit 3aa6bce9af0e
> > ("net: watchdog: hold device global xmit lock during tx disable") but
> > frankly IDK why... The driver must be fully initialized to get to
> > ndo_open() so how is the tx_global_lock busted?!
> >
> > Would you be able to re-run with CONFIG_KASAN=y ?
> > Perhaps KASAN can tell us what's messing up the lock.  
> 
> Sorry for taking a long time to provide the info.
> Below is the call trace, note it is on a different machine. It might
> take me a few days in case I need to try on the original machine.

Thanks, looks like KASAN didn't catch anything, it's the same crash :(
Let's CC all the Qlogic people, Qlogic PTAL.

> [  110.329039] [0000:c1:00.2]:[qedf_link_update:613]:9: LINK DOWN.
> [  110.330183] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> [  110.340728] CPU: 56 PID: 1810 Comm: NetworkManager Not tainted 5.19.0 #1
> [  110.347435] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS
> 1.18.0 01/17/2022
> [  110.355088] RIP: 0010:qede_load.cold+0x14c/0xa08 [qede]
> [  110.360348] Code: c6 60 fb 40 c0 48 c7 c7 40 e1 40 c0 e8 b7 21 28
> c8 48 8b 3c 24 e8 fa 06 2d c7 41 0f b7 9f b6 00 00 00 41 89 dc e9 c2
> 3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0 eb c1 49 8d 7f 08 e8 36 09 2d c7
> 49 8b
> [  110.379101] RSP: 0018:ffff888162ab6e00 EFLAGS: 00010206
> [  110.384338] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc03ed524
> [  110.391479] RDX: 000000000000006b RSI: 0000000000000007 RDI: ffff88810401a758
> [  110.398621] RBP: ffff8888a20f2cd0 R08: 0000000000000001 R09: ffffffff8bba9e0f
> [  110.405761] R10: fffffbfff17753c1 R11: 0000000000000001 R12: ffff88810401a758
> [  110.412895] R13: ffff8888a20f2c08 R14: ffff8888a20f2cb6 R15: ffff8888a20f2c00
> [  110.420036] FS:  00007fac3a412500(0000) GS:ffff888810d00000(0000)
> knlGS:0000000000000000
> [  110.428129] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  110.433875] CR2: 00007fac38ffca88 CR3: 0000000123528000 CR4: 00000000003506e0
> [  110.441009] Call Trace:
> [  110.443464]  <TASK>
> [  110.445585]  ? qed_eth_rxq_start_ramrod+0x320/0x320 [qed]
> [  110.451110]  ? qede_alloc_mem_txq+0x240/0x240 [qede]
> [  110.456106]  ? lock_release+0x233/0x470
> [  110.459958]  ? rwsem_wake.isra.0+0xf1/0x100
> [  110.464163]  ? lock_chain_count+0x20/0x20
> [  110.468179]  ? find_held_lock+0x83/0xa0
> [  110.472032]  ? lock_is_held_type+0xe3/0x140
> [  110.476245]  ? lockdep_hardirqs_on_prepare+0x132/0x230
> [  110.481397]  ? queue_delayed_work_on+0x57/0x90
> [  110.485852]  ? lockdep_hardirqs_on+0x7d/0x100
> [  110.490221]  ? qed_get_int_fp+0xe0/0xe0 [qed]
> [  110.494703]  qede_open+0x6d/0x100 [qede]
> [  110.498664]  __dev_open+0x1c3/0x2c0
> [  110.502171]  ? dev_set_rx_mode+0x60/0x60
> [  110.506105]  ? lockdep_hardirqs_on_prepare+0x132/0x230
> [  110.511254]  ? __local_bh_enable_ip+0x8f/0x110
> [  110.515711]  __dev_change_flags+0x31b/0x3b0
> [  110.519906]  ? dev_set_allmulti+0x10/0x10
> [  110.523935]  dev_change_flags+0x58/0xb0
> [  110.527783]  do_setlink+0xb38/0x19e0
> [  110.531370]  ? reacquire_held_locks+0x270/0x270
> [  110.535910]  ? rtnetlink_put_metrics+0x2e0/0x2e0
> [  110.540538]  ? entry_SYSCALL_64+0x1/0x29
> [  110.544478]  ? is_bpf_text_address+0x83/0xf0
> [  110.548762]  ? kernel_text_address+0x125/0x130
> [  110.553218]  ? __kernel_text_address+0xe/0x40
> [  110.557585]  ? unwind_get_return_address+0x33/0x50
> [  110.562386]  ? create_prof_cpu_mask+0x20/0x20
> [  110.566755]  ? arch_stack_walk+0xa3/0x100
> [  110.570781]  ? memset+0x1f/0x40
> [  110.573939]  ? __nla_validate_parse+0xb4/0x1040
> [  110.578481]  ? stack_trace_save+0x96/0xd0
> [  110.582504]  ? nla_get_range_signed+0x180/0x180
> [  110.587042]  ? __stack_depot_save+0x35/0x4a0
> [  110.591335]  __rtnl_newlink+0x715/0xc90
> [  110.595182]  ? mark_lock+0xd51/0xd90
> [  110.598773]  ? rtnl_link_unregister+0x1e0/0x1e0
> [  110.603309]  ? _raw_spin_unlock_irqrestore+0x40/0x60
> [  110.608285]  ? ___slab_alloc+0x919/0xf80
> [  110.612222]  ? rtnl_newlink+0x36/0x70
> [  110.615896]  ? reacquire_held_locks+0x270/0x270
> [  110.620440]  ? lock_is_held_type+0xe3/0x140
> [  110.624634]  ? rcu_read_lock_sched_held+0x3f/0x80
> [  110.629353]  ? trace_kmalloc+0x33/0x100
> [  110.633207]  rtnl_newlink+0x4f/0x70
> [  110.636704]  rtnetlink_rcv_msg+0x242/0x6b0
> [  110.640815]  ? rtnl_stats_set+0x260/0x260
> [  110.644836]  ? lock_acquire+0x16f/0x410
> [  110.648682]  ? lock_acquire+0x17f/0x410
> [  110.652533]  netlink_rcv_skb+0xce/0x200
> [  110.656385]  ? rtnl_stats_set+0x260/0x260
> [  110.660408]  ? netlink_ack+0x520/0x520
> [  110.664166]  ? netlink_deliver_tap+0x13c/0x5c0
> [  110.668626]  ? netlink_deliver_tap+0x141/0x5c0
> [  110.673083]  netlink_unicast+0x2cb/0x460
> [  110.677015]  ? netlink_attachskb+0x440/0x440
> [  110.681294]  ? __build_skb_around+0x12a/0x150
> [  110.685667]  netlink_sendmsg+0x3d2/0x710
> [  110.689609]  ? netlink_unicast+0x460/0x460
> [  110.693710]  ? iovec_from_user.part.0+0x95/0x200
> [  110.698348]  ? netlink_unicast+0x460/0x460
> [  110.702456]  sock_sendmsg+0x99/0xa0
> [  110.705963]  ____sys_sendmsg+0x3d4/0x410
> [  110.709895]  ? kernel_sendmsg+0x30/0x30
> [  110.713740]  ? __ia32_sys_recvmmsg+0x160/0x160
> [  110.718200]  ? lockdep_hardirqs_on_prepare+0x230/0x230
> [  110.723358]  ___sys_sendmsg+0xe2/0x150
> [  110.727124]  ? sendmsg_copy_msghdr+0x110/0x110
> [  110.731576]  ? find_held_lock+0x83/0xa0
> [  110.735425]  ? lock_release+0x233/0x470
> [  110.739271]  ? __fget_files+0x14a/0x200
> [  110.743120]  ? reacquire_held_locks+0x270/0x270
> [  110.747674]  ? __fget_files+0x162/0x200
> [  110.751524]  ? __fget_light+0x66/0x100
> [  110.755286]  __sys_sendmsg+0xc3/0x140
> [  110.758964]  ? __sys_sendmsg_sock+0x20/0x20
> [  110.763158]  ? mark_held_locks+0x24/0x90
> [  110.767099]  ? ktime_get_coarse_real_ts64+0x19/0x80
> [  110.771990]  ? ktime_get_coarse_real_ts64+0x65/0x80
> [  110.776879]  ? syscall_trace_enter.constprop.0+0x16f/0x230
> [  110.782375]  do_syscall_64+0x5b/0x80
> [  110.785963]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [  110.791021] RIP: 0033:0x7fac3b54f71d
> [  110.794609] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ea
> c4 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00
> 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 3e c5 f4
> ff 48
> [  110.813362] RSP: 002b:00007ffd3b5c7da0 EFLAGS: 00000293 ORIG_RAX:
> 000000000000002e
> [  110.820938] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fac3b54f71d
> [  110.828081] RDX: 0000000000000000 RSI: 00007ffd3b5c7de0 RDI: 000000000000000d
> [  110.835221] RBP: 0000563d7ac60090 R08: 0000000000000000 R09: 0000000000000000
> [  110.842361] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffd3b5c7f4c
> [  110.849494] R13: 00007ffd3b5c7f50 R14: 0000000000000000 R15: 00007ffd3b5c7f58
> [  110.856639]  </TASK>
> [  110.858837] Modules linked in: pcc_cpufreq(-) rfkill intel_rapl_msr
> dcdbas intel_rapl_common amd64_edac edac_mce_amd rapl pcspkr qedi
> mgag200 i2c_algo_bit iscsi_boot_sysfs libiscsi drm_shmem_helper
> cdc_ether scsi_transport_iscsi usbnet drm_kms_helper mii uio ipmi_ssif
> k10temp i2c_piix4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler
> acpi_power_meter acpi_cpufreq vfat fat drm fuse xfs qedf qede qed
> crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel libfcoe
> libfc scsi_transport_fc crc8 ccp tg3 sp5100_tco
> [  110.904398] ---[ end trace 0000000000000000 ]---
> [  110.909039] RIP: 0010:qede_load.cold+0x14c/0xa08 [qede]
> [  110.914306] Code: c6 60 fb 40 c0 48 c7 c7 40 e1 40 c0 e8 b7 21 28
> c8 48 8b 3c 24 e8 fa 06 2d c7 41 0f b7 9f b6 00 00 00 41 89 dc e9 c2
> 3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0 eb c1 49 8d 7f 08 e8 36 09 2d c7
> 49 8b
> [  110.933068] RSP: 0018:ffff888162ab6e00 EFLAGS: 00010206
> [  110.938314] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc03ed524
> [  110.945466] RDX: 000000000000006b RSI: 0000000000000007 RDI: ffff88810401a758
> [  110.952616] RBP: ffff8888a20f2cd0 R08: 0000000000000001 R09: ffffffff8bba9e0f
> [  110.959772] R10: fffffbfff17753c1 R11: 0000000000000001 R12: ffff88810401a758
> [  110.966925] R13: ffff8888a20f2c08 R14: ffff8888a20f2cb6 R15: ffff8888a20f2c00
> [  110.974092] FS:  00007fac3a412500(0000) GS:ffff888810d00000(0000)
> knlGS:0000000000000000
> [  110.982198] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  110.987971] CR2: 00007fac38ffca88 CR3: 0000000123528000 CR4: 00000000003506e0
> [  110.995131] Kernel panic - not syncing: Fatal exception
> [  111.001311] Kernel Offset: 0x6000000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [  111.012016] ---[ end Kernel panic - not syncing: Fatal exception ]---
> 
> kernel tarball:
> https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/604654489/publish%20x86_64%20debug/2813007034/artifacts/kernel-mainline.kernel.org-redhat_604654489_x86_64_debug.tar.gz
> kernel config: https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/604654489/build%20x86_64%20debug/2813006987/artifacts/kernel-mainline.kernel.org-redhat_604654489_x86_64_debug.config
> 
> 
> Bruno
> 
> 
> >  
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [EXT] Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
  2022-08-18 15:51         ` Jakub Kicinski
@ 2022-08-18 17:55           ` Manish Chopra
  2022-08-18 18:26             ` Jakub Kicinski
  2022-08-19  7:36             ` Bruno Goncalves
  0 siblings, 2 replies; 11+ messages in thread
From: Manish Chopra @ 2022-08-18 17:55 UTC (permalink / raw)
  To: Jakub Kicinski, Bruno Goncalves, Ariel Elior
  Cc: LKML, Networking, CKI Project, Saurav Kashyap, Javed Hasan, Alok Prasad

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Thursday, August 18, 2022 9:21 PM
> To: Bruno Goncalves <bgoncalv@redhat.com>; Ariel Elior
> <aelior@marvell.com>
> Cc: LKML <linux-kernel@vger.kernel.org>; Networking
> <netdev@vger.kernel.org>; CKI Project <cki-project@redhat.com>; Saurav
> Kashyap <skashyap@marvell.com>; Javed Hasan <jhasan@marvell.com>;
> Manish Chopra <manishc@marvell.com>
> Subject: [EXT] Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
> 
> On Thu, 18 Aug 2022 09:22:17 +0200 Bruno Goncalves wrote:
> > On Wed, 3 Aug 2022 at 17:37, Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Wed, 3 Aug 2022 14:13:00 +0200 Bruno Goncalves wrote:
> > > > Got this from the most recent failure (kernel built using commit 0805c6fb39f6):
> > > >
> > > > the tarball is https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/603714145/build%20x86_64%20debug/2807738987/artifacts/kernel-mainline.kernel.org-redhat_603714145_x86_64_debug.tar.gz
> > > > and the call trace from
> > > > https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/08/02/redhat:603123526/build_x86_64_redhat:603123526_x86_64_debug/tests/1/results_0001/console.log/console.log
> > > >
> > > > [   69.876513] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > > > [   69.888521] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant
> > > > DL325 Gen10 Plus, BIOS A43 08/09/2021
> > > > [   69.897971] RIP: 0010:qede_load.cold
> > > > (/builds/2807738987/workdir/./include/linux/spinlock.h:389
> > > > /builds/2807738987/workdir/./include/linux/netdevice.h:4294
> > > > /builds/2807738987/workdir/./include/linux/netdevice.h:4385
> > > > /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2594
> > > > /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2575)
> > >
> > > Thanks a lot! That seems to point the finger at commit 3aa6bce9af0e
> > > ("net: watchdog: hold device global xmit lock during tx disable")
> > > but frankly IDK why... The driver must be fully initialized to get to
> > > ndo_open() so how is the tx_global_lock busted?!
> > >
> > > Would you be able to re-run with CONFIG_KASAN=y ?
> > > Perhaps KASAN can tell us what's messing up the lock.
> >
> > Sorry for taking a long time to provide the info.
> > Below is the call trace, note it is on a different machine. It might
> > take me a few days in case I need to try on the original machine.
> 
> Thanks, looks like KASAN didn't catch anything, it's the same crash :( Let's CC
> all the Qlogic people, Qlogic PTAL.
> 
> > [  110.329039] [0000:c1:00.2]:[qedf_link_update:613]:9: LINK DOWN.
> > [  110.330183] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> > [  110.340728] CPU: 56 PID: 1810 Comm: NetworkManager Not tainted 5.19.0 #1
> > [  110.347435] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS
> > 1.18.0 01/17/2022
> > [  110.355088] RIP: 0010:qede_load.cold+0x14c/0xa08 [qede]
> > [  110.360348] Code: c6 60 fb 40 c0 48 c7 c7 40 e1 40 c0 e8 b7 21 28
> > c8 48 8b 3c 24 e8 fa 06 2d c7 41 0f b7 9f b6 00 00 00 41 89 dc e9 c2
> > 3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0 eb c1 49 8d 7f 08 e8 36 09 2d c7
> > 49 8b
> > [  110.379101] RSP: 0018:ffff888162ab6e00 EFLAGS: 00010206
> > [  110.384338] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc03ed524
> > [  110.391479] RDX: 000000000000006b RSI: 0000000000000007 RDI: ffff88810401a758
> > [  110.398621] RBP: ffff8888a20f2cd0 R08: 0000000000000001 R09: ffffffff8bba9e0f
> > [  110.405761] R10: fffffbfff17753c1 R11: 0000000000000001 R12: ffff88810401a758
> > [  110.412895] R13: ffff8888a20f2c08 R14: ffff8888a20f2cb6 R15: ffff8888a20f2c00
> > [  110.420036] FS:  00007fac3a412500(0000) GS:ffff888810d00000(0000)
> > knlGS:0000000000000000
> > [  110.428129] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  110.433875] CR2: 00007fac38ffca88 CR3: 0000000123528000 CR4: 00000000003506e0
> > [  110.441009] Call Trace:
> > [  110.443464]  <TASK>
> > [  110.445585]  ? qed_eth_rxq_start_ramrod+0x320/0x320 [qed]
> > [  110.451110]  ? qede_alloc_mem_txq+0x240/0x240 [qede]
> > [  110.456106]  ? lock_release+0x233/0x470
> > [  110.459958]  ? rwsem_wake.isra.0+0xf1/0x100
> > [  110.464163]  ? lock_chain_count+0x20/0x20
> > [  110.468179]  ? find_held_lock+0x83/0xa0
> > [  110.472032]  ? lock_is_held_type+0xe3/0x140
> > [  110.476245]  ? lockdep_hardirqs_on_prepare+0x132/0x230
> > [  110.481397]  ? queue_delayed_work_on+0x57/0x90
> > [  110.485852]  ? lockdep_hardirqs_on+0x7d/0x100
> > [  110.490221]  ? qed_get_int_fp+0xe0/0xe0 [qed]
> > [  110.494703]  qede_open+0x6d/0x100 [qede]
> > [  110.498664]  __dev_open+0x1c3/0x2c0
> > [  110.502171]  ? dev_set_rx_mode+0x60/0x60
> > [  110.506105]  ? lockdep_hardirqs_on_prepare+0x132/0x230
> > [  110.511254]  ? __local_bh_enable_ip+0x8f/0x110
> > [  110.515711]  __dev_change_flags+0x31b/0x3b0
> > [  110.519906]  ? dev_set_allmulti+0x10/0x10
> > [  110.523935]  dev_change_flags+0x58/0xb0
> > [  110.527783]  do_setlink+0xb38/0x19e0
> > [  110.531370]  ? reacquire_held_locks+0x270/0x270
> > [  110.535910]  ? rtnetlink_put_metrics+0x2e0/0x2e0
> > [  110.540538]  ? entry_SYSCALL_64+0x1/0x29
> > [  110.544478]  ? is_bpf_text_address+0x83/0xf0
> > [  110.548762]  ? kernel_text_address+0x125/0x130
> > [  110.553218]  ? __kernel_text_address+0xe/0x40
> > [  110.557585]  ? unwind_get_return_address+0x33/0x50
> > [  110.562386]  ? create_prof_cpu_mask+0x20/0x20
> > [  110.566755]  ? arch_stack_walk+0xa3/0x100
> > [  110.570781]  ? memset+0x1f/0x40
> > [  110.573939]  ? __nla_validate_parse+0xb4/0x1040
> > [  110.578481]  ? stack_trace_save+0x96/0xd0
> > [  110.582504]  ? nla_get_range_signed+0x180/0x180
> > [  110.587042]  ? __stack_depot_save+0x35/0x4a0
> > [  110.591335]  __rtnl_newlink+0x715/0xc90
> > [  110.595182]  ? mark_lock+0xd51/0xd90
> > [  110.598773]  ? rtnl_link_unregister+0x1e0/0x1e0
> > [  110.603309]  ? _raw_spin_unlock_irqrestore+0x40/0x60
> > [  110.608285]  ? ___slab_alloc+0x919/0xf80
> > [  110.612222]  ? rtnl_newlink+0x36/0x70
> > [  110.615896]  ? reacquire_held_locks+0x270/0x270
> > [  110.620440]  ? lock_is_held_type+0xe3/0x140
> > [  110.624634]  ? rcu_read_lock_sched_held+0x3f/0x80
> > [  110.629353]  ? trace_kmalloc+0x33/0x100
> > [  110.633207]  rtnl_newlink+0x4f/0x70
> > [  110.636704]  rtnetlink_rcv_msg+0x242/0x6b0
> > [  110.640815]  ? rtnl_stats_set+0x260/0x260
> > [  110.644836]  ? lock_acquire+0x16f/0x410
> > [  110.648682]  ? lock_acquire+0x17f/0x410
> > [  110.652533]  netlink_rcv_skb+0xce/0x200
> > [  110.656385]  ? rtnl_stats_set+0x260/0x260
> > [  110.660408]  ? netlink_ack+0x520/0x520
> > [  110.664166]  ? netlink_deliver_tap+0x13c/0x5c0
> > [  110.668626]  ? netlink_deliver_tap+0x141/0x5c0
> > [  110.673083]  netlink_unicast+0x2cb/0x460
> > [  110.677015]  ? netlink_attachskb+0x440/0x440
> > [  110.681294]  ? __build_skb_around+0x12a/0x150
> > [  110.685667]  netlink_sendmsg+0x3d2/0x710
> > [  110.689609]  ? netlink_unicast+0x460/0x460
> > [  110.693710]  ? iovec_from_user.part.0+0x95/0x200
> > [  110.698348]  ? netlink_unicast+0x460/0x460
> > [  110.702456]  sock_sendmsg+0x99/0xa0
> > [  110.705963]  ____sys_sendmsg+0x3d4/0x410
> > [  110.709895]  ? kernel_sendmsg+0x30/0x30
> > [  110.713740]  ? __ia32_sys_recvmmsg+0x160/0x160
> > [  110.718200]  ? lockdep_hardirqs_on_prepare+0x230/0x230
> > [  110.723358]  ___sys_sendmsg+0xe2/0x150
> > [  110.727124]  ? sendmsg_copy_msghdr+0x110/0x110
> > [  110.731576]  ? find_held_lock+0x83/0xa0
> > [  110.735425]  ? lock_release+0x233/0x470
> > [  110.739271]  ? __fget_files+0x14a/0x200
> > [  110.743120]  ? reacquire_held_locks+0x270/0x270
> > [  110.747674]  ? __fget_files+0x162/0x200
> > [  110.751524]  ? __fget_light+0x66/0x100
> > [  110.755286]  __sys_sendmsg+0xc3/0x140
> > [  110.758964]  ? __sys_sendmsg_sock+0x20/0x20
> > [  110.763158]  ? mark_held_locks+0x24/0x90
> > [  110.767099]  ? ktime_get_coarse_real_ts64+0x19/0x80
> > [  110.771990]  ? ktime_get_coarse_real_ts64+0x65/0x80
> > [  110.776879]  ? syscall_trace_enter.constprop.0+0x16f/0x230
> > [  110.782375]  do_syscall_64+0x5b/0x80
> > [  110.785963]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > [  110.791021] RIP: 0033:0x7fac3b54f71d
> > [  110.794609] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ea
> > c4 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00
> > 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 3e c5 f4
> > ff 48 [  110.813362] RSP: 002b:00007ffd3b5c7da0 EFLAGS: 00000293
> > ORIG_RAX:
> > 000000000000002e
> > [  110.820938] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> > 00007fac3b54f71d [  110.828081] RDX: 0000000000000000 RSI:
> > 00007ffd3b5c7de0 RDI: 000000000000000d [  110.835221] RBP:
> > 0000563d7ac60090 R08: 0000000000000000 R09: 0000000000000000 [
> > 110.842361] R10: 0000000000000000 R11: 0000000000000293 R12:
> > 00007ffd3b5c7f4c [  110.849494] R13: 00007ffd3b5c7f50 R14:
> > 0000000000000000 R15: 00007ffd3b5c7f58 [  110.856639]  </TASK> [
> > 110.858837] Modules linked in: pcc_cpufreq(-) rfkill intel_rapl_msr
> > dcdbas intel_rapl_common amd64_edac edac_mce_amd rapl pcspkr qedi
> > mgag200 i2c_algo_bit iscsi_boot_sysfs libiscsi drm_shmem_helper
> > cdc_ether scsi_transport_iscsi usbnet drm_kms_helper mii uio ipmi_ssif
> > k10temp i2c_piix4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler
> > acpi_power_meter acpi_cpufreq vfat fat drm fuse xfs qedf qede qed
> > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel libfcoe
> > libfc scsi_transport_fc crc8 ccp tg3 sp5100_tco [  110.904398] ---[
> > end trace 0000000000000000 ]--- [  110.909039] RIP:
> > 0010:qede_load.cold+0x14c/0xa08 [qede] [  110.914306] Code: c6 60 fb
> > 40 c0 48 c7 c7 40 e1 40 c0 e8 b7 21 28
> > c8 48 8b 3c 24 e8 fa 06 2d c7 41 0f b7 9f b6 00 00 00 41 89 dc e9 c2
> > 3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0 eb c1 49 8d 7f 08 e8 36 09 2d c7
> > 49 8b
> > [  110.933068] RSP: 0018:ffff888162ab6e00 EFLAGS: 00010206 [
> > 110.938314] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> > ffffffffc03ed524 [  110.945466] RDX: 000000000000006b RSI:
> > 0000000000000007 RDI: ffff88810401a758 [  110.952616] RBP:
> > ffff8888a20f2cd0 R08: 0000000000000001 R09: ffffffff8bba9e0f [
> > 110.959772] R10: fffffbfff17753c1 R11: 0000000000000001 R12:
> > ffff88810401a758 [  110.966925] R13: ffff8888a20f2c08 R14:
> > ffff8888a20f2cb6 R15: ffff8888a20f2c00 [  110.974092] FS:
> > 00007fac3a412500(0000) GS:ffff888810d00000(0000)
> > knlGS:0000000000000000
> > [  110.982198] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [
> > 110.987971] CR2: 00007fac38ffca88 CR3: 0000000123528000 CR4:
> > 00000000003506e0 [  110.995131] Kernel panic - not syncing: Fatal
> > exception [  111.001311] Kernel Offset: 0x6000000 from
> > 0xffffffff81000000 (relocation range:
> > 0xffffffff80000000-0xffffffffbfffffff)
> > [  111.012016] ---[ end Kernel panic - not syncing: Fatal exception
> > ]---
> >
> > kernel tarball:
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__s3.amazonaws.com_arr-2Dcki-2Dprod-2Dtrusted-2Dartifacts_trusted-2Dartifacts_604654489_publish-2520x86-5F64-2520debug_2813007034_artifacts_kernel-2Dmainline.kernel.org-2Dredhat-5F604654489-5Fx86-5F64-5Fdebug.tar.gz&d=DwICAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=bMTgx2X48QVXyXOEL8ALyI4dsWoR-m74c5n3d-ruJI8&m=zBBoyokuEgJ25hD586tidMPozXvZjlserj2qf3Iqn2o5V-ds8Jb7IkFIggvHpm4H&s=WXbtGecipcXSY_rwTu6JrCEI7VFKToDZ3UfZ4ciloWk&e=
> > kernel config:
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__s3.amazonaws.com_arr-2Dcki-2Dprod-2Dtrusted-2Dartifacts_trusted-2Dartifacts_604654489_build-2520x86-5F64-2520debug_2813006987_artifacts_kernel-2Dmainline.kernel.org-2Dredhat-5F604654489-5Fx86-5F64-5Fdebug.config&d=DwICAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=bMTgx2X48QVXyXOEL8ALyI4dsWoR-m74c5n3d-ruJI8&m=zBBoyokuEgJ25hD586tidMPozXvZjlserj2qf3Iqn2o5V-ds8Jb7IkFIggvHpm4H&s=edaLwikEZyvLAk8hrsZNE-Esjsn9HZ5luaW_FARAlCw&e=
> >
> >
> > Bruno

Hi Bruno,

1. How exactly do you reproduce this issue? Are there any specific instructions, or a particular kernel CONFIG option, with which it reproduces?
2. Is there a Bugzilla opened for this already? Can you please provide the complete crash logs (vmcore-dmesg.txt)?
3. You mentioned commit 3aa6bce9af0e ("net: watchdog: hold device global xmit lock during tx disable").
   Do you mean the issue started surfacing only after this commit? The driver calls netif_tx_disable() from these two relevant contexts:

   a. One in the ndo_stop() flow:

        /* Close OS Tx */
        netif_tx_disable(edev->ndev);
        netif_carrier_off(edev->ndev);

   b. The other in link event handling from the hard IRQ context:

        DP_NOTICE(edev, "Link is down\n");
        netif_tx_disable(edev->ndev);
        netif_carrier_off(edev->ndev);

Thanks,
Manish



* Re: [EXT] Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
  2022-08-18 17:55           ` [EXT] " Manish Chopra
@ 2022-08-18 18:26             ` Jakub Kicinski
  2022-08-19  7:36             ` Bruno Goncalves
  1 sibling, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2022-08-18 18:26 UTC (permalink / raw)
  To: Manish Chopra
  Cc: Bruno Goncalves, Ariel Elior, LKML, Networking, CKI Project,
	Saurav Kashyap, Javed Hasan, Alok Prasad

On Thu, 18 Aug 2022 17:55:28 +0000 Manish Chopra wrote:
> 3. You mentioned commit 3aa6bce9af0e ("net: watchdog: hold device global xmit lock during tx disable").

FWIW that was just my guess based on the stack trace. Bruno posted the
stack traces with line numbers decoded here:

https://lore.kernel.org/all/CA+QYu4ob4cbh3Vnh9DWgaPpyw8nTLFG__TbBpBsYg1tWJPxygg@mail.gmail.com/
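
For anyone reading the raw (undecoded) traces in this thread by hand, here is a minimal Python sketch that pulls the symbol, offset, size and module out of a RIP line and returns the bytes at the faulting instruction (the one marked <..> in a Code: line). The helper names parse_rip and faulting_bytes are made up for illustration and are not part of any kernel tooling. The bytes 0f 0b are the x86 UD2 instruction, which fits the "invalid opcode" oops; reading that as a BUG()-style trap is an inference, not something the thread confirms.

```python
import re

# Shape of the kernel-side RIP lines quoted in this thread:
#   "RIP: 0010:symbol+0xOFFSET/0xSIZE [module]"
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)"
    r"/0x(?P<size>[0-9a-f]+)(?: \[(?P<module>\w+)\])?"
)

UD2 = ("0f", "0b")  # x86 UD2 instruction bytes


def parse_rip(line):
    """Return symbol/offset/size/module from a RIP line, or None if the
    line has no symbol+offset form (e.g. a user-space RIP)."""
    m = RIP_RE.search(line)
    if m is None:
        return None
    return {
        "symbol": m["symbol"],
        "offset": int(m["offset"], 16),
        "size": int(m["size"], 16),
        "module": m["module"],
    }


def faulting_bytes(code_line, count=2):
    """Return `count` bytes starting at the byte marked <xx> in a Code: line."""
    _, sep, rest = code_line.partition("Code:")
    if not sep:
        return None
    tokens = rest.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            return tuple([tok[1:-1]] + tokens[i + 1 : i + count])
    return None


if __name__ == "__main__":
    rip = parse_rip("[  110.355088] RIP: 0010:qede_load.cold+0x14c/0xa08 [qede]")
    print(rip)
    code = "[  110.360348] Code: 3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0"
    print(faulting_bytes(code) == UD2)
```

The offset and size are only meaningful against the exact build that produced the oops, which is why the decoded traces linked above are the better source when they are available.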

>    Do you mean the issue started surfacing only after this commit? The driver calls netif_tx_disable() from these two relevant contexts:
>
>    a. One in the ndo_stop() flow:
>
>         /* Close OS Tx */
>         netif_tx_disable(edev->ndev);
>         netif_carrier_off(edev->ndev);
>
>    b. The other in link event handling from the hard IRQ context:
>
>         DP_NOTICE(edev, "Link is down\n");
>         netif_tx_disable(edev->ndev);
>         netif_carrier_off(edev->ndev);



* Re: [EXT] Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
  2022-08-18 17:55           ` [EXT] " Manish Chopra
  2022-08-18 18:26             ` Jakub Kicinski
@ 2022-08-19  7:36             ` Bruno Goncalves
  2023-02-22 21:34               ` Jakub Kicinski
  1 sibling, 1 reply; 11+ messages in thread
From: Bruno Goncalves @ 2022-08-19  7:36 UTC (permalink / raw)
  To: Manish Chopra
  Cc: Jakub Kicinski, Ariel Elior, LKML, Networking, CKI Project,
	Saurav Kashyap, Javed Hasan, Alok Prasad

On Thu, 18 Aug 2022 at 19:55, Manish Chopra <manishc@marvell.com> wrote:
>
> > -----Original Message-----
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Thursday, August 18, 2022 9:21 PM
> > To: Bruno Goncalves <bgoncalv@redhat.com>; Ariel Elior
> > <aelior@marvell.com>
> > Cc: LKML <linux-kernel@vger.kernel.org>; Networking
> > <netdev@vger.kernel.org>; CKI Project <cki-project@redhat.com>; Saurav
> > Kashyap <skashyap@marvell.com>; Javed Hasan <jhasan@marvell.com>;
> > Manish Chopra <manishc@marvell.com>
> > Subject: [EXT] Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > On Thu, 18 Aug 2022 09:22:17 +0200 Bruno Goncalves wrote:
> > > On Wed, 3 Aug 2022 at 17:37, Jakub Kicinski <kuba@kernel.org> wrote:
> > > >
> > > > On Wed, 3 Aug 2022 14:13:00 +0200 Bruno Goncalves wrote:
> > > > > Got this from the most recent failure (kernel built using commit 0805c6fb39f6):
> > > > >
> > > > > the tarball is
> > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__s3.amazonaws.com_arr-2Dcki-2Dprod-2Dtrusted-2Dartifacts_trusted-2Dartifacts_603714145_build-2520x86-5F64-2520debug_2807738987_artifacts_kernel-2Dmainline.kernel.org-2Dredhat-5F603714145-5Fx86-5F64-5Fdebug.tar.gz&d=DwICAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=bMTgx2X48QVXyXOEL8ALyI4dsWoR-m74c5n3d-ruJI8&m=zBBoyokuEgJ25hD586tidMPozXvZjlserj2qf3Iqn2o5V-ds8Jb7IkFIggvHpm4H&s=sjyeF4V5YfoiaDBRrtfGEXdVs3el3AdmvUNVQbteSu4&e=
> > > > > and the call trace from
> > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__s3.us-2Deast-2D1.amazonaws.com_arr-2Dcki-2Dprod-2Ddatawarehouse-2Dpublic_datawarehouse-2Dpublic_2022_08_02_redhat-3A603123526_build-5Fx86-5F64-5Fredhat-3A603123526-5Fx86-5F64-5Fdebug_tests_1_results-5F0001_console.log_console.log&d=DwICAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=bMTgx2X48QVXyXOEL8ALyI4dsWoR-m74c5n3d-ruJI8&m=zBBoyokuEgJ25hD586tidMPozXvZjlserj2qf3Iqn2o5V-ds8Jb7IkFIggvHpm4H&s=wV1Vq1lhXX02fbTXIWy_NRHxb9LgDzEnst11oy-RTpM&e=
> > > > >
> > > > > [   69.876513] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > > > > [   69.888521] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant DL325 Gen10 Plus, BIOS A43 08/09/2021
> > > > > [   69.897971] RIP: 0010:qede_load.cold
> > > > > (/builds/2807738987/workdir/./include/linux/spinlock.h:389
> > > > > /builds/2807738987/workdir/./include/linux/netdevice.h:4294
> > > > > /builds/2807738987/workdir/./include/linux/netdevice.h:4385
> > > > > /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2594
> > > > > /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2575)
> > > >
> > > > Thanks a lot! That seems to point the finger at commit 3aa6bce9af0e
> > > > ("net: watchdog: hold device global xmit lock during tx disable")
> > > > but frankly IDK why... The driver must be fully initialized to get
> > > > to ndo_open() so how is the tx_global_lock busted?!
> > > >
> > > > Would you be able to re-run with CONFIG_KASAN=y ?
> > > > Perhaps KASAN can tell us what's messing up the lock.
> > >
> > > Sorry for taking a long time to provide the info.
> > > Below is the call trace, note it is on a different machine. It might
> > > take me a few days in case I need to try on the original machine.
> >
> > Thanks, looks like KASAN didn't catch anything, it's the same crash :( Let's CC
> > all the Qlogic people, Qlogic PTAL.
> >
> > > [  110.329039] [0000:c1:00.2]:[qedf_link_update:613]:9: LINK DOWN.
> > > [  110.330183] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> > > [  110.340728] CPU: 56 PID: 1810 Comm: NetworkManager Not tainted 5.19.0 #1
> > > [  110.347435] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.18.0 01/17/2022
> > > [  110.355088] RIP: 0010:qede_load.cold+0x14c/0xa08 [qede]
> > > [  110.360348] Code: c6 60 fb 40 c0 48 c7 c7 40 e1 40 c0 e8 b7 21 28 c8 48 8b 3c 24 e8 fa 06 2d c7 41 0f b7 9f b6 00 00 00 41 89 dc e9 c2 3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0 eb c1 49 8d 7f 08 e8 36 09 2d c7 49 8b
> > > [  110.379101] RSP: 0018:ffff888162ab6e00 EFLAGS: 00010206
> > > [  110.384338] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc03ed524
> > > [  110.391479] RDX: 000000000000006b RSI: 0000000000000007 RDI: ffff88810401a758
> > > [  110.398621] RBP: ffff8888a20f2cd0 R08: 0000000000000001 R09: ffffffff8bba9e0f
> > > [  110.405761] R10: fffffbfff17753c1 R11: 0000000000000001 R12: ffff88810401a758
> > > [  110.412895] R13: ffff8888a20f2c08 R14: ffff8888a20f2cb6 R15: ffff8888a20f2c00
> > > [  110.420036] FS:  00007fac3a412500(0000) GS:ffff888810d00000(0000) knlGS:0000000000000000
> > > [  110.428129] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  110.433875] CR2: 00007fac38ffca88 CR3: 0000000123528000 CR4: 00000000003506e0
> > > [  110.441009] Call Trace:
> > > [  110.443464]  <TASK>
> > > [  110.445585]  ? qed_eth_rxq_start_ramrod+0x320/0x320 [qed]
> > > [  110.451110]  ? qede_alloc_mem_txq+0x240/0x240 [qede]
> > > [  110.456106]  ? lock_release+0x233/0x470
> > > [  110.459958]  ? rwsem_wake.isra.0+0xf1/0x100
> > > [  110.464163]  ? lock_chain_count+0x20/0x20
> > > [  110.468179]  ? find_held_lock+0x83/0xa0
> > > [  110.472032]  ? lock_is_held_type+0xe3/0x140
> > > [  110.476245]  ? lockdep_hardirqs_on_prepare+0x132/0x230
> > > [  110.481397]  ? queue_delayed_work_on+0x57/0x90
> > > [  110.485852]  ? lockdep_hardirqs_on+0x7d/0x100
> > > [  110.490221]  ? qed_get_int_fp+0xe0/0xe0 [qed]
> > > [  110.494703]  qede_open+0x6d/0x100 [qede]
> > > [  110.498664]  __dev_open+0x1c3/0x2c0
> > > [  110.502171]  ? dev_set_rx_mode+0x60/0x60
> > > [  110.506105]  ? lockdep_hardirqs_on_prepare+0x132/0x230
> > > [  110.511254]  ? __local_bh_enable_ip+0x8f/0x110
> > > [  110.515711]  __dev_change_flags+0x31b/0x3b0
> > > [  110.519906]  ? dev_set_allmulti+0x10/0x10
> > > [  110.523935]  dev_change_flags+0x58/0xb0
> > > [  110.527783]  do_setlink+0xb38/0x19e0
> > > [  110.531370]  ? reacquire_held_locks+0x270/0x270
> > > [  110.535910]  ? rtnetlink_put_metrics+0x2e0/0x2e0
> > > [  110.540538]  ? entry_SYSCALL_64+0x1/0x29
> > > [  110.544478]  ? is_bpf_text_address+0x83/0xf0
> > > [  110.548762]  ? kernel_text_address+0x125/0x130
> > > [  110.553218]  ? __kernel_text_address+0xe/0x40
> > > [  110.557585]  ? unwind_get_return_address+0x33/0x50
> > > [  110.562386]  ? create_prof_cpu_mask+0x20/0x20
> > > [  110.566755]  ? arch_stack_walk+0xa3/0x100
> > > [  110.570781]  ? memset+0x1f/0x40
> > > [  110.573939]  ? __nla_validate_parse+0xb4/0x1040
> > > [  110.578481]  ? stack_trace_save+0x96/0xd0
> > > [  110.582504]  ? nla_get_range_signed+0x180/0x180
> > > [  110.587042]  ? __stack_depot_save+0x35/0x4a0
> > > [  110.591335]  __rtnl_newlink+0x715/0xc90
> > > [  110.595182]  ? mark_lock+0xd51/0xd90
> > > [  110.598773]  ? rtnl_link_unregister+0x1e0/0x1e0
> > > [  110.603309]  ? _raw_spin_unlock_irqrestore+0x40/0x60
> > > [  110.608285]  ? ___slab_alloc+0x919/0xf80
> > > [  110.612222]  ? rtnl_newlink+0x36/0x70
> > > [  110.615896]  ? reacquire_held_locks+0x270/0x270
> > > [  110.620440]  ? lock_is_held_type+0xe3/0x140
> > > [  110.624634]  ? rcu_read_lock_sched_held+0x3f/0x80
> > > [  110.629353]  ? trace_kmalloc+0x33/0x100
> > > [  110.633207]  rtnl_newlink+0x4f/0x70
> > > [  110.636704]  rtnetlink_rcv_msg+0x242/0x6b0
> > > [  110.640815]  ? rtnl_stats_set+0x260/0x260
> > > [  110.644836]  ? lock_acquire+0x16f/0x410
> > > [  110.648682]  ? lock_acquire+0x17f/0x410
> > > [  110.652533]  netlink_rcv_skb+0xce/0x200
> > > [  110.656385]  ? rtnl_stats_set+0x260/0x260
> > > [  110.660408]  ? netlink_ack+0x520/0x520
> > > [  110.664166]  ? netlink_deliver_tap+0x13c/0x5c0
> > > [  110.668626]  ? netlink_deliver_tap+0x141/0x5c0
> > > [  110.673083]  netlink_unicast+0x2cb/0x460
> > > [  110.677015]  ? netlink_attachskb+0x440/0x440
> > > [  110.681294]  ? __build_skb_around+0x12a/0x150
> > > [  110.685667]  netlink_sendmsg+0x3d2/0x710
> > > [  110.689609]  ? netlink_unicast+0x460/0x460
> > > [  110.693710]  ? iovec_from_user.part.0+0x95/0x200
> > > [  110.698348]  ? netlink_unicast+0x460/0x460
> > > [  110.702456]  sock_sendmsg+0x99/0xa0
> > > [  110.705963]  ____sys_sendmsg+0x3d4/0x410
> > > [  110.709895]  ? kernel_sendmsg+0x30/0x30
> > > [  110.713740]  ? __ia32_sys_recvmmsg+0x160/0x160
> > > [  110.718200]  ? lockdep_hardirqs_on_prepare+0x230/0x230
> > > [  110.723358]  ___sys_sendmsg+0xe2/0x150
> > > [  110.727124]  ? sendmsg_copy_msghdr+0x110/0x110
> > > [  110.731576]  ? find_held_lock+0x83/0xa0
> > > [  110.735425]  ? lock_release+0x233/0x470
> > > [  110.739271]  ? __fget_files+0x14a/0x200
> > > [  110.743120]  ? reacquire_held_locks+0x270/0x270
> > > [  110.747674]  ? __fget_files+0x162/0x200
> > > [  110.751524]  ? __fget_light+0x66/0x100
> > > [  110.755286]  __sys_sendmsg+0xc3/0x140
> > > [  110.758964]  ? __sys_sendmsg_sock+0x20/0x20
> > > [  110.763158]  ? mark_held_locks+0x24/0x90
> > > [  110.767099]  ? ktime_get_coarse_real_ts64+0x19/0x80
> > > [  110.771990]  ? ktime_get_coarse_real_ts64+0x65/0x80
> > > [  110.776879]  ? syscall_trace_enter.constprop.0+0x16f/0x230
> > > [  110.782375]  do_syscall_64+0x5b/0x80
> > > [  110.785963]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > [  110.791021] RIP: 0033:0x7fac3b54f71d
> > > [  110.794609] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ea c4 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 3e c5 f4 ff 48
> > > [  110.813362] RSP: 002b:00007ffd3b5c7da0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
> > > [  110.820938] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fac3b54f71d
> > > [  110.828081] RDX: 0000000000000000 RSI: 00007ffd3b5c7de0 RDI: 000000000000000d
> > > [  110.835221] RBP: 0000563d7ac60090 R08: 0000000000000000 R09: 0000000000000000
> > > [  110.842361] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffd3b5c7f4c
> > > [  110.849494] R13: 00007ffd3b5c7f50 R14: 0000000000000000 R15: 00007ffd3b5c7f58
> > > [  110.856639]  </TASK>
> > > [  110.858837] Modules linked in: pcc_cpufreq(-) rfkill intel_rapl_msr dcdbas intel_rapl_common amd64_edac edac_mce_amd rapl pcspkr qedi mgag200 i2c_algo_bit iscsi_boot_sysfs libiscsi drm_shmem_helper cdc_ether scsi_transport_iscsi usbnet drm_kms_helper mii uio ipmi_ssif k10temp i2c_piix4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_cpufreq vfat fat drm fuse xfs qedf qede qed crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel libfcoe libfc scsi_transport_fc crc8 ccp tg3 sp5100_tco
> > > [  110.904398] ---[ end trace 0000000000000000 ]---
> > > [  110.909039] RIP: 0010:qede_load.cold+0x14c/0xa08 [qede]
> > > [  110.914306] Code: c6 60 fb 40 c0 48 c7 c7 40 e1 40 c0 e8 b7 21 28 c8 48 8b 3c 24 e8 fa 06 2d c7 41 0f b7 9f b6 00 00 00 41 89 dc e9 c2 3c fe ff <0f> 0b 48 c7 c1 60 d0 40 c0 eb c1 49 8d 7f 08 e8 36 09 2d c7 49 8b
> > > [  110.933068] RSP: 0018:ffff888162ab6e00 EFLAGS: 00010206
> > > [  110.938314] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc03ed524
> > > [  110.945466] RDX: 000000000000006b RSI: 0000000000000007 RDI: ffff88810401a758
> > > [  110.952616] RBP: ffff8888a20f2cd0 R08: 0000000000000001 R09: ffffffff8bba9e0f
> > > [  110.959772] R10: fffffbfff17753c1 R11: 0000000000000001 R12: ffff88810401a758
> > > [  110.966925] R13: ffff8888a20f2c08 R14: ffff8888a20f2cb6 R15: ffff8888a20f2c00
> > > [  110.974092] FS:  00007fac3a412500(0000) GS:ffff888810d00000(0000) knlGS:0000000000000000
> > > [  110.982198] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  110.987971] CR2: 00007fac38ffca88 CR3: 0000000123528000 CR4: 00000000003506e0
> > > [  110.995131] Kernel panic - not syncing: Fatal exception
> > > [  111.001311] Kernel Offset: 0x6000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > [  111.012016] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > >
> > > kernel tarball:
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__s3.amazonaws.com_arr-2Dcki-2Dprod-2Dtrusted-2Dartifacts_trusted-2Dartifacts_604654489_publish-2520x86-5F64-2520debug_2813007034_artifacts_kernel-2Dmainline.kernel.org-2Dredhat-5F604654489-5Fx86-5F64-5Fdebug.tar.gz&d=DwICAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=bMTgx2X48QVXyXOEL8ALyI4dsWoR-m74c5n3d-ruJI8&m=zBBoyokuEgJ25hD586tidMPozXvZjlserj2qf3Iqn2o5V-ds8Jb7IkFIggvHpm4H&s=WXbtGecipcXSY_rwTu6JrCEI7VFKToDZ3UfZ4ciloWk&e=
> > > kernel config:
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__s3.amazonaws.com_arr-2Dcki-2Dprod-2Dtrusted-2Dartifacts_trusted-2Dartifacts_604654489_build-2520x86-5F64-2520debug_2813006987_artifacts_kernel-2Dmainline.kernel.org-2Dredhat-5F604654489-5Fx86-5F64-5Fdebug.config&d=DwICAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=bMTgx2X48QVXyXOEL8ALyI4dsWoR-m74c5n3d-ruJI8&m=zBBoyokuEgJ25hD586tidMPozXvZjlserj2qf3Iqn2o5V-ds8Jb7IkFIggvHpm4H&s=edaLwikEZyvLAk8hrsZNE-Esjsn9HZ5luaW_FARAlCw&e=
> > >
> > >
> > > Bruno
>
> Hi Bruno,
>
> 1. How exactly do you reproduce this issue? Are there any specific instructions, or a particular kernel CONFIG option, with which it reproduces?

We hit the panic by booting up the machine with a kernel 5.19.0 with
debug flags enabled.

kernel tarball:
https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/604654489/publish%20x86_64%20debug/2813007034/artifacts/kernel-mainline.kernel.org-redhat_604654489_x86_64_debug.tar.gz
kernel config is:
https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/604654489/build%20x86_64%20debug/2813006987/artifacts/kernel-mainline.kernel.org-redhat_604654489_x86_64_debug.config
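
The plain S3 links above are the same ones that appear wrapped in urldefense.proofpoint.com URLs in the quoted text. A minimal Python sketch of undoing that wrapping, assuming the v2 scheme as it appears in this thread ("-XX" standing for "%XX" and "_" for "/" inside the u= parameter). decode_urldefense_v2 is a hypothetical helper name, and the example.org URL is made up for illustration:

```python
from urllib.parse import parse_qs, unquote, urlsplit


def decode_urldefense_v2(wrapped: str) -> str:
    """Recover the original link from a urldefense 'v2' wrapper URL.

    Assumption: the original URL sits in the 'u' query parameter with
    '-XX' used in place of '%XX' and '_' used in place of '/'.
    """
    query = parse_qs(urlsplit(wrapped).query)
    encoded = query["u"][0]
    # Map the wrapper's characters back, then undo one level of
    # percent-encoding (so "-2520" becomes "%20", not a space).
    return unquote(encoded.translate(str.maketrans("-_", "%/")))


if __name__ == "__main__":
    wrapped = "https://urldefense.proofpoint.com/v2/url?u=https-3A__example.org_a-5Fb&d=x&e="
    print(decode_urldefense_v2(wrapped))  # https://example.org/a_b
```

Only one level of percent-decoding is applied, which is why the recovered links keep "%20" in "build%20x86_64%20debug" exactly as in the plain URLs.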

The machine has a FastLinQ QL41000 Series 10/25/40/50GbE Controller (mbi
8.52.21 [mfw 8.52.9.0])

> 2. Is there a Bugzilla opened for this already? Can you please provide the complete crash logs (vmcore-dmesg.txt)?
No, there is no Bugzilla; I haven't seen this problem on the RHEL 9 kernel
(5.14). I don't have a vmcore, but I'll try to get one.

Below is a link to the console log from a CKI pipeline execution. It is
not from the same run as above, since I ran that one manually.

https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/06/20/568171088/redhat:568171088_x86_64_debug/tests/12172126_x86_64_5_console.log

For this console.log the kernel config is
http://s3.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/06/20/568171088/redhat:568171088/redhat:568171088_x86_64_debug/.config
and the tarball is
http://s3.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/06/20/568171088/redhat:568171088/redhat:568171088_x86_64_debug/kernel-mainline.kernel.org-x86_64-78ca55889a549a9a194c6ec666836329b774ab6d.tar.gz

Bruno

> 3. You mentioned commit 3aa6bce9af0e ("net: watchdog: hold device global xmit lock during tx disable").
>    Do you mean the issue started surfacing only after this commit? The driver calls netif_tx_disable() from these two relevant contexts:
>
>    a. One in the ndo_stop() flow:
>
>         /* Close OS Tx */
>         netif_tx_disable(edev->ndev);
>         netif_carrier_off(edev->ndev);
>
>    b. The other in link event handling from the hard IRQ context:
>
>         DP_NOTICE(edev, "Link is down\n");
>         netif_tx_disable(edev->ndev);
>         netif_carrier_off(edev->ndev);
>
> Thanks,
> Manish
>



* Re: [EXT] Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
  2022-08-19  7:36             ` Bruno Goncalves
@ 2023-02-22 21:34               ` Jakub Kicinski
  2023-02-23 15:16                 ` Bruno Goncalves
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2023-02-22 21:34 UTC (permalink / raw)
  To: Bruno Goncalves
  Cc: Manish Chopra, Ariel Elior, LKML, Networking, CKI Project,
	Saurav Kashyap, Javed Hasan, Alok Prasad

On Fri, 19 Aug 2022 09:36:54 +0200 Bruno Goncalves wrote:
> We hit the panic by booting up the machine with a kernel 5.19.0 with
> debug flags enabled.

Hi Bruno,

Was this ever fixed?


* Re: [EXT] Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0
  2023-02-22 21:34               ` Jakub Kicinski
@ 2023-02-23 15:16                 ` Bruno Goncalves
  0 siblings, 0 replies; 11+ messages in thread
From: Bruno Goncalves @ 2023-02-23 15:16 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Manish Chopra, Ariel Elior, LKML, Networking, CKI Project,
	Saurav Kashyap, Javed Hasan, Alok Prasad

On Wed, 22 Feb 2023 at 22:34, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 19 Aug 2022 09:36:54 +0200 Bruno Goncalves wrote:
> > We hit the panic by booting up the machine with a kernel 5.19.0 with
> > debug flags enabled.
>
> Hi Bruno,
>
> Was this ever fixed?

It looks like it got fixed; I haven't seen this failure on 6.2.0 kernels.

Bruno



Thread overview: 11+ messages
2022-08-02 11:27 RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0 Bruno Goncalves
2022-08-02 19:23 ` Jakub Kicinski
2022-08-03 12:13   ` Bruno Goncalves
2022-08-03 15:37     ` Jakub Kicinski
2022-08-18  7:22       ` Bruno Goncalves
2022-08-18 15:51         ` Jakub Kicinski
2022-08-18 17:55           ` [EXT] " Manish Chopra
2022-08-18 18:26             ` Jakub Kicinski
2022-08-19  7:36             ` Bruno Goncalves
2023-02-22 21:34               ` Jakub Kicinski
2023-02-23 15:16                 ` Bruno Goncalves
