* NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
From: Ondrej Zary @ 2020-02-23 18:29 UTC
To: qla2xxx-upstream, linux-scsi, linux-kernel

Hello,
a couple of days after upgrading a server from Debian 9 (kernel 4.9.210-1)
to 10 (kernel 4.19.98), qla2xxx crashed, along with mysql.

There is an EMC CX3 array connected through the fibre-channel adapter.
No errors are present in EMC event log.

This server was running without any problems since Debian 4.
Is this a known bug?

[979178.888922] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[979178.889160] PGD 0 P4D 0
[979178.889243] Oops: 0002 [#1] SMP PTI
[979178.889362] CPU: 6 PID: 11060 Comm: kworker/u16:2 Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
[979178.889617] Hardware name: Dell Inc. PowerEdge 2950/0JR815, BIOS 2.7.0 10/30/2010
[979178.889855] Workqueue: scsi_tmf_4 scmd_eh_abort_handler [scsi_mod]
[979178.890069] RIP: 0010:qla24xx_async_abort_cmd+0x1b/0x250 [qla2xxx]
[979178.890258] Code: e9 19 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 41 55 41 54 55 53 4c 8b 6f 28 4c 8b 7f 20 4c 8b 77 48 <f0> 41 ff 46 04 0f ae f0 41 f6 46 24 04 74 17 f0 41 ff 4e 04 bd 02
[979178.890801] RSP: 0018:ffffb1250ba83da8 EFLAGS: 00010293
[979178.890966] RAX: 0000000000000800 RBX: ffff93b89db837a8 RCX: 00000000000005f4
[979178.891178] RDX: ffff93b89e28afa8 RSI: 0000000000000001 RDI: ffff93b8a5018fc0
[979178.891389] RBP: ffff93b89ccb89c0 R08: ffffffffc0595860 R09: 0000000000000000
[979178.891600] R10: 8080808080808080 R11: 0000000000000010 R12: ffff93b89db82000
[979178.891811] R13: ffff93b89db837a8 R14: 0000000000000000 R15: ffff93b89d88a800
[979178.892023] FS: 0000000000000000(0000) GS:ffff93b8a7b80000(0000) knlGS:0000000000000000
[979178.892258] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[979178.892430] CR2: 0000000000000004 CR3: 000000021a62a000 CR4: 00000000000006e0
[979178.892642] Call Trace:
[979178.892748] qla24xx_abort_command+0x218/0x2d0 [qla2xxx]
[979178.892911] ? __switch_to_asm+0x41/0x70
[979178.893031] ? __switch_to_asm+0x35/0x70
[979178.893160] qla2xxx_eh_abort+0x117/0x310 [qla2xxx]
[979178.893323] scmd_eh_abort_handler+0x85/0x220 [scsi_mod]
[979178.893484] process_one_work+0x1a7/0x3a0
[979178.893611] worker_thread+0x30/0x390
[979178.893727] ? create_worker+0x1a0/0x1a0
[979178.893847] kthread+0x112/0x130
[979178.893948] ? kthread_bind+0x30/0x30
[979178.894064] ret_from_fork+0x35/0x40
[979178.894174] Modules linked in: loop ipmi_ssif radeon ttm drm_kms_helper drm coretemp i2c_algo_bit iTCO_wdt iTCO_vendor_support ipmi_si joydev kvm sg evdev i5000_edac ipmi_devintf pcc_cpufreq ipmi_msghandler rng_core i5k_amb irqbypass dcdbas serio_raw acpi_cpufreq button pcspkr ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 dm_service_time dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua uas usb_storage hid_generic usbhid hid sr_mod ses cdrom enclosure sd_mod scsi_transport_sas ata_generic qla2xxx uhci_hcd ehci_pci ehci_hcd psmouse ata_piix nvme_fc libata nvme_fabrics usbcore nvme_core megaraid_sas scsi_transport_fc scsi_mod lpc_ich mfd_core usb_common bnx2
[979178.895968] CR2: 0000000000000004
[979178.896075] ---[ end trace 4d42692cc0dc3c87 ]---
[979178.896225] RIP: 0010:qla24xx_async_abort_cmd+0x1b/0x250 [qla2xxx]
[979178.896414] Code: e9 19 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 41 55 41 54 55 53 4c 8b 6f 28 4c 8b 7f 20 4c 8b 77 48 <f0> 41 ff 46 04 0f ae f0 41 f6 46 24 04 74 17 f0 41 ff 4e 04 bd 02
[979178.896956] RSP: 0018:ffffb1250ba83da8 EFLAGS: 00010293
[979178.897121] RAX: 0000000000000800 RBX: ffff93b89db837a8 RCX: 00000000000005f4
[979178.897332] RDX: ffff93b89e28afa8 RSI: 0000000000000001 RDI: ffff93b8a5018fc0
[979178.897544] RBP: ffff93b89ccb89c0 R08: ffffffffc0595860 R09: 0000000000000000
[979178.908415] R10: 8080808080808080 R11: 0000000000000010 R12: ffff93b89db82000
[979178.919419] R13: ffff93b89db837a8 R14: 0000000000000000 R15: ffff93b89d88a800
[979178.930444] FS: 0000000000000000(0000) GS:ffff93b8a7b80000(0000) knlGS:0000000000000000
[979178.941366] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[979178.952142] CR2: 0000000000000004 CR3: 000000021a62a000 CR4: 00000000000006e0
[980103.072740] mysqld[2175]: segfault at 0 ip 000055bbc5cd2d93 sp 00007f2362ffb450 error 6 in mysqld[55bbc551a000+805000]
[980103.083956] Code: c7 45 00 00 00 00 00 8b 7d cc 4c 89 e2 4c 89 f6 e8 62 81 84 ff 49 89 c7 49 39 c4 0f 84 f6 00 00 00 e8 e1 1c 00 00 41 8b 4d 00 <89> 08 85 c9 74 37 49 83 ff ff 0f 84 9d 00 00 00 f6 c3 06 75 28 4d

--
Ondrej Zary
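
The "Oops: 0002" value in the kernel report and the "error 6" value in the mysqld segfault line are both x86 page-fault error codes. As a minimal sketch, assuming only the three low architectural bits matter here (present/protection, write, user), they can be decoded as follows. The names decode_pf, print_pf and struct pf_info are made up for this illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Low bits of the x86 page-fault error code (architectural definition). */
#define PF_PROT  (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE (1u << 1) /* 0: read access,      1: write access         */
#define PF_USER  (1u << 2) /* 0: kernel mode,      1: user mode            */

/* Hypothetical helper type for this sketch. */
struct pf_info {
    bool protection;
    bool write;
    bool user;
};

/* Split an error code into the three flags above. */
struct pf_info decode_pf(unsigned int code)
{
    struct pf_info info = {
        .protection = (code & PF_PROT) != 0,
        .write = (code & PF_WRITE) != 0,
        .user = (code & PF_USER) != 0,
    };
    return info;
}

/* Print a one-line human-readable summary of an error code. */
void print_pf(const char *label, unsigned int code)
{
    assert(label != NULL);
    struct pf_info info = decode_pf(code);
    printf("%s: %s, %s access, %s mode\n", label,
           info.protection ? "protection violation" : "page not present",
           info.write ? "write" : "read",
           info.user ? "user" : "kernel");
}
```

Calling print_pf("Oops", 0x0002) describes the kernel fault as a kernel-mode write to a not-present page, which fits a locked increment through a NULL-based pointer. Calling print_pf("mysqld", 0x6) describes the later mysqld fault as a user-mode write to a not-present page.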
* Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
From: Bart Van Assche @ 2020-02-23 19:26 UTC
To: Ondrej Zary, qla2xxx-upstream, linux-scsi, linux-kernel

On 2020-02-23 10:29, Ondrej Zary wrote:
> a couple of days after upgrading a server from Debian 9 (kernel 4.9.210-1)
> to 10 (kernel 4.19.98), qla2xxx crashed, along with mysql.
>
> There is an EMC CX3 array connected through the fibre-channel adapter.
> No errors are present in EMC event log.
>
> This server was running without any problems since Debian 4.
> Is this a known bug?

Please report issues encountered with Debian kernels in the Debian bug
tracker. If you want the upstream community to assist please retest with
an upstream kernel.

Thanks,

Bart.
* Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
From: Ondrej Zary @ 2020-02-23 19:57 UTC
To: Bart Van Assche
Cc: qla2xxx-upstream, linux-scsi, linux-kernel

On Sunday 23 February 2020 20:26:39 Bart Van Assche wrote:
> On 2020-02-23 10:29, Ondrej Zary wrote:
> > a couple of days after upgrading a server from Debian 9 (kernel 4.9.210-1)
> > to 10 (kernel 4.19.98), qla2xxx crashed, along with mysql.
> >
> > There is an EMC CX3 array connected through the fibre-channel adapter.
> > No errors are present in EMC event log.
> >
> > This server was running without any problems since Debian 4.
> > Is this a known bug?
>
> Please report issues encountered with Debian kernels in the Debian bug
> tracker. If you want the upstream community to assist please retest with
> an upstream kernel.

Debian kernel does not have any patches related to qla2xxx driver:
https://salsa.debian.org/kernel-team/linux/raw/debian/4.19.98-1/debian/patches/series

It crashed after running for 11 days. Not a quick&easy test.

--
Ondrej Zary
* Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
From: Bart Van Assche @ 2020-02-24 2:17 UTC
To: Ondrej Zary
Cc: qla2xxx-upstream, linux-scsi, linux-kernel

On 2020-02-23 11:57, Ondrej Zary wrote:
> On Sunday 23 February 2020 20:26:39 Bart Van Assche wrote:
>> On 2020-02-23 10:29, Ondrej Zary wrote:
>>> a couple of days after upgrading a server from Debian 9 (kernel 4.9.210-1)
>>> to 10 (kernel 4.19.98), qla2xxx crashed, along with mysql.
>>>
>>> There is an EMC CX3 array connected through the fibre-channel adapter.
>>> No errors are present in EMC event log.
>>>
>>> This server was running without any problems since Debian 4.
>>> Is this a known bug?
>>
>> Please report issues encountered with Debian kernels in the Debian bug
>> tracker. If you want the upstream community to assist please retest with
>> an upstream kernel.
>
> Debian kernel does not have any patches related to qla2xxx driver:
> https://salsa.debian.org/kernel-team/linux/raw/debian/4.19.98-1/debian/patches/series
>
> It crashed after running for 11 days. Not a quick&easy test.

It would help a lot if the crash address would be translated into a
source code line number. Something like the following commands should do
the trick:

$ gdb drivers/scsi/qla2xxx/qla2xxx.ko
(gdb) list *(qla24xx_async_abort_cmd+0x1b)

Thanks,

Bart.
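
The symbol and offset passed to gdb in the message above come from the "RIP:" line of the oops, which has the form selector:symbol+offset/size [module]. A minimal sketch of pulling those fields out of such a line and building the matching gdb command is shown here. The helper names parse_rip and format_gdb_list and the struct rip_site are hypothetical, and the parser assumes exactly the line layout seen in this report.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of an oops "RIP:" line. */
struct rip_site {
    char symbol[64];      /* function name, e.g. qla24xx_async_abort_cmd */
    unsigned long offset; /* byte offset of the faulting instruction     */
    unsigned long size;   /* total size of the function in bytes         */
};

/* Parse "... RIP: 0010:symbol+0xOFF/0xSIZE [module]".
 * Returns 0 on success and -1 if the line does not match. */
int parse_rip(const char *line, struct rip_site *out)
{
    assert(line != NULL && out != NULL);

    const char *p = strstr(line, "RIP: ");
    if (p == NULL)
        return -1;
    p += strlen("RIP: ");

    /* Skip the code-segment selector ("0010:"). */
    const char *colon = strchr(p, ':');
    if (colon == NULL)
        return -1;

    /* %lx accepts an optional "0x" prefix, like strtoul() with base 16. */
    if (sscanf(colon + 1, "%63[^+]+%lx/%lx",
               out->symbol, &out->offset, &out->size) != 3)
        return -1;
    return 0;
}

/* Build the gdb command that maps symbol+offset to a source line. */
int format_gdb_list(const struct rip_site *site, char *buf, size_t len)
{
    return snprintf(buf, len, "list *(%s+0x%lx)", site->symbol, site->offset);
}
```

Fed the RIP line from the original report, this yields the same "list *(qla24xx_async_abort_cmd+0x1b)" command suggested above. The command only resolves to a source line if gdb can find debug symbols for the module.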
* Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
From: Ondrej Zary @ 2020-02-24 8:20 UTC
To: Bart Van Assche
Cc: qla2xxx-upstream, linux-scsi, linux-kernel

On Monday 24 February 2020, Bart Van Assche wrote:
> On 2020-02-23 11:57, Ondrej Zary wrote:
> > On Sunday 23 February 2020 20:26:39 Bart Van Assche wrote:
> > > On 2020-02-23 10:29, Ondrej Zary wrote:
> > > > a couple of days after upgrading a server from Debian 9 (kernel
> > > > 4.9.210-1) to 10 (kernel 4.19.98), qla2xxx crashed, along with mysql.
> > > >
> > > > There is an EMC CX3 array connected through the fibre-channel adapter.
> > > > No errors are present in EMC event log.
> > > >
> > > > This server was running without any problems since Debian 4.
> > > > Is this a known bug?
> > >
> > > Please report issues encountered with Debian kernels in the Debian bug
> > > tracker. If you want the upstream community to assist please retest with
> > > an upstream kernel.
> >
> > Debian kernel does not have any patches related to qla2xxx driver:
> > https://salsa.debian.org/kernel-team/linux/raw/debian/4.19.98-1/debian/patches/series
> >
> > It crashed after running for 11 days. Not a quick&easy test.
>
> It would help a lot if the crash address would be translated into a
> source code line number. Something like the following commands should do
> the trick:
>
> $ gdb drivers/scsi/qla2xxx/qla2xxx.ko
> (gdb) list *(qla24xx_async_abort_cmd+0x1b)

Looks like it's in some inlined function.

/usr/src/linux-source-4.19# gdb /lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
...
Reading symbols from /lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko...Reading symbols from /usr/lib/debug//lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko...done.
done.

(gdb) list *(qla24xx_async_abort_cmd+0x1b)
0xf88b is in qla24xx_async_abort_cmd (./arch/x86/include/asm/atomic.h:97).
92       *
93       * Atomically increments @v by 1.
94       */
95      static __always_inline void arch_atomic_inc(atomic_t *v)
96      {
97              asm volatile(LOCK_PREFIX "incl %0"
98                           : "+m" (v->counter) :: "memory");
99      }
100     #define arch_atomic_inc arch_atomic_inc
101

(gdb) list *(qla24xx_abort_command+0x218)
0x22238 is in qla24xx_abort_command (./drivers/scsi/qla2xxx/qla_mbx.c:3084).
3079
3080            if (vha->flags.qpairs_available && sp->qpair)
3081                    req = sp->qpair->req;
3082
3083            if (ql2xasynctmfenable)
3084                    return qla24xx_async_abort_command(sp);
3085
3086            spin_lock_irqsave(&ha->hardware_lock, flags);
3087            for (handle = 1; handle < req->num_outstanding_cmds; handle++) {
3088                    if (req->outstanding_cmds[handle] == sp)

(gdb) list *(qla2xxx_eh_abort+0x117)
0x15e7 is in qla2xxx_eh_abort (./drivers/scsi/qla2xxx/qla_os.c:1314).
1309            /* Get a reference to the sp and drop the lock.*/
1310            sp_get(sp);
1311
1312            spin_unlock_irqrestore(&ha->hardware_lock, flags);
1313            rval = ha->isp_ops->abort_command(sp);
1314            if (rval) {
1315                    if (rval == QLA_FUNCTION_PARAMETER_ERROR)
1316                            ret = SUCCESS;
1317                    else
1318                            ret = FAILED;

(gdb) disassemble qla24xx_async_abort_cmd
Dump of assembler code for function qla24xx_async_abort_cmd:
   0x000000000000f870 <+0>: callq 0xf875 <qla24xx_async_abort_cmd+5>
   0x000000000000f875 <+5>: push %r15
   0x000000000000f877 <+7>: push %r14
   0x000000000000f879 <+9>: push %r13
   0x000000000000f87b <+11>: push %r12
   0x000000000000f87d <+13>: push %rbp
   0x000000000000f87e <+14>: push %rbx
   0x000000000000f87f <+15>: mov 0x28(%rdi),%r13
   0x000000000000f883 <+19>: mov 0x20(%rdi),%r15
   0x000000000000f887 <+23>: mov 0x48(%rdi),%r14
   0x000000000000f88b <+27>: lock incl 0x4(%r14)
   0x000000000000f890 <+32>: mfence
   0x000000000000f893 <+35>: testb $0x4,0x24(%r14)
   0x000000000000f898 <+40>: je 0xf8b1 <qla24xx_async_abort_cmd+65>
   0x000000000000f89a <+42>: lock decl 0x4(%r14)
   0x000000000000f89f <+47>: mov $0x102,%ebp
   0x000000000000f8a4 <+52>: pop %rbx
   0x000000000000f8a5 <+53>: mov %ebp,%eax
   0x000000000000f8a7 <+55>: pop %rbp
   0x000000000000f8a8 <+56>: pop %r12
   0x000000000000f8aa <+58>: pop %r13
   0x000000000000f8ac <+60>: pop %r14
   0x000000000000f8ae <+62>: pop %r15
   0x000000000000f8b0 <+64>: retq
   0x000000000000f8b1 <+65>: mov %rdi,%rbp
   0x000000000000f8b4 <+68>: mov 0x30(%r14),%rdi
   0x000000000000f8b8 <+72>: mov %esi,%r12d
   0x000000000000f8bb <+75>: mov $0x6000c0,%esi
   0x000000000000f8c0 <+80>: callq 0xf8c5 <qla24xx_async_abort_cmd+85>
   0x000000000000f8c5 <+85>: mov %rax,%rbx
   0x000000000000f8c8 <+88>: test %rax,%rax
   0x000000000000f8cb <+91>: je 0xf89a <qla24xx_async_abort_cmd+42>
   0x000000000000f8cd <+93>: lea 0x8(%rax),%rdi
   0x000000000000f8d1 <+97>: mov %rax,%rcx
   0x000000000000f8d4 <+100>: movq $0x0,(%rax)
   0x000000000000f8db <+107>: mov $0xc,%edx
   0x000000000000f8e0 <+112>: movq $0x0,0x180(%rax)
   0x000000000000f8eb <+123>: and $0xfffffffffffffff8,%rdi
   0x000000000000f8ef <+127>: xor %eax,%eax
   0x000000000000f8f1 <+129>: sub %rdi,%rcx
   0x000000000000f8f4 <+132>: add $0x188,%ecx
   0x000000000000f8fa <+138>: shr $0x3,%ecx
   0x000000000000f8fd <+141>: rep stos %rax,%es:(%rdi)
   0x000000000000f900 <+144>: mov %r15,0x20(%rbx)
   0x000000000000f904 <+148>: movl $0x1,0x40(%rbx)
   0x000000000000f90b <+155>: mov 0x18(%r14),%rax
   0x000000000000f90f <+159>: mov %dx,0x36(%rbx)
   0x000000000000f913 <+163>: movq $0x0,0x38(%rbx)
   0x000000000000f91b <+171>: mov %rax,0x28(%rbx)
   0x000000000000f91f <+175>: lea 0x50(%rbx),%rax
   0x000000000000f923 <+179>: mov %rax,0x50(%rbx)
   0x000000000000f927 <+183>: mov %rax,0x58(%rbx)
   0x000000000000f92b <+187>: mov 0x48(%rbp),%rax
   0x000000000000f92f <+191>: mov %rax,0x48(%rbx)
   0x000000000000f933 <+195>: test %r12b,%r12b
   0x000000000000f936 <+198>: je 0xf941 <qla24xx_async_abort_cmd+209>
   0x000000000000f938 <+200>: mov $0x40,%eax
   0x000000000000f93d <+205>: mov %ax,0x34(%rbx)
   0x000000000000f941 <+209>: lea 0xa0(%rbx),%rdi
   0x000000000000f948 <+216>: mov $0x0,%rdx
   0x000000000000f94f <+223>: mov $0x0,%rsi
   0x000000000000f956 <+230>: movq $0x0,0x170(%rbx)
   0x000000000000f961 <+241>: lea 0x148(%rbx),%r14
   0x000000000000f968 <+248>: movl $0x0,0x98(%rbx)
   0x000000000000f972 <+258>: callq 0xf977 <qla24xx_async_abort_cmd+263>
   0x000000000000f977 <+263>: xor %r8d,%r8d
   0x000000000000f97a <+266>: xor %ecx,%ecx
   0x000000000000f97c <+268>: xor %edx,%edx
   0x000000000000f97e <+270>: mov $0x0,%rsi
   0x000000000000f985 <+277>: mov %r14,%rdi
   0x000000000000f988 <+280>: callq 0xf98d <qla24xx_async_abort_cmd+285>
   0x000000000000f98d <+285>: mov 0x0(%rip),%rax # 0xf994 <qla24xx_async_abort_cmd+292>
   0x000000000000f994 <+292>: lea 0x78(%rbx),%rdi
   0x000000000000f998 <+296>: mov $0x0,%rdx
   0x000000000000f99f <+303>: mov $0x0,%rsi
   0x000000000000f9a6 <+310>: movl $0x0,0x70(%rbx)
   0x000000000000f9ad <+317>: add $0x2904,%rax
   0x000000000000f9b3 <+323>: movq $0x0,0x180(%rbx)
   0x000000000000f9be <+334>: mov %rax,0x158(%rbx)
   0x000000000000f9c5 <+341>: callq 0xf9ca <qla24xx_async_abort_cmd+346>
   0x000000000000f9ca <+346>: mov 0x28(%rbx),%rax
   0x000000000000f9ce <+350>: mov 0x448(%rax),%rax
   0x000000000000f9d5 <+357>: testb $0x2,0x15a(%rax)
   0x000000000000f9dc <+364>: jne 0xfa80 <qla24xx_async_abort_cmd+528>
   0x000000000000f9e2 <+370>: mov %r14,%rdi
   0x000000000000f9e5 <+373>: callq 0xf9ea <qla24xx_async_abort_cmd+378>
   0x000000000000f9ea <+378>: mov 0x30(%rbp),%r8d
   0x000000000000f9ee <+382>: mov 0x48(%rbp),%rax
   0x000000000000f9f2 <+386>: mov %r13,%rsi
   0x000000000000f9f5 <+389>: movzwl 0x36(%rbp),%r9d
   0x000000000000f9fa <+394>: mov $0x507c,%edx
   0x000000000000f9ff <+399>: mov $0x2000000,%edi
   0x000000000000fa04 <+404>: mov $0x0,%rcx
   0x000000000000fa0b <+411>: mov %r8d,0x90(%rbx)
   0x000000000000fa12 <+418>: mov 0x48(%rax),%rax
   0x000000000000fa16 <+422>: movzwl 0x40(%rax),%eax
   0x000000000000fa1a <+426>: movq $0x0,0x178(%rbx)
   0x000000000000fa25 <+437>: mov %ax,0x96(%rbx)
   0x000000000000fa2c <+444>: callq 0xfa31 <qla24xx_async_abort_cmd+449>
   0x000000000000fa31 <+449>: mov %rbx,%rdi
   0x000000000000fa34 <+452>: callq 0xfa39 <qla24xx_async_abort_cmd+457>
   0x000000000000fa39 <+457>: mov %eax,%ebp
   0x000000000000fa3b <+459>: test %eax,%eax
   0x000000000000fa3d <+461>: jne 0xfa64 <qla24xx_async_abort_cmd+500>
   0x000000000000fa3f <+463>: test %r12b,%r12b
   0x000000000000fa42 <+466>: je 0xf8a4 <qla24xx_async_abort_cmd+52>
   0x000000000000fa48 <+472>: lea 0x98(%rbx),%rdi
   0x000000000000fa4f <+479>: callq 0xfa54 <qla24xx_async_abort_cmd+484>
   0x000000000000fa54 <+484>: cmpw $0x0,0x94(%rbx)
   0x000000000000fa5c <+492>: mov $0x102,%eax
   0x000000000000fa61 <+497>: cmovne %eax,%ebp
   0x000000000000fa64 <+500>: mov 0x180(%rbx),%rax
   0x000000000000fa6b <+507>: mov %rbx,%rdi
   0x000000000000fa6e <+510>: callq 0xfa73 <qla24xx_async_abort_cmd+515>
   0x000000000000fa73 <+515>: mov %ebp,%eax
   0x000000000000fa75 <+517>: pop %rbx
   0x000000000000fa76 <+518>: pop %rbp
   0x000000000000fa77 <+519>: pop %r12
   0x000000000000fa79 <+521>: pop %r13
   0x000000000000fa7b <+523>: pop %r14
   0x000000000000fa7d <+525>: pop %r15
   0x000000000000fa7f <+527>: retq
   0x000000000000fa80 <+528>: cmpw $0xa,0x36(%rbx)
   0x000000000000fa85 <+533>: jne 0xf9e2 <qla24xx_async_abort_cmd+370>
   0x000000000000fa8b <+539>: lea 0xe8(%rbx),%rdi
   0x000000000000fa92 <+546>: mov $0x0,%rdx
   0x000000000000fa99 <+553>: mov $0x0,%rsi
   0x000000000000faa0 <+560>: movl $0x0,0xe0(%rbx)
   0x000000000000faaa <+570>: callq 0xfaaf <qla24xx_async_abort_cmd+575>
   0x000000000000faaf <+575>: jmpq 0xf9e2 <qla24xx_async_abort_cmd+370>
End of assembler dump.

--
Ondrej Zary
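
The faulting bytes marked in the oops "Code:" line, f0 41 ff 46 04, can be checked by hand against the "lock incl 0x4(%r14)" instruction at offset +27 in the disassembly. A minimal sketch of that check follows. It decodes only this one instruction shape (optional LOCK prefix, optional REX prefix, opcode FF with /0 or /1, ModRM with an 8-bit displacement and no SIB byte) and is not a general x86 decoder. The names decode_incdec, effective_address and struct incdec_insn are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
struct incdec_insn {
    int lock;     /* 1 if a LOCK (0xf0) prefix was present       */
    int is_dec;   /* 0: INC (FF /0), 1: DEC (FF /1)              */
    int base_reg; /* base register number 0..15 (14 means %r14)  */
    int8_t disp;  /* signed 8-bit displacement                   */
};

/* Decode [F0] [REX] FF /0|/1 with ModRM.mod == 01 (base + disp8).
 * Returns the number of bytes consumed, or -1 for anything else. */
int decode_incdec(const uint8_t *b, size_t n, struct incdec_insn *out)
{
    size_t i = 0;
    uint8_t rex = 0;

    assert(b != NULL && out != NULL);

    out->lock = 0;
    if (i < n && b[i] == 0xf0) {          /* LOCK prefix */
        out->lock = 1;
        i++;
    }
    if (i < n && (b[i] & 0xf0) == 0x40) { /* REX prefix, 0x40..0x4f */
        rex = b[i];
        i++;
    }
    if (i + 3 > n || b[i] != 0xff)
        return -1;

    uint8_t modrm = b[i + 1];
    uint8_t mod = modrm >> 6;
    uint8_t reg = (modrm >> 3) & 7;
    uint8_t rm = modrm & 7;

    /* rm == 4 would mean a SIB byte follows, which is not handled here. */
    if (mod != 1 || rm == 4 || reg > 1)
        return -1;

    out->is_dec = reg;
    out->base_reg = rm | ((rex & 1) << 3); /* REX.B extends the base register */
    out->disp = (int8_t)b[i + 2];
    return (int)(i + 3);
}

/* Address the instruction touches, given a snapshot of the 16 registers. */
uint64_t effective_address(const struct incdec_insn *insn,
                           const uint64_t regs[16])
{
    return regs[insn->base_reg] + (uint64_t)(int64_t)insn->disp;
}
```

With R14 = 0, as in both register dumps, the decoded instruction writes to address 0x4, which matches CR2. The same arithmetic explains the gdb address: the function starts at 0xf870 in the module, and 0xf870 + 0x1b = 0xf88b.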
* Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
From: Bart Van Assche @ 2020-02-25 3:41 UTC
To: Ondrej Zary
Cc: qla2xxx-upstream, linux-scsi, linux-kernel

On 2020-02-24 00:20, Ondrej Zary wrote:
> Looks like it's in some inlined function.
>
> /usr/src/linux-source-4.19# gdb /lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> GNU gdb (Debian 8.2.1-2+b3) 8.2.1
> ...
> Reading symbols from /lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko...Reading symbols
> from /usr/lib/debug//lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko...done.
> done.
>
> (gdb) list *(qla24xx_async_abort_cmd+0x1b)
> 0xf88b is in qla24xx_async_abort_cmd (./arch/x86/include/asm/atomic.h:97).
> 92       *
> 93       * Atomically increments @v by 1.
> 94       */
> 95      static __always_inline void arch_atomic_inc(atomic_t *v)
> 96      {
> 97              asm volatile(LOCK_PREFIX "incl %0"
> 98                           : "+m" (v->counter) :: "memory");
> 99      }
> 100     #define arch_atomic_inc arch_atomic_inc
>
> [ ... ]
>
> (gdb) disassemble qla24xx_async_abort_cmd
> Dump of assembler code for function qla24xx_async_abort_cmd:
>    0x000000000000f870 <+0>: callq 0xf875 <qla24xx_async_abort_cmd+5>
>    0x000000000000f875 <+5>: push %r15
>    0x000000000000f877 <+7>: push %r14
>    0x000000000000f879 <+9>: push %r13
>    0x000000000000f87b <+11>: push %r12
>    0x000000000000f87d <+13>: push %rbp
>    0x000000000000f87e <+14>: push %rbx
>    0x000000000000f87f <+15>: mov 0x28(%rdi),%r13
>    0x000000000000f883 <+19>: mov 0x20(%rdi),%r15
>    0x000000000000f887 <+23>: mov 0x48(%rdi),%r14
>    0x000000000000f88b <+27>: lock incl 0x4(%r14)
>    0x000000000000f890 <+32>: mfence

Thanks, this is very helpful. I think the above means that the crash is
triggered by the following code:

        sp = qla2xxx_get_qpair_sp(cmd_sp->qpair, cmd_sp->fcport,
                                  GFP_KERNEL);

From the start of qla2xxx_get_qpair_sp():

        QLA_QPAIR_MARK_BUSY(qpair, bail);

From qla_def.h:

#define QLA_QPAIR_MARK_BUSY(__qpair, __bail) do {      \
        atomic_inc(&__qpair->ref_count);                \
        mb();                                           \
        if (__qpair->delete_in_progress) {              \
                atomic_dec(&__qpair->ref_count);        \
                __bail = 1;                             \
        } else {                                        \
                __bail = 0;                             \
        }                                               \
} while (0)

One of the changes between kernel version v4.9.210 and v4.19.98 is the
following: "qla2xxx: Add multiple queue pair functionality". I think the
above information means that the cmd_sp->qpair pointer is NULL. I will
let QLogic recommend a solution.

Bart.
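
The reasoning above, that a NULL qpair pointer produces a fault at address 4, follows from the offset of ref_count inside the structure. A minimal user-space sketch of that arithmetic is given here. The struct mock_qpair layout is an assumption made for illustration (a 4-byte lock followed by the counter, which is consistent with the "incl 0x4(%r14)" instruction) and is not copied from qla_def.h. The guarded mock_qpair_mark_busy helper only shows what a NULL check would look like; it is not a proposed driver fix, and whether the qpair may legitimately be NULL at this point is a question for the driver maintainers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the start of the driver's queue-pair struct. */
struct mock_qpair {
    uint32_t lock;               /* placeholder for a 4-byte spinlock */
    int32_t ref_count;           /* placeholder for an atomic_t       */
    uint32_t delete_in_progress; /* placeholder for the bit-field     */
};

/* Address touched when a member at member_offset is accessed through
 * a structure pointer whose value is base. */
uintptr_t member_fault_address(uintptr_t base, size_t member_offset)
{
    return base + member_offset;
}

/* Guarded, non-atomic imitation of the mark-busy logic.
 * Returns -1 for a NULL qpair, 1 if deletion is in progress, else 0. */
int mock_qpair_mark_busy(struct mock_qpair *qpair)
{
    if (qpair == NULL)
        return -1;

    qpair->ref_count++;          /* stands in for atomic_inc() and mb() */
    if (qpair->delete_in_progress) {
        qpair->ref_count--;      /* stands in for atomic_dec() */
        return 1;
    }
    return 0;
}
```

With base 0 and offsetof(struct mock_qpair, ref_count) equal to 4, member_fault_address returns 4, the same value the oops reports in CR2.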
* Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
From: Ondrej Zary @ 2020-02-27 17:09 UTC
To: Bart Van Assche
Cc: qla2xxx-upstream, linux-scsi, linux-kernel

On Tuesday 25 February 2020 04:41:48 Bart Van Assche wrote:
> [ ... ]
>
> One of the changes between kernel version v4.9.210 and v4.19.98 is the
> following: "qla2xxx: Add multiple queue pair functionality". I think the
> above information means that the cmd_sp->qpair pointer is NULL. I will
> let QLogic recommend a solution.

Thank you very much for the analysis.
Unfortunately, QLogic does not seem to care...

--
Ondrej Zary
* Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
From: Ondrej Zary @ 2020-03-02 22:26 UTC
To: Bart Van Assche
Cc: qla2xxx-upstream, linux-scsi, linux-kernel, Michael Hernandez, Sawan Chandak, Himanshu Madhani

On Thursday 27 February 2020 18:09:07 Ondrej Zary wrote:
> On Tuesday 25 February 2020 04:41:48 Bart Van Assche wrote:
> > [ ... ]
> >
> > One of the changes between kernel version v4.9.210 and v4.19.98 is the
> > following: "qla2xxx: Add multiple queue pair functionality". I think the
> > above information means that the cmd_sp->qpair pointer is NULL. I will
> > let QLogic recommend a solution.
>
> Thank you very much for the analysis.
> Unfortunately, QLogic does not seem to care...

Let's try to CC the people at Cavium that signed-off the commit.

--
Ondrej Zary
* Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
From: Ondrej Zary @ 2020-03-19 18:01 UTC
To: Bart Van Assche
Cc: qla2xxx-upstream, linux-scsi, linux-kernel, Michael Hernandez, Sawan Chandak, Himanshu Madhani, GR-QLogic-Storage-Upstream, Nilesh Javali

On Monday 02 March 2020 23:26:08 Ondrej Zary wrote:
> On Thursday 27 February 2020 18:09:07 Ondrej Zary wrote:
> > On Tuesday 25 February 2020 04:41:48 Bart Van Assche wrote:
> > > [ ... ]
> > >
> > > One of the changes between kernel version v4.9.210 and v4.19.98 is the
> > > following: "qla2xxx: Add multiple queue pair functionality". I think the
> > > above information means that the cmd_sp->qpair pointer is NULL. I will
> > > let QLogic recommend a solution.
> >
> > Thank you very much for the analysis.
> > Unfortunately, QLogic does not seem to care...
>
> Let's try to CC the people at Cavium that signed-off the commit.

No reply. qla2xxx-upstream@qlogic.com address is dead:

Generating server: DC5-EXCH01.marvell.com

qla2xxx-upstream@qlogic.com
Remote Server returned '550 5.1.1 RESOLVER.ADR.RecipNotFound; not found'

Added some more CC addresses.

Yesterday it crashed again at the same place:

[2076301.849762] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[2076301.850021] PGD 0 P4D 0
[2076301.850109] Oops: 0002 [#1] SMP PTI
[2076301.850219] CPU: 4 PID: 18992 Comm: kworker/u16:1 Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
[2076301.850478] Hardware name: Dell Inc. PowerEdge 2950/0JR815, BIOS 2.7.0 10/30/2010
[2076301.850720] Workqueue: scsi_tmf_4 scmd_eh_abort_handler [scsi_mod]
[2076301.850936] RIP: 0010:qla24xx_async_abort_cmd+0x1b/0x250 [qla2xxx]
[2076301.851130] Code: e9 19 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 41 55 41 54 55 53 4c 8b 6f 28 4c 8b 7f 20 4c 8b 77 48 <f0> 41 ff 46 04 0f ae f0 41 f6 46 24 04 74 17 f0 41 ff 4e 04 bd 02
[2076301.851663] RSP: 0018:ffffa10f8bbe7da8 EFLAGS: 00010293
[2076301.851820] RAX: 0000000000000800 RBX: ffff8ab8ddd197a8 RCX: 0000000000000070
[2076301.852036] RDX: ffff8ab8de4a8388 RSI: 0000000000000001 RDI: ffff8ab8799b8c40
[2076301.852253] RBP: ffff8ab8dc96c480 R08: ffffffffc03b7860 R09: 0000000000000000
[2076301.852469] R10: 8080808080808080 R11: 0000000000000010 R12: ffff8ab8dea00000
[2076301.852686] R13: ffff8ab8ddd197a8 R14: 0000000000000000 R15: ffff8ab8dd632000
[2076301.852902] FS: 0000000000000000(0000) GS:ffff8ab8e7b00000(0000) knlGS:0000000000000000
[2076301.853142] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2076301.853320] CR2: 0000000000000004 CR3: 00000002203dc000 CR4: 00000000000006e0
[2076301.853536] Call Trace:
[2076301.853632] qla24xx_abort_command+0x218/0x2d0 [qla2xxx]
[2076301.853799] ? __switch_to_asm+0x41/0x70
[2076301.853924] ? __switch_to_asm+0x35/0x70
[2076301.854056] qla2xxx_eh_abort+0x117/0x310 [qla2xxx]
[2076301.854209] scmd_eh_abort_handler+0x85/0x220 [scsi_mod]
[2076301.854375] process_one_work+0x1a7/0x3a0
[2076301.854506] worker_thread+0x30/0x390
[2076301.854628] ? create_worker+0x1a0/0x1a0
[2076301.854753] kthread+0x112/0x130
[2076301.854859] ? kthread_bind+0x30/0x30
[2076301.854980] ret_from_fork+0x35/0x40
[2076301.855095] Modules linked in: loop ipmi_ssif radeon coretemp ttm drm_kms_helper drm kvm i2c_algo_bit i5000_edac iTCO_wdt sg iTCO_vendor_support irqbypass evdev i5k_amb serio_raw joydev ipmi_si rng_core pcc_cpufreq dcdbas pcspkr ipmi_devintf acpi_cpufreq ipmi_msghandler button ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 dm_service_time dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua uas usb_storage hid_generic usbhid hid sr_mod cdrom ses enclosure sd_mod scsi_transport_sas ata_generic qla2xxx ata_piix nvme_fc ehci_pci nvme_fabrics libata uhci_hcd psmouse ehci_hcd nvme_core megaraid_sas usbcore scsi_transport_fc lpc_ich mfd_core scsi_mod usb_common bnx2
[2076301.856887] CR2: 0000000000000004
[2076301.856999] ---[ end trace e9083db8fb76e126 ]---
[2076301.857151] RIP: 0010:qla24xx_async_abort_cmd+0x1b/0x250 [qla2xxx]
[2076301.857345] Code: e9 19 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 41 55 41 54 55 53 4c 8b 6f 28 4c 8b 7f 20 4c 8b 77 48 <f0> 41 ff 46 04 0f ae f0 41 f6 46 24 04 74 17 f0 41 ff 4e 04 bd 02
[2076301.857878] RSP: 0018:ffffa10f8bbe7da8 EFLAGS: 00010293
[2076301.858035] RAX: 0000000000000800 RBX: ffff8ab8ddd197a8 RCX: 0000000000000070
[2076301.858251] RDX: ffff8ab8de4a8388 RSI: 0000000000000001 RDI: ffff8ab8799b8c40
[2076301.858467] RBP: ffff8ab8dc96c480 R08: ffffffffc03b7860 R09: 0000000000000000
[2076301.869384] R10: 8080808080808080 R11: 0000000000000010 R12: ffff8ab8dea00000
[2076301.880412] R13: ffff8ab8ddd197a8 R14: 0000000000000000 R15: ffff8ab8dd632000
[2076301.891483] FS: 0000000000000000(0000) GS:ffff8ab8e7b00000(0000) knlGS:0000000000000000
[2076301.902490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2076301.913344] CR2: 0000000000000004 CR3: 00000002203dc000 CR4: 00000000000006e0
[2077225.259348] mysqld[2155]: segfault at 0 ip 000056409366ad93 sp 00007fa049514450 error 6 in mysqld[564092eb2000+805000]
[2077225.270564] Code: c7 45 00 00 00 00 00 8b 7d cc 4c 89 e2 4c 89 f6 e8 62 81 84 ff 49 89 c7 49 39 c4 0f 84 f6 00 00 00 e8 e1 1c 00 00 41 8b 4d 00 <89> 08 85 c9 74 37 49 83 ff ff 0f 84 9d 00 00 00 f6 c3 06 75 28 4d

--
Ondrej Zary