* NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
@ 2020-02-23 18:29 Ondrej Zary
2020-02-23 19:26 ` Bart Van Assche
0 siblings, 1 reply; 9+ messages in thread
From: Ondrej Zary @ 2020-02-23 18:29 UTC
To: qla2xxx-upstream, linux-scsi, linux-kernel
Hello,
a couple of days after upgrading a server from Debian 9 (kernel 4.9.210-1)
to 10 (kernel 4.19.98), qla2xxx crashed, along with mysql.
There is an EMC CX3 array connected through the fibre-channel adapter.
No errors are present in EMC event log.
This server was running without any problems since Debian 4.
Is this a known bug?
[979178.888922] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[979178.889160] PGD 0 P4D 0
[979178.889243] Oops: 0002 [#1] SMP PTI
[979178.889362] CPU: 6 PID: 11060 Comm: kworker/u16:2 Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
[979178.889617] Hardware name: Dell Inc. PowerEdge 2950/0JR815, BIOS 2.7.0 10/30/2010
[979178.889855] Workqueue: scsi_tmf_4 scmd_eh_abort_handler [scsi_mod]
[979178.890069] RIP: 0010:qla24xx_async_abort_cmd+0x1b/0x250 [qla2xxx]
[979178.890258] Code: e9 19 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 41 55 41 54 55 53 4c 8b 6f 28 4c 8b 7f 20 4c 8b 77 48 <f0> 41 ff 46 04 0f ae f0 41 f6 46 24 04 74 17 f0 41 ff 4e 04 bd 02
[979178.890801] RSP: 0018:ffffb1250ba83da8 EFLAGS: 00010293
[979178.890966] RAX: 0000000000000800 RBX: ffff93b89db837a8 RCX: 00000000000005f4
[979178.891178] RDX: ffff93b89e28afa8 RSI: 0000000000000001 RDI: ffff93b8a5018fc0
[979178.891389] RBP: ffff93b89ccb89c0 R08: ffffffffc0595860 R09: 0000000000000000
[979178.891600] R10: 8080808080808080 R11: 0000000000000010 R12: ffff93b89db82000
[979178.891811] R13: ffff93b89db837a8 R14: 0000000000000000 R15: ffff93b89d88a800
[979178.892023] FS: 0000000000000000(0000) GS:ffff93b8a7b80000(0000) knlGS:0000000000000000
[979178.892258] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[979178.892430] CR2: 0000000000000004 CR3: 000000021a62a000 CR4: 00000000000006e0
[979178.892642] Call Trace:
[979178.892748] qla24xx_abort_command+0x218/0x2d0 [qla2xxx]
[979178.892911] ? __switch_to_asm+0x41/0x70
[979178.893031] ? __switch_to_asm+0x35/0x70
[979178.893160] qla2xxx_eh_abort+0x117/0x310 [qla2xxx]
[979178.893323] scmd_eh_abort_handler+0x85/0x220 [scsi_mod]
[979178.893484] process_one_work+0x1a7/0x3a0
[979178.893611] worker_thread+0x30/0x390
[979178.893727] ? create_worker+0x1a0/0x1a0
[979178.893847] kthread+0x112/0x130
[979178.893948] ? kthread_bind+0x30/0x30
[979178.894064] ret_from_fork+0x35/0x40
[979178.894174] Modules linked in: loop ipmi_ssif radeon ttm drm_kms_helper drm coretemp i2c_algo_bit iTCO_wdt iTCO_vendor_support ipmi_si joydev kvm sg evdev i5000_edac ipmi_devintf pcc_cpufreq ipmi_msghandler rng_core i5k_amb irqbypass dcdbas serio_raw acpi_cpufreq button pcspkr ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 dm_service_time dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua uas usb_storage hid_generic usbhid hid sr_mod ses cdrom enclosure sd_mod scsi_transport_sas ata_generic qla2xxx uhci_hcd ehci_pci ehci_hcd psmouse ata_piix nvme_fc libata nvme_fabrics usbcore nvme_core megaraid_sas scsi_transport_fc scsi_mod lpc_ich mfd_core usb_common bnx2
[979178.895968] CR2: 0000000000000004
[979178.896075] ---[ end trace 4d42692cc0dc3c87 ]---
[979178.896225] RIP: 0010:qla24xx_async_abort_cmd+0x1b/0x250 [qla2xxx]
[979178.896414] Code: e9 19 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 41 55 41 54 55 53 4c 8b 6f 28 4c 8b 7f 20 4c 8b 77 48 <f0> 41 ff 46 04 0f ae f0 41 f6 46 24 04 74 17 f0 41 ff 4e 04 bd 02
[979178.896956] RSP: 0018:ffffb1250ba83da8 EFLAGS: 00010293
[979178.897121] RAX: 0000000000000800 RBX: ffff93b89db837a8 RCX: 00000000000005f4
[979178.897332] RDX: ffff93b89e28afa8 RSI: 0000000000000001 RDI: ffff93b8a5018fc0
[979178.897544] RBP: ffff93b89ccb89c0 R08: ffffffffc0595860 R09: 0000000000000000
[979178.908415] R10: 8080808080808080 R11: 0000000000000010 R12: ffff93b89db82000
[979178.919419] R13: ffff93b89db837a8 R14: 0000000000000000 R15: ffff93b89d88a800
[979178.930444] FS: 0000000000000000(0000) GS:ffff93b8a7b80000(0000) knlGS:0000000000000000
[979178.941366] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[979178.952142] CR2: 0000000000000004 CR3: 000000021a62a000 CR4: 00000000000006e0
[980103.072740] mysqld[2175]: segfault at 0 ip 000055bbc5cd2d93 sp 00007f2362ffb450 error 6 in mysqld[55bbc551a000+805000]
[980103.083956] Code: c7 45 00 00 00 00 00 8b 7d cc 4c 89 e2 4c 89 f6 e8 62 81 84 ff 49 89 c7 49 39 c4 0f 84 f6 00 00 00 e8 e1 1c 00 00 41 8b 4d 00 <89> 08 85 c9 74 37 49 83 ff ff 0f 84 9d 00 00 00 f6 c3 06 75 28 4d
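A side note on reading the two fault lines above, as a minimal sketch rather than anything authoritative: the low three bits of the x86 page-fault error code say whether the page was present, whether the access was a write, and whether it came from user mode. The helper names below are made up for illustration; only the bit meanings come from the architecture.

```python
import re

# x86 page-fault error code bits (low three bits only).
PF_PROT = 0x1   # 0: page not present, 1: protection violation
PF_WRITE = 0x2  # 0: read access, 1: write access
PF_USER = 0x4   # 0: kernel mode, 1: user mode


def decode_pf_error(code):
    """Describe the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & PF_PROT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
    }


def oops_error_code(line):
    """Extract the hexadecimal error code from an 'Oops: NNNN [#n]' line."""
    match = re.search(r"Oops: ([0-9a-fA-F]{4})\b", line)
    if match is None:
        raise ValueError("no Oops error code in line")
    return int(match.group(1), 16)


# Hypothetical usage with the two values reported in this message.
kernel = decode_pf_error(oops_error_code("[979178.889243] Oops: 0002 [#1] SMP PTI"))
user = decode_pf_error(6)  # "error 6" from the mysqld segfault line
print(kernel)
print(user)
```

Read this way, "Oops: 0002" is a kernel-mode write to a page that is not present, and the mysqld "error 6" is a user-mode write to a page that is not present.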
--
Ondrej Zary
* Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
2020-02-23 18:29 NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian) Ondrej Zary
@ 2020-02-23 19:26 ` Bart Van Assche
2020-02-23 19:57 ` Ondrej Zary
0 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2020-02-23 19:26 UTC
To: Ondrej Zary, qla2xxx-upstream, linux-scsi, linux-kernel
On 2020-02-23 10:29, Ondrej Zary wrote:
> a couple of days after upgrading a server from Debian 9 (kernel 4.9.210-1)
> to 10 (kernel 4.19.98), qla2xxx crashed, along with mysql.
>
> There is an EMC CX3 array connected through the fibre-channel adapter.
> No errors are present in EMC event log.
>
> This server was running without any problems since Debian 4.
> Is this a known bug?
Please report issues encountered with Debian kernels in the Debian bug
tracker. If you want the upstream community to assist please retest with
an upstream kernel.
Thanks,
Bart.
* Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
2020-02-23 19:26 ` Bart Van Assche
@ 2020-02-23 19:57 ` Ondrej Zary
2020-02-24 2:17 ` Bart Van Assche
0 siblings, 1 reply; 9+ messages in thread
From: Ondrej Zary @ 2020-02-23 19:57 UTC
To: Bart Van Assche; +Cc: qla2xxx-upstream, linux-scsi, linux-kernel
On Sunday 23 February 2020 20:26:39 Bart Van Assche wrote:
> On 2020-02-23 10:29, Ondrej Zary wrote:
> > a couple of days after upgrading a server from Debian 9 (kernel 4.9.210-1)
> > to 10 (kernel 4.19.98), qla2xxx crashed, along with mysql.
> >
> > There is an EMC CX3 array connected through the fibre-channel adapter.
> > No errors are present in EMC event log.
> >
> > This server was running without any problems since Debian 4.
> > Is this a known bug?
>
> Please report issues encountered with Debian kernels in the Debian bug
> tracker. If you want the upstream community to assist please retest with
> an upstream kernel.
Debian kernel does not have any patches related to qla2xxx driver:
https://salsa.debian.org/kernel-team/linux/raw/debian/4.19.98-1/debian/patches/series
It crashed after running for 11 days. Not a quick&easy test.
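For scale, the first oops line carries the timestamp 979178.888922. Assuming the bracketed printk timestamps count seconds since boot, a quick conversion (the function name is only illustrative) gives roughly 11.3 days:

```python
SECONDS_PER_DAY = 86400.0


def uptime_days(timestamp):
    """Convert a printk timestamp (seconds since boot) to days."""
    return float(timestamp) / SECONDS_PER_DAY


# Timestamp taken from the "BUG: unable to handle kernel NULL pointer" line.
first_oops = uptime_days("979178.888922")
print(f"{first_oops:.1f} days")
```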
--
Ondrej Zary
* Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
2020-02-23 19:57 ` Ondrej Zary
@ 2020-02-24 2:17 ` Bart Van Assche
2020-02-24 8:20 ` Ondrej Zary
0 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2020-02-24 2:17 UTC
To: Ondrej Zary; +Cc: qla2xxx-upstream, linux-scsi, linux-kernel
On 2020-02-23 11:57, Ondrej Zary wrote:
> On Sunday 23 February 2020 20:26:39 Bart Van Assche wrote:
>> On 2020-02-23 10:29, Ondrej Zary wrote:
>>> a couple of days after upgrading a server from Debian 9 (kernel 4.9.210-1)
>>> to 10 (kernel 4.19.98), qla2xxx crashed, along with mysql.
>>>
>>> There is an EMC CX3 array connected through the fibre-channel adapter.
>>> No errors are present in EMC event log.
>>>
>>> This server was running without any problems since Debian 4.
>>> Is this a known bug?
>>
>> Please report issues encountered with Debian kernels in the Debian bug
>> tracker. If you want the upstream community to assist please retest with
>> an upstream kernel.
>
> Debian kernel does not have any patches related to qla2xxx driver:
> https://salsa.debian.org/kernel-team/linux/raw/debian/4.19.98-1/debian/patches/series
>
> It crashed after running for 11 days. Not a quick&easy test.
It would help a lot if the crash address would be translated into a
source code line number. Something like the following commands should do
the trick:
$ gdb drivers/scsi/qla2xxx/qla2xxx.ko
(gdb) list *(qla24xx_async_abort_cmd+0x1b)
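The symbol and offset for the "list" command come straight from the RIP line of the oops. As a rough sketch, assuming the RIP line has the "symbol+offset/size [module]" shape seen in this report, a small Python helper (hypothetical, not part of any kernel tooling) could build the command:

```python
import re

# Matches e.g. "RIP: 0010:qla24xx_async_abort_cmd+0x1b/0x250 [qla2xxx]".
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<symbol>\w+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def gdb_list_command(rip_line):
    """Build the gdb 'list' command for the symbol+offset in an oops RIP line."""
    match = RIP_RE.search(rip_line)
    if match is None:
        raise ValueError("not a RIP line")
    # Only symbol and offset are needed; size and module are ignored here.
    return "list *({symbol}+{offset})".format(**match.groupdict())


rip = "[979178.890069] RIP: 0010:qla24xx_async_abort_cmd+0x1b/0x250 [qla2xxx]"
print(gdb_list_command(rip))
```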
Thanks,
Bart.
* Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
2020-02-24 2:17 ` Bart Van Assche
@ 2020-02-24 8:20 ` Ondrej Zary
2020-02-25 3:41 ` Bart Van Assche
0 siblings, 1 reply; 9+ messages in thread
From: Ondrej Zary @ 2020-02-24 8:20 UTC
To: Bart Van Assche; +Cc: qla2xxx-upstream, linux-scsi, linux-kernel
On Monday 24 February 2020, Bart Van Assche wrote:
> On 2020-02-23 11:57, Ondrej Zary wrote:
> > On Sunday 23 February 2020 20:26:39 Bart Van Assche wrote:
> >> On 2020-02-23 10:29, Ondrej Zary wrote:
> >>> a couple of days after upgrading a server from Debian 9 (kernel
> >>> 4.9.210-1) to 10 (kernel 4.19.98), qla2xxx crashed, along with mysql.
> >>>
> >>> There is an EMC CX3 array connected through the fibre-channel adapter.
> >>> No errors are present in EMC event log.
> >>>
> >>> This server was running without any problems since Debian 4.
> >>> Is this a known bug?
> >>
> >> Please report issues encountered with Debian kernels in the Debian bug
> >> tracker. If you want the upstream community to assist please retest with
> >> an upstream kernel.
> >
> > Debian kernel does not have any patches related to qla2xxx driver:
> > > https://salsa.debian.org/kernel-team/linux/raw/debian/4.19.98-1/debian/patches/series
> >
> > It crashed after running for 11 days. Not a quick&easy test.
>
> It would help a lot if the crash address would be translated into a
> source code line number. Something like the following commands should do
> the trick:
> $ gdb drivers/scsi/qla2xxx/qla2xxx.ko
> (gdb) list *(qla24xx_async_abort_cmd+0x1b)
Looks like it's in some inlined function.
/usr/src/linux-source-4.19# gdb /lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
...
Reading symbols from /lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko...Reading symbols
from /usr/lib/debug//lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko...done.
done.
(gdb) list *(qla24xx_async_abort_cmd+0x1b)
0xf88b is in qla24xx_async_abort_cmd (./arch/x86/include/asm/atomic.h:97).
92 *
93 * Atomically increments @v by 1.
94 */
95 static __always_inline void arch_atomic_inc(atomic_t *v)
96 {
97 asm volatile(LOCK_PREFIX "incl %0"
98 : "+m" (v->counter) :: "memory");
99 }
100 #define arch_atomic_inc arch_atomic_inc
101
(gdb) list *(qla24xx_abort_command+0x218)
0x22238 is in qla24xx_abort_command (./drivers/scsi/qla2xxx/qla_mbx.c:3084).
3079
3080 if (vha->flags.qpairs_available && sp->qpair)
3081 req = sp->qpair->req;
3082
3083 if (ql2xasynctmfenable)
3084 return qla24xx_async_abort_command(sp);
3085
3086 spin_lock_irqsave(&ha->hardware_lock, flags);
3087 for (handle = 1; handle < req->num_outstanding_cmds; handle++) {
3088 if (req->outstanding_cmds[handle] == sp)
(gdb) list *(qla2xxx_eh_abort+0x117)
0x15e7 is in qla2xxx_eh_abort (./drivers/scsi/qla2xxx/qla_os.c:1314).
1309 /* Get a reference to the sp and drop the lock.*/
1310 sp_get(sp);
1311
1312 spin_unlock_irqrestore(&ha->hardware_lock, flags);
1313 rval = ha->isp_ops->abort_command(sp);
1314 if (rval) {
1315 if (rval == QLA_FUNCTION_PARAMETER_ERROR)
1316 ret = SUCCESS;
1317 else
1318 ret = FAILED;
(gdb) disassemble qla24xx_async_abort_cmd
Dump of assembler code for function qla24xx_async_abort_cmd:
0x000000000000f870 <+0>: callq 0xf875 <qla24xx_async_abort_cmd+5>
0x000000000000f875 <+5>: push %r15
0x000000000000f877 <+7>: push %r14
0x000000000000f879 <+9>: push %r13
0x000000000000f87b <+11>: push %r12
0x000000000000f87d <+13>: push %rbp
0x000000000000f87e <+14>: push %rbx
0x000000000000f87f <+15>: mov 0x28(%rdi),%r13
0x000000000000f883 <+19>: mov 0x20(%rdi),%r15
0x000000000000f887 <+23>: mov 0x48(%rdi),%r14
0x000000000000f88b <+27>: lock incl 0x4(%r14)
0x000000000000f890 <+32>: mfence
0x000000000000f893 <+35>: testb $0x4,0x24(%r14)
0x000000000000f898 <+40>: je 0xf8b1 <qla24xx_async_abort_cmd+65>
0x000000000000f89a <+42>: lock decl 0x4(%r14)
0x000000000000f89f <+47>: mov $0x102,%ebp
0x000000000000f8a4 <+52>: pop %rbx
0x000000000000f8a5 <+53>: mov %ebp,%eax
0x000000000000f8a7 <+55>: pop %rbp
0x000000000000f8a8 <+56>: pop %r12
0x000000000000f8aa <+58>: pop %r13
0x000000000000f8ac <+60>: pop %r14
0x000000000000f8ae <+62>: pop %r15
0x000000000000f8b0 <+64>: retq
0x000000000000f8b1 <+65>: mov %rdi,%rbp
0x000000000000f8b4 <+68>: mov 0x30(%r14),%rdi
0x000000000000f8b8 <+72>: mov %esi,%r12d
0x000000000000f8bb <+75>: mov $0x6000c0,%esi
0x000000000000f8c0 <+80>: callq 0xf8c5 <qla24xx_async_abort_cmd+85>
0x000000000000f8c5 <+85>: mov %rax,%rbx
0x000000000000f8c8 <+88>: test %rax,%rax
0x000000000000f8cb <+91>: je 0xf89a <qla24xx_async_abort_cmd+42>
0x000000000000f8cd <+93>: lea 0x8(%rax),%rdi
0x000000000000f8d1 <+97>: mov %rax,%rcx
0x000000000000f8d4 <+100>: movq $0x0,(%rax)
0x000000000000f8db <+107>: mov $0xc,%edx
0x000000000000f8e0 <+112>: movq $0x0,0x180(%rax)
0x000000000000f8eb <+123>: and $0xfffffffffffffff8,%rdi
0x000000000000f8ef <+127>: xor %eax,%eax
0x000000000000f8f1 <+129>: sub %rdi,%rcx
0x000000000000f8f4 <+132>: add $0x188,%ecx
0x000000000000f8fa <+138>: shr $0x3,%ecx
0x000000000000f8fd <+141>: rep stos %rax,%es:(%rdi)
0x000000000000f900 <+144>: mov %r15,0x20(%rbx)
0x000000000000f904 <+148>: movl $0x1,0x40(%rbx)
0x000000000000f90b <+155>: mov 0x18(%r14),%rax
0x000000000000f90f <+159>: mov %dx,0x36(%rbx)
0x000000000000f913 <+163>: movq $0x0,0x38(%rbx)
0x000000000000f91b <+171>: mov %rax,0x28(%rbx)
0x000000000000f91f <+175>: lea 0x50(%rbx),%rax
0x000000000000f923 <+179>: mov %rax,0x50(%rbx)
0x000000000000f927 <+183>: mov %rax,0x58(%rbx)
0x000000000000f92b <+187>: mov 0x48(%rbp),%rax
0x000000000000f92f <+191>: mov %rax,0x48(%rbx)
0x000000000000f933 <+195>: test %r12b,%r12b
0x000000000000f936 <+198>: je 0xf941 <qla24xx_async_abort_cmd+209>
0x000000000000f938 <+200>: mov $0x40,%eax
0x000000000000f93d <+205>: mov %ax,0x34(%rbx)
0x000000000000f941 <+209>: lea 0xa0(%rbx),%rdi
0x000000000000f948 <+216>: mov $0x0,%rdx
0x000000000000f94f <+223>: mov $0x0,%rsi
0x000000000000f956 <+230>: movq $0x0,0x170(%rbx)
0x000000000000f961 <+241>: lea 0x148(%rbx),%r14
0x000000000000f968 <+248>: movl $0x0,0x98(%rbx)
0x000000000000f972 <+258>: callq 0xf977 <qla24xx_async_abort_cmd+263>
0x000000000000f977 <+263>: xor %r8d,%r8d
0x000000000000f97a <+266>: xor %ecx,%ecx
0x000000000000f97c <+268>: xor %edx,%edx
0x000000000000f97e <+270>: mov $0x0,%rsi
0x000000000000f985 <+277>: mov %r14,%rdi
0x000000000000f988 <+280>: callq 0xf98d <qla24xx_async_abort_cmd+285>
0x000000000000f98d <+285>: mov 0x0(%rip),%rax # 0xf994 <qla24xx_async_abort_cmd+292>
0x000000000000f994 <+292>: lea 0x78(%rbx),%rdi
0x000000000000f998 <+296>: mov $0x0,%rdx
0x000000000000f99f <+303>: mov $0x0,%rsi
0x000000000000f9a6 <+310>: movl $0x0,0x70(%rbx)
0x000000000000f9ad <+317>: add $0x2904,%rax
0x000000000000f9b3 <+323>: movq $0x0,0x180(%rbx)
0x000000000000f9be <+334>: mov %rax,0x158(%rbx)
0x000000000000f9c5 <+341>: callq 0xf9ca <qla24xx_async_abort_cmd+346>
0x000000000000f9ca <+346>: mov 0x28(%rbx),%rax
0x000000000000f9ce <+350>: mov 0x448(%rax),%rax
0x000000000000f9d5 <+357>: testb $0x2,0x15a(%rax)
0x000000000000f9dc <+364>: jne 0xfa80 <qla24xx_async_abort_cmd+528>
0x000000000000f9e2 <+370>: mov %r14,%rdi
0x000000000000f9e5 <+373>: callq 0xf9ea <qla24xx_async_abort_cmd+378>
0x000000000000f9ea <+378>: mov 0x30(%rbp),%r8d
0x000000000000f9ee <+382>: mov 0x48(%rbp),%rax
0x000000000000f9f2 <+386>: mov %r13,%rsi
0x000000000000f9f5 <+389>: movzwl 0x36(%rbp),%r9d
0x000000000000f9fa <+394>: mov $0x507c,%edx
0x000000000000f9ff <+399>: mov $0x2000000,%edi
0x000000000000fa04 <+404>: mov $0x0,%rcx
0x000000000000fa0b <+411>: mov %r8d,0x90(%rbx)
0x000000000000fa12 <+418>: mov 0x48(%rax),%rax
0x000000000000fa16 <+422>: movzwl 0x40(%rax),%eax
0x000000000000fa1a <+426>: movq $0x0,0x178(%rbx)
0x000000000000fa25 <+437>: mov %ax,0x96(%rbx)
0x000000000000fa2c <+444>: callq 0xfa31 <qla24xx_async_abort_cmd+449>
0x000000000000fa31 <+449>: mov %rbx,%rdi
0x000000000000fa34 <+452>: callq 0xfa39 <qla24xx_async_abort_cmd+457>
0x000000000000fa39 <+457>: mov %eax,%ebp
0x000000000000fa3b <+459>: test %eax,%eax
0x000000000000fa3d <+461>: jne 0xfa64 <qla24xx_async_abort_cmd+500>
0x000000000000fa3f <+463>: test %r12b,%r12b
0x000000000000fa42 <+466>: je 0xf8a4 <qla24xx_async_abort_cmd+52>
0x000000000000fa48 <+472>: lea 0x98(%rbx),%rdi
0x000000000000fa4f <+479>: callq 0xfa54 <qla24xx_async_abort_cmd+484>
0x000000000000fa54 <+484>: cmpw $0x0,0x94(%rbx)
0x000000000000fa5c <+492>: mov $0x102,%eax
0x000000000000fa61 <+497>: cmovne %eax,%ebp
0x000000000000fa64 <+500>: mov 0x180(%rbx),%rax
0x000000000000fa6b <+507>: mov %rbx,%rdi
0x000000000000fa6e <+510>: callq 0xfa73 <qla24xx_async_abort_cmd+515>
0x000000000000fa73 <+515>: mov %ebp,%eax
0x000000000000fa75 <+517>: pop %rbx
0x000000000000fa76 <+518>: pop %rbp
0x000000000000fa77 <+519>: pop %r12
0x000000000000fa79 <+521>: pop %r13
0x000000000000fa7b <+523>: pop %r14
0x000000000000fa7d <+525>: pop %r15
0x000000000000fa7f <+527>: retq
0x000000000000fa80 <+528>: cmpw $0xa,0x36(%rbx)
0x000000000000fa85 <+533>: jne 0xf9e2 <qla24xx_async_abort_cmd+370>
0x000000000000fa8b <+539>: lea 0xe8(%rbx),%rdi
0x000000000000fa92 <+546>: mov $0x0,%rdx
0x000000000000fa99 <+553>: mov $0x0,%rsi
0x000000000000faa0 <+560>: movl $0x0,0xe0(%rbx)
0x000000000000faaa <+570>: callq 0xfaaf <qla24xx_async_abort_cmd+575>
0x000000000000faaf <+575>: jmpq 0xf9e2 <qla24xx_async_abort_cmd+370>
End of assembler dump.
--
Ondrej Zary
* Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
2020-02-24 8:20 ` Ondrej Zary
@ 2020-02-25 3:41 ` Bart Van Assche
2020-02-27 17:09 ` Ondrej Zary
0 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2020-02-25 3:41 UTC
To: Ondrej Zary; +Cc: qla2xxx-upstream, linux-scsi, linux-kernel
On 2020-02-24 00:20, Ondrej Zary wrote:
> Looks like it's in some inlined function.
>
> /usr/src/linux-source-4.19# gdb /lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> GNU gdb (Debian 8.2.1-2+b3) 8.2.1
> ...
> Reading symbols from /lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko...Reading symbols
> from /usr/lib/debug//lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko...done.
> done.
>
> (gdb) list *(qla24xx_async_abort_cmd+0x1b)
> 0xf88b is in qla24xx_async_abort_cmd (./arch/x86/include/asm/atomic.h:97).
> 92 *
> 93 * Atomically increments @v by 1.
> 94 */
> 95 static __always_inline void arch_atomic_inc(atomic_t *v)
> 96 {
> 97 asm volatile(LOCK_PREFIX "incl %0"
> 98 : "+m" (v->counter) :: "memory");
> 99 }
> 100 #define arch_atomic_inc arch_atomic_inc
>
> [ ... ]
>
> (gdb) disassemble qla24xx_async_abort_cmd
> Dump of assembler code for function qla24xx_async_abort_cmd:
> 0x000000000000f870 <+0>: callq 0xf875 <qla24xx_async_abort_cmd+5>
> 0x000000000000f875 <+5>: push %r15
> 0x000000000000f877 <+7>: push %r14
> 0x000000000000f879 <+9>: push %r13
> 0x000000000000f87b <+11>: push %r12
> 0x000000000000f87d <+13>: push %rbp
> 0x000000000000f87e <+14>: push %rbx
> 0x000000000000f87f <+15>: mov 0x28(%rdi),%r13
> 0x000000000000f883 <+19>: mov 0x20(%rdi),%r15
> 0x000000000000f887 <+23>: mov 0x48(%rdi),%r14
> 0x000000000000f88b <+27>: lock incl 0x4(%r14)
> 0x000000000000f890 <+32>: mfence
Thanks, this is very helpful. I think the above means that the crash is
triggered by the following code:
sp = qla2xxx_get_qpair_sp(cmd_sp->qpair, cmd_sp->fcport,
GFP_KERNEL);
From the start of qla2xxx_get_qpair_sp():
QLA_QPAIR_MARK_BUSY(qpair, bail);
From qla_def.h:
#define QLA_QPAIR_MARK_BUSY(__qpair, __bail) do { \
atomic_inc(&__qpair->ref_count); \
mb(); \
if (__qpair->delete_in_progress) { \
atomic_dec(&__qpair->ref_count); \
__bail = 1; \
} else { \
__bail = 0; \
} \
} while (0)
One of the changes between kernel version v4.9.210 and v4.19.98 is the
following: "qla2xxx: Add multiple queue pair functionality". I think the
above information means that the cmd_sp->qpair pointer is NULL. I will
let QLogic recommend a solution.
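The register dump is consistent with that reading: R14 is 0 and the faulting instruction is "lock incl 0x4(%r14)", so the access lands at 0 + 4, which matches CR2. A minimal sketch of that arithmetic in Python, assuming a hypothetical layout with a 4-byte lock ahead of the counter (the real struct qla_qpair layout depends on the kernel configuration and is not shown in this thread):

```python
import ctypes


class HypotheticalQpair(ctypes.Structure):
    """Illustrative layout only, NOT the real struct qla_qpair."""
    _fields_ = [
        ("qp_lock", ctypes.c_uint32),   # assumed 4-byte spinlock
        ("ref_count", ctypes.c_int32),  # atomic_t wraps a single int
    ]


def fault_address(base, field):
    """Address touched when 'field' is accessed through pointer 'base'."""
    return base + field.offset


NULL = 0
addr = fault_address(NULL, HypotheticalQpair.ref_count)
print(hex(addr))  # prints 0x4, the same value as CR2 in the oops
```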
Bart.
* Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
2020-02-25 3:41 ` Bart Van Assche
@ 2020-02-27 17:09 ` Ondrej Zary
2020-03-02 22:26 ` Ondrej Zary
0 siblings, 1 reply; 9+ messages in thread
From: Ondrej Zary @ 2020-02-27 17:09 UTC
To: Bart Van Assche; +Cc: qla2xxx-upstream, linux-scsi, linux-kernel
On Tuesday 25 February 2020 04:41:48 Bart Van Assche wrote:
> On 2020-02-24 00:20, Ondrej Zary wrote:
> > Looks like it's in some inlined function.
> >
> > /usr/src/linux-source-4.19# gdb /lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> > GNU gdb (Debian 8.2.1-2+b3) 8.2.1
> > ...
> > Reading symbols from /lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko...Reading symbols
> > from /usr/lib/debug//lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko...done.
> > done.
> >
> > (gdb) list *(qla24xx_async_abort_cmd+0x1b)
> > 0xf88b is in qla24xx_async_abort_cmd (./arch/x86/include/asm/atomic.h:97).
> > 92 *
> > 93 * Atomically increments @v by 1.
> > 94 */
> > 95 static __always_inline void arch_atomic_inc(atomic_t *v)
> > 96 {
> > 97 asm volatile(LOCK_PREFIX "incl %0"
> > 98 : "+m" (v->counter) :: "memory");
> > 99 }
> > 100 #define arch_atomic_inc arch_atomic_inc
> >
> > [ ... ]
> >
> > (gdb) disassemble qla24xx_async_abort_cmd
> > Dump of assembler code for function qla24xx_async_abort_cmd:
> > 0x000000000000f870 <+0>: callq 0xf875 <qla24xx_async_abort_cmd+5>
> > 0x000000000000f875 <+5>: push %r15
> > 0x000000000000f877 <+7>: push %r14
> > 0x000000000000f879 <+9>: push %r13
> > 0x000000000000f87b <+11>: push %r12
> > 0x000000000000f87d <+13>: push %rbp
> > 0x000000000000f87e <+14>: push %rbx
> > 0x000000000000f87f <+15>: mov 0x28(%rdi),%r13
> > 0x000000000000f883 <+19>: mov 0x20(%rdi),%r15
> > 0x000000000000f887 <+23>: mov 0x48(%rdi),%r14
> > 0x000000000000f88b <+27>: lock incl 0x4(%r14)
> > 0x000000000000f890 <+32>: mfence
>
> Thanks, this is very helpful. I think the above means that the crash is
> triggered by the following code:
>
> sp = qla2xxx_get_qpair_sp(cmd_sp->qpair, cmd_sp->fcport,
> GFP_KERNEL);
>
> From the start of qla2xxx_get_qpair_sp():
>
> QLA_QPAIR_MARK_BUSY(qpair, bail);
>
> From qla_def.h:
>
> #define QLA_QPAIR_MARK_BUSY(__qpair, __bail) do { \
> atomic_inc(&__qpair->ref_count); \
> mb(); \
> if (__qpair->delete_in_progress) { \
> atomic_dec(&__qpair->ref_count); \
> __bail = 1; \
> } else { \
> __bail = 0; \
> } \
> } while (0)
>
> One of the changes between kernel version v4.9.210 and v4.19.98 is the
> following: "qla2xxx: Add multiple queue pair functionality". I think the
> above information means that the cmd_sp->qpair pointer is NULL. I will
> let QLogic recommend a solution.
Thank you very much for the analysis.
Unfortunately, QLogic does not seem to care...
--
Ondrej Zary
* Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
2020-02-27 17:09 ` Ondrej Zary
@ 2020-03-02 22:26 ` Ondrej Zary
2020-03-19 18:01 ` Ondrej Zary
0 siblings, 1 reply; 9+ messages in thread
From: Ondrej Zary @ 2020-03-02 22:26 UTC
To: Bart Van Assche
Cc: qla2xxx-upstream, linux-scsi, linux-kernel, Michael Hernandez,
Sawan Chandak, Himanshu Madhani
On Thursday 27 February 2020 18:09:07 Ondrej Zary wrote:
>
> On Tuesday 25 February 2020 04:41:48 Bart Van Assche wrote:
> > On 2020-02-24 00:20, Ondrej Zary wrote:
> > > Looks like it's in some inlined function.
> > >
> > > /usr/src/linux-source-4.19# gdb /lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> > > GNU gdb (Debian 8.2.1-2+b3) 8.2.1
> > > ...
> > > Reading symbols from /lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko...Reading symbols
> > > from /usr/lib/debug//lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko...done.
> > > done.
> > >
> > > (gdb) list *(qla24xx_async_abort_cmd+0x1b)
> > > 0xf88b is in qla24xx_async_abort_cmd (./arch/x86/include/asm/atomic.h:97).
> > > 92 *
> > > 93 * Atomically increments @v by 1.
> > > 94 */
> > > 95 static __always_inline void arch_atomic_inc(atomic_t *v)
> > > 96 {
> > > 97 asm volatile(LOCK_PREFIX "incl %0"
> > > 98 : "+m" (v->counter) :: "memory");
> > > 99 }
> > > 100 #define arch_atomic_inc arch_atomic_inc
> > >
> > > [ ... ]
> > >
> > > (gdb) disassemble qla24xx_async_abort_cmd
> > > Dump of assembler code for function qla24xx_async_abort_cmd:
> > > 0x000000000000f870 <+0>: callq 0xf875 <qla24xx_async_abort_cmd+5>
> > > 0x000000000000f875 <+5>: push %r15
> > > 0x000000000000f877 <+7>: push %r14
> > > 0x000000000000f879 <+9>: push %r13
> > > 0x000000000000f87b <+11>: push %r12
> > > 0x000000000000f87d <+13>: push %rbp
> > > 0x000000000000f87e <+14>: push %rbx
> > > 0x000000000000f87f <+15>: mov 0x28(%rdi),%r13
> > > 0x000000000000f883 <+19>: mov 0x20(%rdi),%r15
> > > 0x000000000000f887 <+23>: mov 0x48(%rdi),%r14
> > > 0x000000000000f88b <+27>: lock incl 0x4(%r14)
> > > 0x000000000000f890 <+32>: mfence
> >
> > Thanks, this is very helpful. I think the above means that the crash is
> > triggered by the following code:
> >
> > sp = qla2xxx_get_qpair_sp(cmd_sp->qpair, cmd_sp->fcport,
> > GFP_KERNEL);
> >
> > From the start of qla2xxx_get_qpair_sp():
> >
> > QLA_QPAIR_MARK_BUSY(qpair, bail);
> >
> > From qla_def.h:
> >
> > #define QLA_QPAIR_MARK_BUSY(__qpair, __bail) do { \
> > atomic_inc(&__qpair->ref_count); \
> > mb(); \
> > if (__qpair->delete_in_progress) { \
> > atomic_dec(&__qpair->ref_count); \
> > __bail = 1; \
> > } else { \
> > __bail = 0; \
> > } \
> > } while (0)
> >
> > One of the changes between kernel version v4.9.210 and v4.19.98 is the
> > following: "qla2xxx: Add multiple queue pair functionality". I think the
> > above information means that the cmd_sp->qpair pointer is NULL. I will
> > let QLogic recommend a solution.
>
> Thank you very much for the analysis.
> Unfortunately, QLogic does not seem to care...
Let's try to CC the people at Cavium that signed-off the commit.
--
Ondrej Zary
* Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
2020-03-02 22:26 ` Ondrej Zary
@ 2020-03-19 18:01 ` Ondrej Zary
0 siblings, 0 replies; 9+ messages in thread
From: Ondrej Zary @ 2020-03-19 18:01 UTC
To: Bart Van Assche
Cc: qla2xxx-upstream, linux-scsi, linux-kernel, Michael Hernandez,
Sawan Chandak, Himanshu Madhani, GR-QLogic-Storage-Upstream,
Nilesh Javali
On Monday 02 March 2020 23:26:08 Ondrej Zary wrote:
> On Thursday 27 February 2020 18:09:07 Ondrej Zary wrote:
> >
> > On Tuesday 25 February 2020 04:41:48 Bart Van Assche wrote:
> > > On 2020-02-24 00:20, Ondrej Zary wrote:
> > > > Looks like it's in some inlined function.
> > > >
> > > > /usr/src/linux-source-4.19# gdb /lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> > > > GNU gdb (Debian 8.2.1-2+b3) 8.2.1
> > > > ...
> > > > Reading symbols from /lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko...Reading symbols
> > > > from /usr/lib/debug//lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko...done.
> > > > done.
> > > >
> > > > (gdb) list *(qla24xx_async_abort_cmd+0x1b)
> > > > 0xf88b is in qla24xx_async_abort_cmd (./arch/x86/include/asm/atomic.h:97).
> > > > 92 *
> > > > 93 * Atomically increments @v by 1.
> > > > 94 */
> > > > 95 static __always_inline void arch_atomic_inc(atomic_t *v)
> > > > 96 {
> > > > 97 asm volatile(LOCK_PREFIX "incl %0"
> > > > 98 : "+m" (v->counter) :: "memory");
> > > > 99 }
> > > > 100 #define arch_atomic_inc arch_atomic_inc
> > > >
> > > > [ ... ]
> > > >
> > > > (gdb) disassemble qla24xx_async_abort_cmd
> > > > Dump of assembler code for function qla24xx_async_abort_cmd:
> > > > 0x000000000000f870 <+0>: callq 0xf875 <qla24xx_async_abort_cmd+5>
> > > > 0x000000000000f875 <+5>: push %r15
> > > > 0x000000000000f877 <+7>: push %r14
> > > > 0x000000000000f879 <+9>: push %r13
> > > > 0x000000000000f87b <+11>: push %r12
> > > > 0x000000000000f87d <+13>: push %rbp
> > > > 0x000000000000f87e <+14>: push %rbx
> > > > 0x000000000000f87f <+15>: mov 0x28(%rdi),%r13
> > > > 0x000000000000f883 <+19>: mov 0x20(%rdi),%r15
> > > > 0x000000000000f887 <+23>: mov 0x48(%rdi),%r14
> > > > 0x000000000000f88b <+27>: lock incl 0x4(%r14)
> > > > 0x000000000000f890 <+32>: mfence
> > >
> > > Thanks, this is very helpful. I think the above means that the crash is
> > > triggered by the following code:
> > >
> > > sp = qla2xxx_get_qpair_sp(cmd_sp->qpair, cmd_sp->fcport,
> > > GFP_KERNEL);
> > >
> > > From the start of qla2xxx_get_qpair_sp():
> > >
> > > QLA_QPAIR_MARK_BUSY(qpair, bail);
> > >
> > > From qla_def.h:
> > >
> > > #define QLA_QPAIR_MARK_BUSY(__qpair, __bail) do { \
> > > atomic_inc(&__qpair->ref_count); \
> > > mb(); \
> > > if (__qpair->delete_in_progress) { \
> > > atomic_dec(&__qpair->ref_count); \
> > > __bail = 1; \
> > > } else { \
> > > __bail = 0; \
> > > } \
> > > } while (0)
> > >
> > > One of the changes between kernel version v4.9.210 and v4.19.98 is the
> > > following: "qla2xxx: Add multiple queue pair functionality". I think the
> > > above information means that the cmd_sp->qpair pointer is NULL. I will
> > > let QLogic recommend a solution.
> >
> > Thank you very much for the analysis.
> > Unfortunately, QLogic does not seem to care...
>
> Let's try to CC the people at Cavium that signed-off the commit.
No reply.
The qla2xxx-upstream@qlogic.com address is dead:
Generating server: DC5-EXCH01.marvell.com
qla2xxx-upstream@qlogic.com
Remote Server returned '550 5.1.1 RESOLVER.ADR.RecipNotFound; not found'
I have added some more CC addresses.
Yesterday it crashed again at the same place:
[2076301.849762] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[2076301.850021] PGD 0 P4D 0
[2076301.850109] Oops: 0002 [#1] SMP PTI
[2076301.850219] CPU: 4 PID: 18992 Comm: kworker/u16:1 Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
[2076301.850478] Hardware name: Dell Inc. PowerEdge 2950/0JR815, BIOS 2.7.0 10/30/2010
[2076301.850720] Workqueue: scsi_tmf_4 scmd_eh_abort_handler [scsi_mod]
[2076301.850936] RIP: 0010:qla24xx_async_abort_cmd+0x1b/0x250 [qla2xxx]
[2076301.851130] Code: e9 19 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 41 55 41 54 55 53 4c 8b 6f 28 4c 8b 7f 20 4c 8b 77 48 <f0> 41 ff 46 04 0f ae f0 41 f6 46 24 04 74 17 f0 41 ff 4e 04 bd 02
[2076301.851663] RSP: 0018:ffffa10f8bbe7da8 EFLAGS: 00010293
[2076301.851820] RAX: 0000000000000800 RBX: ffff8ab8ddd197a8 RCX: 0000000000000070
[2076301.852036] RDX: ffff8ab8de4a8388 RSI: 0000000000000001 RDI: ffff8ab8799b8c40
[2076301.852253] RBP: ffff8ab8dc96c480 R08: ffffffffc03b7860 R09: 0000000000000000
[2076301.852469] R10: 8080808080808080 R11: 0000000000000010 R12: ffff8ab8dea00000
[2076301.852686] R13: ffff8ab8ddd197a8 R14: 0000000000000000 R15: ffff8ab8dd632000
[2076301.852902] FS: 0000000000000000(0000) GS:ffff8ab8e7b00000(0000) knlGS:0000000000000000
[2076301.853142] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2076301.853320] CR2: 0000000000000004 CR3: 00000002203dc000 CR4: 00000000000006e0
[2076301.853536] Call Trace:
[2076301.853632] qla24xx_abort_command+0x218/0x2d0 [qla2xxx]
[2076301.853799] ? __switch_to_asm+0x41/0x70
[2076301.853924] ? __switch_to_asm+0x35/0x70
[2076301.854056] qla2xxx_eh_abort+0x117/0x310 [qla2xxx]
[2076301.854209] scmd_eh_abort_handler+0x85/0x220 [scsi_mod]
[2076301.854375] process_one_work+0x1a7/0x3a0
[2076301.854506] worker_thread+0x30/0x390
[2076301.854628] ? create_worker+0x1a0/0x1a0
[2076301.854753] kthread+0x112/0x130
[2076301.854859] ? kthread_bind+0x30/0x30
[2076301.854980] ret_from_fork+0x35/0x40
[2076301.855095] Modules linked in: loop ipmi_ssif radeon coretemp ttm drm_kms_helper drm kvm i2c_algo_bit i5000_edac iTCO_wdt sg iTCO_vendor_support irqbypass evdev i5k_amb serio_raw joydev ipmi_si rng_core pcc_cpufreq dcdbas pcspkr ipmi_devintf acpi_cpufreq ipmi_msghandler button ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 dm_service_time dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua uas usb_storage hid_generic usbhid hid sr_mod cdrom ses enclosure sd_mod scsi_transport_sas ata_generic qla2xxx ata_piix nvme_fc ehci_pci nvme_fabrics libata uhci_hcd psmouse ehci_hcd nvme_core megaraid_sas usbcore scsi_transport_fc lpc_ich mfd_core scsi_mod usb_common bnx2
[2076301.856887] CR2: 0000000000000004
[2076301.856999] ---[ end trace e9083db8fb76e126 ]---
[2076301.857151] RIP: 0010:qla24xx_async_abort_cmd+0x1b/0x250 [qla2xxx]
[2076301.857345] Code: e9 19 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 41 55 41 54 55 53 4c 8b 6f 28 4c 8b 7f 20 4c 8b 77 48 <f0> 41 ff 46 04 0f ae f0 41 f6 46 24 04 74 17 f0 41 ff 4e 04 bd 02
[2076301.857878] RSP: 0018:ffffa10f8bbe7da8 EFLAGS: 00010293
[2076301.858035] RAX: 0000000000000800 RBX: ffff8ab8ddd197a8 RCX: 0000000000000070
[2076301.858251] RDX: ffff8ab8de4a8388 RSI: 0000000000000001 RDI: ffff8ab8799b8c40
[2076301.858467] RBP: ffff8ab8dc96c480 R08: ffffffffc03b7860 R09: 0000000000000000
[2076301.869384] R10: 8080808080808080 R11: 0000000000000010 R12: ffff8ab8dea00000
[2076301.880412] R13: ffff8ab8ddd197a8 R14: 0000000000000000 R15: ffff8ab8dd632000
[2076301.891483] FS: 0000000000000000(0000) GS:ffff8ab8e7b00000(0000) knlGS:0000000000000000
[2076301.902490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2076301.913344] CR2: 0000000000000004 CR3: 00000002203dc000 CR4: 00000000000006e0
[2077225.259348] mysqld[2155]: segfault at 0 ip 000056409366ad93 sp 00007fa049514450 error 6 in mysqld[564092eb2000+805000]
[2077225.270564] Code: c7 45 00 00 00 00 00 8b 7d cc 4c 89 e2 4c 89 f6 e8 62 81 84 ff 49 89 c7 49 39 c4 0f 84 f6 00 00 00 e8 e1 1c 00 00 41 8b 4d 00 <89> 08 85 c9 74 37 49 83 ff ff 0f 84 9d 00 00 00 f6 c3 06 75 28 4d
--
Ondrej Zary