* [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
@ 2018-02-12 18:28 Himanshu Madhani
2018-02-13 10:38 ` Johannes Thumshirn
2018-02-13 10:54 ` John Garry
0 siblings, 2 replies; 7+ messages in thread
From: Himanshu Madhani @ 2018-02-12 18:28 UTC (permalink / raw)
To: James.Bottomley, martin.petersen; +Cc: himanshu.madhani, linux-scsi
This patch fixes NULL pointer crash due to active timer running
for abort IOCB.
>From crash dump analysis it was discoverd that get_next_timer_interrupt()
encountered a corrupted entry on the timer list.
#9 [ffff95e1f6f0fd40] page_fault at ffffffff914fe8f8
[exception RIP: get_next_timer_interrupt+440]
RIP: ffffffff90ea3088 RSP: ffff95e1f6f0fdf0 RFLAGS: 00010013
RAX: ffff95e1f6451028 RBX: 000218e2389e5f40 RCX: 00000001232ad600
RDX: 0000000000000001 RSI: ffff95e1f6f0fdf0 RDI: 0000000001232ad6
RBP: ffff95e1f6f0fe40 R8: ffff95e1f6451188 R9: 0000000000000001
R10: 0000000000000016 R11: 0000000000000016 R12: 00000001232ad5f6
R13: ffff95e1f6450000 R14: ffff95e1f6f0fdf8 R15: ffff95e1f6f0fe10
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
Looking at the assembly of get_next_timer_interrupt(), address came
from %r8 (ffff95e1f6451188) which is pointing to list_head with single
entry at ffff95e5ff621178.
0xffffffff90ea307a <get_next_timer_interrupt+426>: mov (%r8),%rdx
0xffffffff90ea307d <get_next_timer_interrupt+429>: cmp %r8,%rdx
0xffffffff90ea3080 <get_next_timer_interrupt+432>: je 0xffffffff90ea30a7 <get_next_timer_interrupt+471>
0xffffffff90ea3082 <get_next_timer_interrupt+434>: nopw 0x0(%rax,%rax,1)
0xffffffff90ea3088 <get_next_timer_interrupt+440>: testb $0x1,0x18(%rdx)
crash> rd ffff95e1f6451188 10
ffff95e1f6451188: ffff95e5ff621178 ffff95e5ff621178 x.b.....x.b.....
ffff95e1f6451198: ffff95e1f6451198 ffff95e1f6451198 ..E.......E.....
ffff95e1f64511a8: ffff95e1f64511a8 ffff95e1f64511a8 ..E.......E.....
ffff95e1f64511b8: ffff95e77cf509a0 ffff95e77cf509a0 ...|.......|....
ffff95e1f64511c8: ffff95e1f64511c8 ffff95e1f64511c8 ..E.......E.....
crash> rd ffff95e5ff621178 10
ffff95e5ff621178: 0000000000000001 ffff95e15936aa00 ..........6Y....
ffff95e5ff621188: 0000000000000000 00000000ffffffff ................
ffff95e5ff621198: 00000000000000a0 0000000000000010 ................
ffff95e5ff6211a8: ffff95e5ff621198 000000000000000c ..b.............
ffff95e5ff6211b8: 00000f5800000000 ffff95e751f8d720 ....X... ..Q....
ffff95e5ff621178 belongs to freed mempool object at ffff95e5ff621080.
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff95dc7fd74d00 mnt_cache 384 19785 24948 594 16k
SLAB MEMORY NODE TOTAL ALLOCATED FREE
ffffdc5dabfd8800 ffff95e5ff620000 1 42 29 13
FREE / [ALLOCATED]
ffff95e5ff621080 (cpu 6 cache)
Examining the contents of that memory reveals a pointer to a constant
string in the driver, "abort\0", which is set by qla24xx_async_abort_cmd().
crash> rd ffffffffc059277c 20
ffffffffc059277c: 6e490074726f6261 0074707572726574 abort.Interrupt.
ffffffffc059278c: 00676e696c6c6f50 6920726576697244 Polling.Driver i
ffffffffc059279c: 646f6d207325206e 6974736554000a65 n %s mode..Testi
ffffffffc05927ac: 636976656420676e 786c252074612065 ng device at %lx
ffffffffc05927bc: 6b63656843000a2e 646f727020676e69 ...Checking prod
ffffffffc05927cc: 6f20444920746375 0a2e706968632066 uct ID of chip..
ffffffffc05927dc: 5120646e756f4600 204130303232414c .Found QLA2200A
ffffffffc05927ec: 43000a2e70696843 20676e696b636568 Chip...Checking
ffffffffc05927fc: 65786f626c69616d 6c636e69000a2e73 mailboxes...incl
ffffffffc059280c: 756e696c2f656475 616d2d616d642f78 ude/linux/dma-ma
crash> struct -ox srb_iocb
struct srb_iocb {
union {
struct {...} logio;
struct {...} els_logo;
struct {...} tmf;
struct {...} fxiocb;
struct {...} abt;
struct ct_arg ctarg;
struct {...} mbx;
struct {...} nack;
[0x0 ] } u;
[0xb8] struct timer_list timer;
[0x108] void (*timeout)(void *);
}
SIZE: 0x110
crash> ! bc
ibase=16
obase=10
B8+40
F8
The object is a srb_t, and at offset 0xf8 within that structure
(i.e. ffff95e5ff621080 + f8 -> ffff95e5ff621178) is a struct timer_list.
Cc: <stable@vger.stable.com> #4.4+
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
---
Hi Martin,
This patch addresses crash due to NULL pointer access because driver
left active timer running for abort IOCB.
Please apply this patch to 4.16/scsi-fixes at your earliest convenience.
Thanks,
Himanshu
---
drivers/scsi/qla2xxx/qla_init.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index 2dea1129d396..04870621e712 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -1527,6 +1527,7 @@ qla24xx_abort_sp_done(void *ptr, int res)
srb_t *sp = ptr;
struct srb_iocb *abt = &sp->u.iocb_cmd;
+ del_timer(&sp->u.iocb_cmd.timer);
complete(&abt->u.abt.comp);
}
--
2.12.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
2018-02-12 18:28 [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS Himanshu Madhani
@ 2018-02-13 10:38 ` Johannes Thumshirn
2018-02-13 18:13 ` Madhani, Himanshu
2018-02-13 10:54 ` John Garry
1 sibling, 1 reply; 7+ messages in thread
From: Johannes Thumshirn @ 2018-02-13 10:38 UTC (permalink / raw)
To: Himanshu Madhani, James.Bottomley, martin.petersen; +Cc: linux-scsi
Thanks Himanshu,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Do you happen to know which commit it actually fixes?
--
Johannes Thumshirn Storage
jthu
mshirn@suse.de +49 911 74053 689
SUSE
LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane
Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38
9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
2018-02-12 18:28 [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS Himanshu Madhani
2018-02-13 10:38 ` Johannes Thumshirn
@ 2018-02-13 10:54 ` John Garry
2018-02-13 20:07 ` Madhani, Himanshu
2018-02-14 19:58 ` Madhani, Himanshu
1 sibling, 2 replies; 7+ messages in thread
From: John Garry @ 2018-02-13 10:54 UTC (permalink / raw)
To: Himanshu Madhani, James.Bottomley, martin.petersen; +Cc: linux-scsi
On 12/02/2018 18:28, Himanshu Madhani wrote:
> This patch fixes NULL pointer crash due to active timer running
> for abort IOCB.
>
>>From crash dump analysis it was discoverd that get_next_timer_interrupt()
> encountered a corrupted entry on the timer list.
>
> #9 [ffff95e1f6f0fd40] page_fault at ffffffff914fe8f8
> [exception RIP: get_next_timer_interrupt+440]
> RIP: ffffffff90ea3088 RSP: ffff95e1f6f0fdf0 RFLAGS: 00010013
> RAX: ffff95e1f6451028 RBX: 000218e2389e5f40 RCX: 00000001232ad600
> RDX: 0000000000000001 RSI: ffff95e1f6f0fdf0 RDI: 0000000001232ad6
> RBP: ffff95e1f6f0fe40 R8: ffff95e1f6451188 R9: 0000000000000001
> R10: 0000000000000016 R11: 0000000000000016 R12: 00000001232ad5f6
> R13: ffff95e1f6450000 R14: ffff95e1f6f0fdf8 R15: ffff95e1f6f0fe10
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>
> Looking at the assembly of get_next_timer_interrupt(), address came
> from %r8 (ffff95e1f6451188) which is pointing to list_head with single
> entry at ffff95e5ff621178.
>
> 0xffffffff90ea307a <get_next_timer_interrupt+426>: mov (%r8),%rdx
> 0xffffffff90ea307d <get_next_timer_interrupt+429>: cmp %r8,%rdx
> 0xffffffff90ea3080 <get_next_timer_interrupt+432>: je 0xffffffff90ea30a7 <get_next_timer_interrupt+471>
> 0xffffffff90ea3082 <get_next_timer_interrupt+434>: nopw 0x0(%rax,%rax,1)
> 0xffffffff90ea3088 <get_next_timer_interrupt+440>: testb $0x1,0x18(%rdx)
>
> crash> rd ffff95e1f6451188 10
> ffff95e1f6451188: ffff95e5ff621178 ffff95e5ff621178 x.b.....x.b.....
> ffff95e1f6451198: ffff95e1f6451198 ffff95e1f6451198 ..E.......E.....
> ffff95e1f64511a8: ffff95e1f64511a8 ffff95e1f64511a8 ..E.......E.....
> ffff95e1f64511b8: ffff95e77cf509a0 ffff95e77cf509a0 ...|.......|....
> ffff95e1f64511c8: ffff95e1f64511c8 ffff95e1f64511c8 ..E.......E.....
>
> crash> rd ffff95e5ff621178 10
> ffff95e5ff621178: 0000000000000001 ffff95e15936aa00 ..........6Y....
> ffff95e5ff621188: 0000000000000000 00000000ffffffff ................
> ffff95e5ff621198: 00000000000000a0 0000000000000010 ................
> ffff95e5ff6211a8: ffff95e5ff621198 000000000000000c ..b.............
> ffff95e5ff6211b8: 00000f5800000000 ffff95e751f8d720 ....X... ..Q....
>
> ffff95e5ff621178 belongs to freed mempool object at ffff95e5ff621080.
>
> CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
> ffff95dc7fd74d00 mnt_cache 384 19785 24948 594 16k
> SLAB MEMORY NODE TOTAL ALLOCATED FREE
> ffffdc5dabfd8800 ffff95e5ff620000 1 42 29 13
> FREE / [ALLOCATED]
> ffff95e5ff621080 (cpu 6 cache)
>
> Examining the contents of that memory reveals a pointer to a constant
> string in the driver, "abort\0", which is set by qla24xx_async_abort_cmd().
>
> crash> rd ffffffffc059277c 20
> ffffffffc059277c: 6e490074726f6261 0074707572726574 abort.Interrupt.
> ffffffffc059278c: 00676e696c6c6f50 6920726576697244 Polling.Driver i
> ffffffffc059279c: 646f6d207325206e 6974736554000a65 n %s mode..Testi
> ffffffffc05927ac: 636976656420676e 786c252074612065 ng device at %lx
> ffffffffc05927bc: 6b63656843000a2e 646f727020676e69 ...Checking prod
> ffffffffc05927cc: 6f20444920746375 0a2e706968632066 uct ID of chip..
> ffffffffc05927dc: 5120646e756f4600 204130303232414c .Found QLA2200A
> ffffffffc05927ec: 43000a2e70696843 20676e696b636568 Chip...Checking
> ffffffffc05927fc: 65786f626c69616d 6c636e69000a2e73 mailboxes...incl
> ffffffffc059280c: 756e696c2f656475 616d2d616d642f78 ude/linux/dma-ma
>
> crash> struct -ox srb_iocb
> struct srb_iocb {
> union {
> struct {...} logio;
> struct {...} els_logo;
> struct {...} tmf;
> struct {...} fxiocb;
> struct {...} abt;
> struct ct_arg ctarg;
> struct {...} mbx;
> struct {...} nack;
> [0x0 ] } u;
> [0xb8] struct timer_list timer;
> [0x108] void (*timeout)(void *);
> }
> SIZE: 0x110
>
> crash> ! bc
> ibase=16
> obase=10
> B8+40
> F8
>
> The object is a srb_t, and at offset 0xf8 within that structure
> (i.e. ffff95e5ff621080 + f8 -> ffff95e5ff621178) is a struct timer_list.
>
> Cc: <stable@vger.stable.com> #4.4+
> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
> ---
> Hi Martin,
>
> This patch addresses crash due to NULL pointer access because driver
> left active timer running for abort IOCB.
>
> Please apply this patch to 4.16/scsi-fixes at your earliest convenience.
>
> Thanks,
> Himanshu
> ---
> drivers/scsi/qla2xxx/qla_init.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
> index 2dea1129d396..04870621e712 100644
> --- a/drivers/scsi/qla2xxx/qla_init.c
> +++ b/drivers/scsi/qla2xxx/qla_init.c
> @@ -1527,6 +1527,7 @@ qla24xx_abort_sp_done(void *ptr, int res)
> srb_t *sp = ptr;
> struct srb_iocb *abt = &sp->u.iocb_cmd;
>
> + del_timer(&sp->u.iocb_cmd.timer);
Hi,
I am not familiar with this code. But I note that this abort seems to be
"asynchronous" (I see it's setup in qla24xx_async_abort_cmd()), so do
you need the "sync" variant of del_timer() for safety? This would be to
ensure the timeout has not actually occurred and is racing with the
"done" callback?
Thanks,
John
> complete(&abt->u.abt.comp);
> }
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
2018-02-13 10:38 ` Johannes Thumshirn
@ 2018-02-13 18:13 ` Madhani, Himanshu
2018-02-14 8:27 ` Johannes Thumshirn
0 siblings, 1 reply; 7+ messages in thread
From: Madhani, Himanshu @ 2018-02-13 18:13 UTC (permalink / raw)
To: Johannes Thumshirn; +Cc: James Bottomley, Martin K . Petersen, linux-scsi
Hi Johannes,
> On Feb 13, 2018, at 2:38 AM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
>
> Thanks Himanshu,
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
>
> Do you happen to know which commit it actually fixes?
>
Thanks for asking (I did not add fixes commit ID because it was pre 4.x kernel)
Here’s commit id 4440e46d5db7b445a961a84444849b2a31fa7fd1 which is fixed by this patch.
> --
> Johannes Thumshirn Storage
> jthu
> mshirn@suse.de +49 911 74053 689
> SUSE
> LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane
> Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38
> 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Thanks,
- Himanshu
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
2018-02-13 10:54 ` John Garry
@ 2018-02-13 20:07 ` Madhani, Himanshu
2018-02-14 19:58 ` Madhani, Himanshu
1 sibling, 0 replies; 7+ messages in thread
From: Madhani, Himanshu @ 2018-02-13 20:07 UTC (permalink / raw)
To: John Garry; +Cc: James.Bottomley, martin.petersen, linux-scsi
Hi John,
> On Feb 13, 2018, at 2:54 AM, John Garry <john.garry@huawei.com> wrote:
>
> On 12/02/2018 18:28, Himanshu Madhani wrote:
>> This patch fixes NULL pointer crash due to active timer running
>> for abort IOCB.
>>
>>> From crash dump analysis it was discoverd that get_next_timer_interrupt()
>> encountered a corrupted entry on the timer list.
>>
>> #9 [ffff95e1f6f0fd40] page_fault at ffffffff914fe8f8
>> [exception RIP: get_next_timer_interrupt+440]
>> RIP: ffffffff90ea3088 RSP: ffff95e1f6f0fdf0 RFLAGS: 00010013
>> RAX: ffff95e1f6451028 RBX: 000218e2389e5f40 RCX: 00000001232ad600
>> RDX: 0000000000000001 RSI: ffff95e1f6f0fdf0 RDI: 0000000001232ad6
>> RBP: ffff95e1f6f0fe40 R8: ffff95e1f6451188 R9: 0000000000000001
>> R10: 0000000000000016 R11: 0000000000000016 R12: 00000001232ad5f6
>> R13: ffff95e1f6450000 R14: ffff95e1f6f0fdf8 R15: ffff95e1f6f0fe10
>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>>
>> Looking at the assembly of get_next_timer_interrupt(), address came
>> from %r8 (ffff95e1f6451188) which is pointing to list_head with single
>> entry at ffff95e5ff621178.
>>
>> 0xffffffff90ea307a <get_next_timer_interrupt+426>: mov (%r8),%rdx
>> 0xffffffff90ea307d <get_next_timer_interrupt+429>: cmp %r8,%rdx
>> 0xffffffff90ea3080 <get_next_timer_interrupt+432>: je 0xffffffff90ea30a7 <get_next_timer_interrupt+471>
>> 0xffffffff90ea3082 <get_next_timer_interrupt+434>: nopw 0x0(%rax,%rax,1)
>> 0xffffffff90ea3088 <get_next_timer_interrupt+440>: testb $0x1,0x18(%rdx)
>>
>> crash> rd ffff95e1f6451188 10
>> ffff95e1f6451188: ffff95e5ff621178 ffff95e5ff621178 x.b.....x.b.....
>> ffff95e1f6451198: ffff95e1f6451198 ffff95e1f6451198 ..E.......E.....
>> ffff95e1f64511a8: ffff95e1f64511a8 ffff95e1f64511a8 ..E.......E.....
>> ffff95e1f64511b8: ffff95e77cf509a0 ffff95e77cf509a0 ...|.......|....
>> ffff95e1f64511c8: ffff95e1f64511c8 ffff95e1f64511c8 ..E.......E.....
>>
>> crash> rd ffff95e5ff621178 10
>> ffff95e5ff621178: 0000000000000001 ffff95e15936aa00 ..........6Y....
>> ffff95e5ff621188: 0000000000000000 00000000ffffffff ................
>> ffff95e5ff621198: 00000000000000a0 0000000000000010 ................
>> ffff95e5ff6211a8: ffff95e5ff621198 000000000000000c ..b.............
>> ffff95e5ff6211b8: 00000f5800000000 ffff95e751f8d720 ....X... ..Q....
>>
>> ffff95e5ff621178 belongs to freed mempool object at ffff95e5ff621080.
>>
>> CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
>> ffff95dc7fd74d00 mnt_cache 384 19785 24948 594 16k
>> SLAB MEMORY NODE TOTAL ALLOCATED FREE
>> ffffdc5dabfd8800 ffff95e5ff620000 1 42 29 13
>> FREE / [ALLOCATED]
>> ffff95e5ff621080 (cpu 6 cache)
>>
>> Examining the contents of that memory reveals a pointer to a constant
>> string in the driver, "abort\0", which is set by qla24xx_async_abort_cmd().
>>
>> crash> rd ffffffffc059277c 20
>> ffffffffc059277c: 6e490074726f6261 0074707572726574 abort.Interrupt.
>> ffffffffc059278c: 00676e696c6c6f50 6920726576697244 Polling.Driver i
>> ffffffffc059279c: 646f6d207325206e 6974736554000a65 n %s mode..Testi
>> ffffffffc05927ac: 636976656420676e 786c252074612065 ng device at %lx
>> ffffffffc05927bc: 6b63656843000a2e 646f727020676e69 ...Checking prod
>> ffffffffc05927cc: 6f20444920746375 0a2e706968632066 uct ID of chip..
>> ffffffffc05927dc: 5120646e756f4600 204130303232414c .Found QLA2200A
>> ffffffffc05927ec: 43000a2e70696843 20676e696b636568 Chip...Checking
>> ffffffffc05927fc: 65786f626c69616d 6c636e69000a2e73 mailboxes...incl
>> ffffffffc059280c: 756e696c2f656475 616d2d616d642f78 ude/linux/dma-ma
>>
>> crash> struct -ox srb_iocb
>> struct srb_iocb {
>> union {
>> struct {...} logio;
>> struct {...} els_logo;
>> struct {...} tmf;
>> struct {...} fxiocb;
>> struct {...} abt;
>> struct ct_arg ctarg;
>> struct {...} mbx;
>> struct {...} nack;
>> [0x0 ] } u;
>> [0xb8] struct timer_list timer;
>> [0x108] void (*timeout)(void *);
>> }
>> SIZE: 0x110
>>
>> crash> ! bc
>> ibase=16
>> obase=10
>> B8+40
>> F8
>>
>> The object is a srb_t, and at offset 0xf8 within that structure
>> (i.e. ffff95e5ff621080 + f8 -> ffff95e5ff621178) is a struct timer_list.
>>
>> Cc: <stable@vger.stable.com> #4.4+
>> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
>> ---
>> Hi Martin,
>>
>> This patch addresses crash due to NULL pointer access because driver
>> left active timer running for abort IOCB.
>>
>> Please apply this patch to 4.16/scsi-fixes at your earliest convenience.
>>
>> Thanks,
>> Himanshu
>> ---
>> drivers/scsi/qla2xxx/qla_init.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
>> index 2dea1129d396..04870621e712 100644
>> --- a/drivers/scsi/qla2xxx/qla_init.c
>> +++ b/drivers/scsi/qla2xxx/qla_init.c
>> @@ -1527,6 +1527,7 @@ qla24xx_abort_sp_done(void *ptr, int res)
>> srb_t *sp = ptr;
>> struct srb_iocb *abt = &sp->u.iocb_cmd;
>>
>> + del_timer(&sp->u.iocb_cmd.timer);
>
> Hi,
>
> I am not familiar with this code. But I note that this abort seems to be "asynchronous" (I see it's setup in qla24xx_async_abort_cmd()), so do you need the "sync" variant of del_timer() for safety? This would be to ensure the timeout has not actually occurred and is racing with the "done" callback?
>
You do have a point here. I need to investigate your suggestion and see if we need to go with “sync” variant. I’ll update you shortly.
> Thanks,
> John
>
>> complete(&abt->u.abt.comp);
>> }
>>
>>
>
>
Thanks,
- Himanshu
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
2018-02-13 18:13 ` Madhani, Himanshu
@ 2018-02-14 8:27 ` Johannes Thumshirn
0 siblings, 0 replies; 7+ messages in thread
From: Johannes Thumshirn @ 2018-02-14 8:27 UTC (permalink / raw)
To: Madhani, Himanshu; +Cc: James Bottomley, Martin K . Petersen, linux-scsi
On Tue, 2018-02-13 at 18:13 +0000, Madhani, Himanshu wrote:
> Thanks for asking (I did not add fixes commit ID because it was pre
> 4.x kernel)
>
> Here’s commit id 4440e46d5db7b445a961a84444849b2a31fa7fd1 which is
> fixed by this patch.
Thanks for the info. In general having a
Fixes: 4440e46d5db7 ("[SCSI] qla2xxx: Add IOCB Abort command
asynchronous handling.")
line is extremly valuable as many distributions have automation in
place to check whether a fix for a backport arrived upstream.
Thanks,
Johannes
--
Johannes Thumshirn Storage
jthu
mshirn@suse.de +49 911 74053 689
SUSE
LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane
Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38
9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
2018-02-13 10:54 ` John Garry
2018-02-13 20:07 ` Madhani, Himanshu
@ 2018-02-14 19:58 ` Madhani, Himanshu
1 sibling, 0 replies; 7+ messages in thread
From: Madhani, Himanshu @ 2018-02-14 19:58 UTC (permalink / raw)
To: John Garry; +Cc: James.Bottomley, martin.petersen, linux-scsi
Hi John,
> On Feb 13, 2018, at 2:54 AM, John Garry <john.garry@huawei.com> wrote:
>
> On 12/02/2018 18:28, Himanshu Madhani wrote:
>> This patch fixes NULL pointer crash due to active timer running
>> for abort IOCB.
>>
>>> From crash dump analysis it was discoverd that get_next_timer_interrupt()
>> encountered a corrupted entry on the timer list.
>>
>> #9 [ffff95e1f6f0fd40] page_fault at ffffffff914fe8f8
>> [exception RIP: get_next_timer_interrupt+440]
>> RIP: ffffffff90ea3088 RSP: ffff95e1f6f0fdf0 RFLAGS: 00010013
>> RAX: ffff95e1f6451028 RBX: 000218e2389e5f40 RCX: 00000001232ad600
>> RDX: 0000000000000001 RSI: ffff95e1f6f0fdf0 RDI: 0000000001232ad6
>> RBP: ffff95e1f6f0fe40 R8: ffff95e1f6451188 R9: 0000000000000001
>> R10: 0000000000000016 R11: 0000000000000016 R12: 00000001232ad5f6
>> R13: ffff95e1f6450000 R14: ffff95e1f6f0fdf8 R15: ffff95e1f6f0fe10
>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>>
>> Looking at the assembly of get_next_timer_interrupt(), address came
>> from %r8 (ffff95e1f6451188) which is pointing to list_head with single
>> entry at ffff95e5ff621178.
>>
>> 0xffffffff90ea307a <get_next_timer_interrupt+426>: mov (%r8),%rdx
>> 0xffffffff90ea307d <get_next_timer_interrupt+429>: cmp %r8,%rdx
>> 0xffffffff90ea3080 <get_next_timer_interrupt+432>: je 0xffffffff90ea30a7 <get_next_timer_interrupt+471>
>> 0xffffffff90ea3082 <get_next_timer_interrupt+434>: nopw 0x0(%rax,%rax,1)
>> 0xffffffff90ea3088 <get_next_timer_interrupt+440>: testb $0x1,0x18(%rdx)
>>
>> crash> rd ffff95e1f6451188 10
>> ffff95e1f6451188: ffff95e5ff621178 ffff95e5ff621178 x.b.....x.b.....
>> ffff95e1f6451198: ffff95e1f6451198 ffff95e1f6451198 ..E.......E.....
>> ffff95e1f64511a8: ffff95e1f64511a8 ffff95e1f64511a8 ..E.......E.....
>> ffff95e1f64511b8: ffff95e77cf509a0 ffff95e77cf509a0 ...|.......|....
>> ffff95e1f64511c8: ffff95e1f64511c8 ffff95e1f64511c8 ..E.......E.....
>>
>> crash> rd ffff95e5ff621178 10
>> ffff95e5ff621178: 0000000000000001 ffff95e15936aa00 ..........6Y....
>> ffff95e5ff621188: 0000000000000000 00000000ffffffff ................
>> ffff95e5ff621198: 00000000000000a0 0000000000000010 ................
>> ffff95e5ff6211a8: ffff95e5ff621198 000000000000000c ..b.............
>> ffff95e5ff6211b8: 00000f5800000000 ffff95e751f8d720 ....X... ..Q....
>>
>> ffff95e5ff621178 belongs to freed mempool object at ffff95e5ff621080.
>>
>> CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
>> ffff95dc7fd74d00 mnt_cache 384 19785 24948 594 16k
>> SLAB MEMORY NODE TOTAL ALLOCATED FREE
>> ffffdc5dabfd8800 ffff95e5ff620000 1 42 29 13
>> FREE / [ALLOCATED]
>> ffff95e5ff621080 (cpu 6 cache)
>>
>> Examining the contents of that memory reveals a pointer to a constant
>> string in the driver, "abort\0", which is set by qla24xx_async_abort_cmd().
>>
>> crash> rd ffffffffc059277c 20
>> ffffffffc059277c: 6e490074726f6261 0074707572726574 abort.Interrupt.
>> ffffffffc059278c: 00676e696c6c6f50 6920726576697244 Polling.Driver i
>> ffffffffc059279c: 646f6d207325206e 6974736554000a65 n %s mode..Testi
>> ffffffffc05927ac: 636976656420676e 786c252074612065 ng device at %lx
>> ffffffffc05927bc: 6b63656843000a2e 646f727020676e69 ...Checking prod
>> ffffffffc05927cc: 6f20444920746375 0a2e706968632066 uct ID of chip..
>> ffffffffc05927dc: 5120646e756f4600 204130303232414c .Found QLA2200A
>> ffffffffc05927ec: 43000a2e70696843 20676e696b636568 Chip...Checking
>> ffffffffc05927fc: 65786f626c69616d 6c636e69000a2e73 mailboxes...incl
>> ffffffffc059280c: 756e696c2f656475 616d2d616d642f78 ude/linux/dma-ma
>>
>> crash> struct -ox srb_iocb
>> struct srb_iocb {
>> union {
>> struct {...} logio;
>> struct {...} els_logo;
>> struct {...} tmf;
>> struct {...} fxiocb;
>> struct {...} abt;
>> struct ct_arg ctarg;
>> struct {...} mbx;
>> struct {...} nack;
>> [0x0 ] } u;
>> [0xb8] struct timer_list timer;
>> [0x108] void (*timeout)(void *);
>> }
>> SIZE: 0x110
>>
>> crash> ! bc
>> ibase=16
>> obase=10
>> B8+40
>> F8
>>
>> The object is a srb_t, and at offset 0xf8 within that structure
>> (i.e. ffff95e5ff621080 + f8 -> ffff95e5ff621178) is a struct timer_list.
>>
>> Cc: <stable@vger.stable.com> #4.4+
>> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
>> ---
>> Hi Martin,
>>
>> This patch addresses crash due to NULL pointer access because driver
>> left active timer running for abort IOCB.
>>
>> Please apply this patch to 4.16/scsi-fixes at your earliest convenience.
>>
>> Thanks,
>> Himanshu
>> ---
>> drivers/scsi/qla2xxx/qla_init.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
>> index 2dea1129d396..04870621e712 100644
>> --- a/drivers/scsi/qla2xxx/qla_init.c
>> +++ b/drivers/scsi/qla2xxx/qla_init.c
>> @@ -1527,6 +1527,7 @@ `(void *ptr, int res)
>> srb_t *sp = ptr;
>> struct srb_iocb *abt = &sp->u.iocb_cmd;
>>
>> + del_timer(&sp->u.iocb_cmd.timer);
>
> Hi,
>
> I am not familiar with this code. But I note that this abort seems to be "asynchronous" (I see it's setup in qla24xx_async_abort_cmd()), so do you need the "sync" variant of del_timer() for safety? This would be to ensure the timeout has not actually occurred and is racing with the "done" callback?
>
after reviewing code for the usage of timer i see that qla24xx_async_abort_cmd() would not be adding timer back even though the call is
asynchronous. I think its safe to use only del_timer().
> Thanks,
> John
>
>> complete(&abt->u.abt.comp);
>> }
Thanks,
- Himanshu
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-02-14 19:58 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-12 18:28 [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS Himanshu Madhani
2018-02-13 10:38 ` Johannes Thumshirn
2018-02-13 18:13 ` Madhani, Himanshu
2018-02-14 8:27 ` Johannes Thumshirn
2018-02-13 10:54 ` John Garry
2018-02-13 20:07 ` Madhani, Himanshu
2018-02-14 19:58 ` Madhani, Himanshu
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.