From: "Madhani, Himanshu" <Himanshu.Madhani@cavium.com>
To: John Garry <john.garry@huawei.com>
Cc: "James.Bottomley@HansenPartnership.com"
<James.Bottomley@HansenPartnership.com>,
"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
Date: Wed, 14 Feb 2018 19:58:21 +0000 [thread overview]
Message-ID: <7950650B-C0D6-41A1-8ABB-BEC816E99615@cavium.com> (raw)
In-Reply-To: <4a757a57-5d44-2e70-dbe3-3b969d5588fd@huawei.com>
Hi John,
> On Feb 13, 2018, at 2:54 AM, John Garry <john.garry@huawei.com> wrote:
>
> On 12/02/2018 18:28, Himanshu Madhani wrote:
>> This patch fixes NULL pointer crash due to active timer running
>> for abort IOCB.
>>
>>> From crash dump analysis it was discoverd that get_next_timer_interrupt()
>> encountered a corrupted entry on the timer list.
>>
>> #9 [ffff95e1f6f0fd40] page_fault at ffffffff914fe8f8
>> [exception RIP: get_next_timer_interrupt+440]
>> RIP: ffffffff90ea3088 RSP: ffff95e1f6f0fdf0 RFLAGS: 00010013
>> RAX: ffff95e1f6451028 RBX: 000218e2389e5f40 RCX: 00000001232ad600
>> RDX: 0000000000000001 RSI: ffff95e1f6f0fdf0 RDI: 0000000001232ad6
>> RBP: ffff95e1f6f0fe40 R8: ffff95e1f6451188 R9: 0000000000000001
>> R10: 0000000000000016 R11: 0000000000000016 R12: 00000001232ad5f6
>> R13: ffff95e1f6450000 R14: ffff95e1f6f0fdf8 R15: ffff95e1f6f0fe10
>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>>
>> Looking at the assembly of get_next_timer_interrupt(), address came
>> from %r8 (ffff95e1f6451188) which is pointing to list_head with single
>> entry at ffff95e5ff621178.
>>
>> 0xffffffff90ea307a <get_next_timer_interrupt+426>: mov (%r8),%rdx
>> 0xffffffff90ea307d <get_next_timer_interrupt+429>: cmp %r8,%rdx
>> 0xffffffff90ea3080 <get_next_timer_interrupt+432>: je 0xffffffff90ea30a7 <get_next_timer_interrupt+471>
>> 0xffffffff90ea3082 <get_next_timer_interrupt+434>: nopw 0x0(%rax,%rax,1)
>> 0xffffffff90ea3088 <get_next_timer_interrupt+440>: testb $0x1,0x18(%rdx)
>>
>> crash> rd ffff95e1f6451188 10
>> ffff95e1f6451188: ffff95e5ff621178 ffff95e5ff621178 x.b.....x.b.....
>> ffff95e1f6451198: ffff95e1f6451198 ffff95e1f6451198 ..E.......E.....
>> ffff95e1f64511a8: ffff95e1f64511a8 ffff95e1f64511a8 ..E.......E.....
>> ffff95e1f64511b8: ffff95e77cf509a0 ffff95e77cf509a0 ...|.......|....
>> ffff95e1f64511c8: ffff95e1f64511c8 ffff95e1f64511c8 ..E.......E.....
>>
>> crash> rd ffff95e5ff621178 10
>> ffff95e5ff621178: 0000000000000001 ffff95e15936aa00 ..........6Y....
>> ffff95e5ff621188: 0000000000000000 00000000ffffffff ................
>> ffff95e5ff621198: 00000000000000a0 0000000000000010 ................
>> ffff95e5ff6211a8: ffff95e5ff621198 000000000000000c ..b.............
>> ffff95e5ff6211b8: 00000f5800000000 ffff95e751f8d720 ....X... ..Q....
>>
>> ffff95e5ff621178 belongs to freed mempool object at ffff95e5ff621080.
>>
>> CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
>> ffff95dc7fd74d00 mnt_cache 384 19785 24948 594 16k
>> SLAB MEMORY NODE TOTAL ALLOCATED FREE
>> ffffdc5dabfd8800 ffff95e5ff620000 1 42 29 13
>> FREE / [ALLOCATED]
>> ffff95e5ff621080 (cpu 6 cache)
>>
>> Examining the contents of that memory reveals a pointer to a constant
>> string in the driver, "abort\0", which is set by qla24xx_async_abort_cmd().
>>
>> crash> rd ffffffffc059277c 20
>> ffffffffc059277c: 6e490074726f6261 0074707572726574 abort.Interrupt.
>> ffffffffc059278c: 00676e696c6c6f50 6920726576697244 Polling.Driver i
>> ffffffffc059279c: 646f6d207325206e 6974736554000a65 n %s mode..Testi
>> ffffffffc05927ac: 636976656420676e 786c252074612065 ng device at %lx
>> ffffffffc05927bc: 6b63656843000a2e 646f727020676e69 ...Checking prod
>> ffffffffc05927cc: 6f20444920746375 0a2e706968632066 uct ID of chip..
>> ffffffffc05927dc: 5120646e756f4600 204130303232414c .Found QLA2200A
>> ffffffffc05927ec: 43000a2e70696843 20676e696b636568 Chip...Checking
>> ffffffffc05927fc: 65786f626c69616d 6c636e69000a2e73 mailboxes...incl
>> ffffffffc059280c: 756e696c2f656475 616d2d616d642f78 ude/linux/dma-ma
>>
>> crash> struct -ox srb_iocb
>> struct srb_iocb {
>> union {
>> struct {...} logio;
>> struct {...} els_logo;
>> struct {...} tmf;
>> struct {...} fxiocb;
>> struct {...} abt;
>> struct ct_arg ctarg;
>> struct {...} mbx;
>> struct {...} nack;
>> [0x0 ] } u;
>> [0xb8] struct timer_list timer;
>> [0x108] void (*timeout)(void *);
>> }
>> SIZE: 0x110
>>
>> crash> ! bc
>> ibase=16
>> obase=10
>> B8+40
>> F8
>>
>> The object is a srb_t, and at offset 0xf8 within that structure
>> (i.e. ffff95e5ff621080 + f8 -> ffff95e5ff621178) is a struct timer_list.
>>
>> Cc: <stable@vger.stable.com> #4.4+
>> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
>> ---
>> Hi Martin,
>>
>> This patch addresses crash due to NULL pointer access because driver
>> left active timer running for abort IOCB.
>>
>> Please apply this patch to 4.16/scsi-fixes at your earliest convenience.
>>
>> Thanks,
>> Himanshu
>> ---
>> drivers/scsi/qla2xxx/qla_init.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
>> index 2dea1129d396..04870621e712 100644
>> --- a/drivers/scsi/qla2xxx/qla_init.c
>> +++ b/drivers/scsi/qla2xxx/qla_init.c
>> @@ -1527,6 +1527,7 @@ `(void *ptr, int res)
>> srb_t *sp = ptr;
>> struct srb_iocb *abt = &sp->u.iocb_cmd;
>>
>> + del_timer(&sp->u.iocb_cmd.timer);
>
> Hi,
>
> I am not familiar with this code. But I note that this abort seems to be "asynchronous" (I see it's setup in qla24xx_async_abort_cmd()), so do you need the "sync" variant of del_timer() for safety? This would be to ensure the timeout has not actually occurred and is racing with the "done" callback?
>
after reviewing code for the usage of timer i see that qla24xx_async_abort_cmd() would not be adding timer back even though the call is
asynchronous. I think its safe to use only del_timer().
> Thanks,
> John
>
>> complete(&abt->u.abt.comp);
>> }
Thanks,
- Himanshu
prev parent reply other threads:[~2018-02-14 19:58 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-12 18:28 [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS Himanshu Madhani
2018-02-13 10:38 ` Johannes Thumshirn
2018-02-13 18:13 ` Madhani, Himanshu
2018-02-14 8:27 ` Johannes Thumshirn
2018-02-13 10:54 ` John Garry
2018-02-13 20:07 ` Madhani, Himanshu
2018-02-14 19:58 ` Madhani, Himanshu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7950650B-C0D6-41A1-8ABB-BEC816E99615@cavium.com \
--to=himanshu.madhani@cavium.com \
--cc=James.Bottomley@HansenPartnership.com \
--cc=john.garry@huawei.com \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.