All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Madhani, Himanshu" <Himanshu.Madhani@cavium.com>
To: John Garry <john.garry@huawei.com>
Cc: "James.Bottomley@HansenPartnership.com"
	<James.Bottomley@HansenPartnership.com>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
Date: Wed, 14 Feb 2018 19:58:21 +0000	[thread overview]
Message-ID: <7950650B-C0D6-41A1-8ABB-BEC816E99615@cavium.com> (raw)
In-Reply-To: <4a757a57-5d44-2e70-dbe3-3b969d5588fd@huawei.com>

Hi John,

> On Feb 13, 2018, at 2:54 AM, John Garry <john.garry@huawei.com> wrote:
> 
> On 12/02/2018 18:28, Himanshu Madhani wrote:
>> This patch fixes NULL pointer crash due to active timer running
>> for abort IOCB.
>> 
>>> From crash dump analysis it was discoverd that get_next_timer_interrupt()
>> encountered a corrupted entry on the timer list.
>> 
>> #9 [ffff95e1f6f0fd40] page_fault at ffffffff914fe8f8
>>    [exception RIP: get_next_timer_interrupt+440]
>>    RIP: ffffffff90ea3088  RSP: ffff95e1f6f0fdf0  RFLAGS: 00010013
>>    RAX: ffff95e1f6451028  RBX: 000218e2389e5f40  RCX: 00000001232ad600
>>    RDX: 0000000000000001  RSI: ffff95e1f6f0fdf0  RDI: 0000000001232ad6
>>    RBP: ffff95e1f6f0fe40   R8: ffff95e1f6451188   R9: 0000000000000001
>>    R10: 0000000000000016  R11: 0000000000000016  R12: 00000001232ad5f6
>>    R13: ffff95e1f6450000  R14: ffff95e1f6f0fdf8  R15: ffff95e1f6f0fe10
>>    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>> 
>> Looking at the assembly of get_next_timer_interrupt(), address came
>> from %r8 (ffff95e1f6451188) which is pointing to list_head with single
>> entry at ffff95e5ff621178.
>> 
>> 0xffffffff90ea307a <get_next_timer_interrupt+426>:      mov    (%r8),%rdx
>> 0xffffffff90ea307d <get_next_timer_interrupt+429>:      cmp    %r8,%rdx
>> 0xffffffff90ea3080 <get_next_timer_interrupt+432>:      je     0xffffffff90ea30a7 <get_next_timer_interrupt+471>
>> 0xffffffff90ea3082 <get_next_timer_interrupt+434>:      nopw   0x0(%rax,%rax,1)
>> 0xffffffff90ea3088 <get_next_timer_interrupt+440>:      testb  $0x1,0x18(%rdx)
>> 
>> crash> rd ffff95e1f6451188 10
>> ffff95e1f6451188:  ffff95e5ff621178 ffff95e5ff621178   x.b.....x.b.....
>> ffff95e1f6451198:  ffff95e1f6451198 ffff95e1f6451198   ..E.......E.....
>> ffff95e1f64511a8:  ffff95e1f64511a8 ffff95e1f64511a8   ..E.......E.....
>> ffff95e1f64511b8:  ffff95e77cf509a0 ffff95e77cf509a0   ...|.......|....
>> ffff95e1f64511c8:  ffff95e1f64511c8 ffff95e1f64511c8   ..E.......E.....
>> 
>> crash> rd ffff95e5ff621178 10
>> ffff95e5ff621178:  0000000000000001 ffff95e15936aa00   ..........6Y....
>> ffff95e5ff621188:  0000000000000000 00000000ffffffff   ................
>> ffff95e5ff621198:  00000000000000a0 0000000000000010   ................
>> ffff95e5ff6211a8:  ffff95e5ff621198 000000000000000c   ..b.............
>> ffff95e5ff6211b8:  00000f5800000000 ffff95e751f8d720   ....X... ..Q....
>> 
>> ffff95e5ff621178 belongs to freed mempool object at ffff95e5ff621080.
>> 
>> CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
>> ffff95dc7fd74d00 mnt_cache                384      19785     24948    594    16k
>>   SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
>>   ffffdc5dabfd8800  ffff95e5ff620000     1     42         29    13
>>   FREE / [ALLOCATED]
>>    ffff95e5ff621080  (cpu 6 cache)
>> 
>> Examining the contents of that memory reveals a pointer to a constant
>> string in the driver, "abort\0", which is set by qla24xx_async_abort_cmd().
>> 
>> crash> rd ffffffffc059277c 20
>> ffffffffc059277c:  6e490074726f6261 0074707572726574   abort.Interrupt.
>> ffffffffc059278c:  00676e696c6c6f50 6920726576697244   Polling.Driver i
>> ffffffffc059279c:  646f6d207325206e 6974736554000a65   n %s mode..Testi
>> ffffffffc05927ac:  636976656420676e 786c252074612065   ng device at %lx
>> ffffffffc05927bc:  6b63656843000a2e 646f727020676e69   ...Checking prod
>> ffffffffc05927cc:  6f20444920746375 0a2e706968632066   uct ID of chip..
>> ffffffffc05927dc:  5120646e756f4600 204130303232414c   .Found QLA2200A
>> ffffffffc05927ec:  43000a2e70696843 20676e696b636568   Chip...Checking
>> ffffffffc05927fc:  65786f626c69616d 6c636e69000a2e73   mailboxes...incl
>> ffffffffc059280c:  756e696c2f656475 616d2d616d642f78   ude/linux/dma-ma
>> 
>> crash> struct -ox srb_iocb
>> struct srb_iocb {
>>           union {
>>               struct {...} logio;
>>               struct {...} els_logo;
>>               struct {...} tmf;
>>               struct {...} fxiocb;
>>               struct {...} abt;
>>               struct ct_arg ctarg;
>>               struct {...} mbx;
>>               struct {...} nack;
>>    [0x0 ] } u;
>>    [0xb8] struct timer_list timer;
>>    [0x108] void (*timeout)(void *);
>> }
>> SIZE: 0x110
>> 
>> crash> ! bc
>> ibase=16
>> obase=10
>> B8+40
>> F8
>> 
>> The object is a srb_t, and at offset 0xf8 within that structure
>> (i.e. ffff95e5ff621080 + f8 -> ffff95e5ff621178) is a struct timer_list.
>> 
>> Cc: <stable@vger.stable.com> #4.4+
>> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
>> ---
>> Hi Martin,
>> 
>> This patch addresses crash due to NULL pointer access because driver
>> left active timer running for abort IOCB.
>> 
>> Please apply this patch to 4.16/scsi-fixes at your earliest convenience.
>> 
>> Thanks,
>> Himanshu
>> ---
>> drivers/scsi/qla2xxx/qla_init.c | 1 +
>> 1 file changed, 1 insertion(+)
>> 
>> diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
>> index 2dea1129d396..04870621e712 100644
>> --- a/drivers/scsi/qla2xxx/qla_init.c
>> +++ b/drivers/scsi/qla2xxx/qla_init.c
>> @@ -1527,6 +1527,7 @@ `(void *ptr, int res)
>> 	srb_t *sp = ptr;
>> 	struct srb_iocb *abt = &sp->u.iocb_cmd;
>> 
>> +	del_timer(&sp->u.iocb_cmd.timer);
> 
> Hi,
> 
> I am not familiar with this code. But I note that this abort seems to be "asynchronous" (I see it's setup in qla24xx_async_abort_cmd()), so do you need the "sync" variant of del_timer() for safety? This would be to ensure the timeout has not actually occurred and is racing with the "done" callback?
> 

after reviewing code for the usage of timer i see that qla24xx_async_abort_cmd() would not be adding timer back even though the call is
asynchronous. I think its safe to use only del_timer(). 

> Thanks,
> John
> 
>> 	complete(&abt->u.abt.comp);
>> }

Thanks,
- Himanshu

      parent reply	other threads:[~2018-02-14 19:58 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-12 18:28 [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS Himanshu Madhani
2018-02-13 10:38 ` Johannes Thumshirn
2018-02-13 18:13   ` Madhani, Himanshu
2018-02-14  8:27     ` Johannes Thumshirn
2018-02-13 10:54 ` John Garry
2018-02-13 20:07   ` Madhani, Himanshu
2018-02-14 19:58   ` Madhani, Himanshu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7950650B-C0D6-41A1-8ABB-BEC816E99615@cavium.com \
    --to=himanshu.madhani@cavium.com \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=john.garry@huawei.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.