From: Chesnokov Gleb <Chesnokov.G@raidix.com>
To: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: [PATCH 2/2] qla2xxx: Fix missed DMA unmap for aborted cmds
Date: Mon, 25 Apr 2022 18:49:47 +0000
Message-ID: <AS8PR10MB495233A032E7425D1DAE9D549DF89@AS8PR10MB4952.EURPRD10.PROD.OUTLOOK.COM>
In-Reply-To: <F1592489-7C94-454B-8EF3-BF5C56F48A10@oracle.com>

>> On Apr 20, 2022, at 7:42 AM, Chesnokov Gleb <Chesnokov.G@raidix.com> wrote:
>> 
>>> Do you have a log showing this error sequence?
>> 
>> Yes, I do, but the problem is that I use a different target stack, not LIO. So the call trace contains code from that target stack only,
>> except for the call to qlt_free_cmd(), which triggers BUG: BUG_ON(cmd->sg_mapped).
>> Regardless, I think the problem lies on the QLogic driver side, because the driver is responsible for mapping and unmapping the sgl.
>
> Agreed. I am curious to understand the test case/steps that would trigger this issue in your environment. If you can share your test scenario, that would be helpful.
>>
>> 
>>> Can you share more details?
>> 
>> What I am observing:
>> 
>> 1) Command processing calls qlt_rdy_to_xfer(), which maps the sgl and sends the command to the firmware.
>> 2) A QLogic adapter reset occurs:
>> 
>> qla2xxx [0000:82:00.1]-5003:13: ISP System Error - mbx1=110eh mbx2=10h mbx3=dh mbx4=0h mbx5=8a1h mbx6=0h mbx7=0h.
>
> This message indicates there was a firmware crash. QLogic/Marvell folks should be able to help you capture/save a dump. That firmware dump might give you clues about the cause of the firmware crash.
>
>> qla2xxx [0000:82:00.1]-d01e:13: -> fwdump no buffer
>
>> qla2xxx [0000:82:00.1]-00af:13: Performing ISP error recovery - ha=ffff9dd7d6058000.
>> 
>
>> 3) Somehow the command gets aborted, which means the command's aborted flag has already been set.
>> I think it happens roughly like this:
>> qla2x00_abort_isp_cleanup() --> qla2x00_abort_all_cmds()
>> 
>
> I think this is the aftereffect of a firmware crash and the driver is just recovering from that. A good firmware analysis will shed more light on this issue. 
>
>> 4) The target stack calls qlt_abort_cmd(), and since the aborted flag has already been set, this call ends up being treated as a multiple abort.
>> 
>> 5) The target stack calls xmit_response, and since the command has already been aborted, this call starts the sequence that releases the command, ending with qlt_free_cmd().
>> 
>> I could try to reproduce the problem with the LIO target stack, but my target stack has a special case that leads to a reset of the QLogic adapter (ISP error recovery), and that reset is an important part of the error sequence. So I do not think I will be able to reproduce the problem with LIO until I find a way to similarly reset the QLogic adapter while it is processing active commands that have already been sent to the firmware.
>
>
> Himanshu Madhani        Oracle Linux Engineering

I think I know the cause of the firmware crash: an abnormal sg list that is generated by my backend driver and passed to the QLogic driver via the target stack. In my case, "abnormal" means that the sg list contains more than a thousand entries (nents). Apparently the QLogic adapter cannot handle such buffers.

In any case, I think the main thing is not to find or fix the cause of the firmware crash (it actually comes from my side), but to fix the crash that occurs while the QLogic driver recovers from a firmware crash.

I have a special case that lets me reproduce the problem, but perhaps it can also be reproduced by anything else that causes a firmware crash. Is there a way to trigger a firmware crash manually, so that the problem can be reproduced artificially?

Thread overview: 7+ messages
2022-04-15 12:42 [PATCH 2/2] qla2xxx: Fix missed DMA unmap for aborted cmds Chesnokov Gleb
2022-04-19 19:41 ` Himanshu Madhani
2022-04-20 14:42   ` Chesnokov Gleb
2022-04-20 19:09     ` Himanshu Madhani
2022-04-25 18:49       ` Chesnokov Gleb [this message]
2022-04-27 16:52         ` Himanshu Madhani
2022-05-03  0:50 ` Martin K. Petersen