* [PATCH] lpfc: Fix hard lock up NMI in els timeout handling.
From: James Smart @ 2017-11-07 20:59 UTC
To: linux-scsi; +Cc: Dick Kennedy, James Smart
From: Dick Kennedy <dick.kennedy@broadcom.com>

System crashed due to a hard lockup at lpfc_els_timeout_handler+0x128.

The els ring's txcmplq list is corrupted: the last element in the list
does not point back to the head, causing a loop. The issue is that the
els processing path for sli4 hbas is using the hbalock instead of
the ring_lock when removing elements from the txcmplq list.

Use the adapter SLI_REV to determine which lock should be used for
removing iocbqs from the els ring's txcmplq.

Note: future refactoring will address this so that we don't have
this ugly type-based lock code.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
---
drivers/scsi/lpfc/lpfc_sli.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 1229f58bdd09..c1c7df607604 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -2732,7 +2732,8 @@ lpfc_sli_process_unsol_iocb(struct lpfc_hba *phba, struct lpfc_sli_ring *pring,
  *
  * This function looks up the iocb_lookup table to get the command iocb
  * corresponding to the given response iocb using the iotag of the
- * response iocb. This function is called with the hbalock held.
+ * response iocb. This function is called with the hbalock held
+ * for sli3 devices or the ring_lock for sli4 devices.
  * This function returns the command iocb object if it finds the command
  * iocb else returns NULL.
  **/
@@ -2828,9 +2829,15 @@ lpfc_sli_process_sol_iocb(struct lpfc_hba *phba, struct lpfc_sli_ring *pring,
 	unsigned long iflag;
 
 	/* Based on the iotag field, get the cmd IOCB from the txcmplq */
-	spin_lock_irqsave(&phba->hbalock, iflag);
+	if (phba->sli_rev == LPFC_SLI_REV4)
+		spin_lock_irqsave(&pring->ring_lock, iflag);
+	else
+		spin_lock_irqsave(&phba->hbalock, iflag);
 	cmdiocbp = lpfc_sli_iocbq_lookup(phba, pring, saveq);
-	spin_unlock_irqrestore(&phba->hbalock, iflag);
+	if (phba->sli_rev == LPFC_SLI_REV4)
+		spin_unlock_irqrestore(&pring->ring_lock, iflag);
+	else
+		spin_unlock_irqrestore(&phba->hbalock, iflag);
 
 	if (cmdiocbp) {
 		if (cmdiocbp->iocb_cmpl) {
--
2.13.1
* Re: [PATCH] lpfc: Fix hard lock up NMI in els timeout handling.
From: Ewan D. Milne @ 2017-11-08 18:57 UTC
To: James Smart; +Cc: linux-scsi, Dick Kennedy, James Smart
On Tue, 2017-11-07 at 12:59 -0800, James Smart wrote:
> From: Dick Kennedy <dick.kennedy@broadcom.com>
>
> System crashed due to a hard lockup at lpfc_els_timeout_handler+0x128.
>
> The els ring's txcmplq list is corrupted: the last element in the list
> does not point back to the head, causing a loop. The issue is that the
> els processing path for sli4 hbas is using the hbalock instead of
> the ring_lock when removing elements from the txcmplq list.
>
> Use the adapter SLI_REV to determine which lock should be used for
> removing iocbqs from the els ring's txcmplq.
>
> Note: future refactoring will address this so that we don't have
> this ugly type-based lock code.
>
> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
> Signed-off-by: James Smart <james.smart@broadcom.com>
> ---
> drivers/scsi/lpfc/lpfc_sli.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
> index 1229f58bdd09..c1c7df607604 100644
> --- a/drivers/scsi/lpfc/lpfc_sli.c
> +++ b/drivers/scsi/lpfc/lpfc_sli.c
> @@ -2732,7 +2732,8 @@ lpfc_sli_process_unsol_iocb(struct lpfc_hba *phba, struct lpfc_sli_ring *pring,
>   *
>   * This function looks up the iocb_lookup table to get the command iocb
>   * corresponding to the given response iocb using the iotag of the
> - * response iocb. This function is called with the hbalock held.
> + * response iocb. This function is called with the hbalock held
> + * for sli3 devices or the ring_lock for sli4 devices.
>   * This function returns the command iocb object if it finds the command
>   * iocb else returns NULL.
>   **/
> @@ -2828,9 +2829,15 @@ lpfc_sli_process_sol_iocb(struct lpfc_hba *phba, struct lpfc_sli_ring *pring,
>  	unsigned long iflag;
>
>  	/* Based on the iotag field, get the cmd IOCB from the txcmplq */
> -	spin_lock_irqsave(&phba->hbalock, iflag);
> +	if (phba->sli_rev == LPFC_SLI_REV4)
> +		spin_lock_irqsave(&pring->ring_lock, iflag);
> +	else
> +		spin_lock_irqsave(&phba->hbalock, iflag);
>  	cmdiocbp = lpfc_sli_iocbq_lookup(phba, pring, saveq);
> -	spin_unlock_irqrestore(&phba->hbalock, iflag);
> +	if (phba->sli_rev == LPFC_SLI_REV4)
> +		spin_unlock_irqrestore(&pring->ring_lock, iflag);
> +	else
> +		spin_unlock_irqrestore(&phba->hbalock, iflag);
>
>  	if (cmdiocbp) {
>  		if (cmdiocbp->iocb_cmpl) {

The other callers of lpfc_sli_iocbq_lookup() already take one of the two
locks depending on the SLI-3/SLI-4 case, so this makes
lpfc_sli_process_sol_iocb() consistent with them.

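For illustration only, and not part of the patch: the per-revision lock
choice could be kept in one place by a small helper. The sketch below is a
userspace stand-in that uses pthread mutexes instead of kernel spinlocks.
The names sketch_hba, sketch_ring, txcmplq_lock() and txcmplq_remove_one()
are hypothetical and do not exist in the lpfc driver; only the rule itself
(ring_lock for SLI-4, hbalock otherwise) comes from the patch.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-ins; these are not the real lpfc types. */
enum sketch_sli_rev { SKETCH_SLI_REV3 = 3, SKETCH_SLI_REV4 = 4 };

struct sketch_ring {
	pthread_mutex_t ring_lock;	/* guards txcmplq on SLI-4 */
	int txcmplq_cnt;		/* number of entries on txcmplq */
};

struct sketch_hba {
	enum sketch_sli_rev sli_rev;
	pthread_mutex_t hbalock;	/* guards txcmplq on SLI-3 */
};

/* Return the lock that protects pring's txcmplq for this adapter. */
static pthread_mutex_t *txcmplq_lock(struct sketch_hba *phba,
				     struct sketch_ring *pring)
{
	assert(phba && pring);
	if (phba->sli_rev == SKETCH_SLI_REV4)
		return &pring->ring_lock;
	return &phba->hbalock;
}

/*
 * Remove one entry from the txcmplq under whichever lock applies.
 * Returns 1 if an entry was removed, 0 if the queue was empty.
 */
static int txcmplq_remove_one(struct sketch_hba *phba,
			      struct sketch_ring *pring)
{
	pthread_mutex_t *lock = txcmplq_lock(phba, pring);
	int removed = 0;

	pthread_mutex_lock(lock);
	if (pring->txcmplq_cnt > 0) {
		pring->txcmplq_cnt--;
		removed = 1;
	}
	pthread_mutex_unlock(lock);
	return removed;
}
```

With a helper of this shape, every path that touches the txcmplq takes the
same lock for a given adapter, which is the property the patch restores.
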
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
* Re: [PATCH] lpfc: Fix hard lock up NMI in els timeout handling.
From: Martin K. Petersen @ 2017-11-08 23:25 UTC
To: James Smart; +Cc: linux-scsi, Dick Kennedy, James Smart
James,
> System crashed due to a hard lockup at lpfc_els_timeout_handler+0x128.
>
> The els ring's txcmplq list is corrupted: the last element in the list
> does not point back to the head, causing a loop. The issue is that the
> els processing path for sli4 hbas is using the hbalock instead of
> the ring_lock when removing elements from the txcmplq list.
>
> Use the adapter SLI_REV to determine which lock should be used for
> removing iocbqs from the els ring's txcmplq.
Applied to 4.15/scsi-queue. Thanks!
--
Martin K. Petersen Oracle Linux Engineering