* [PATCH] lpfc: Fix hard lock up NMI in els timeout handling.
@ 2017-11-07 20:59 James Smart
  2017-11-08 18:57 ` Ewan D. Milne
  2017-11-08 23:25 ` Martin K. Petersen
  0 siblings, 2 replies; 3+ messages in thread
From: James Smart @ 2017-11-07 20:59 UTC (permalink / raw)
  To: linux-scsi; +Cc: Dick Kennedy, James Smart

From: Dick Kennedy <dick.kennedy@broadcom.com>

System crashed due to a hard lockup at lpfc_els_timeout_handler+0x128.

The els ring's txcmplq list is corrupted: the last element in the list
does not point back to the head, causing a loop. The issue is that the
els processing path for sli4 hbas uses the hbalock instead of
the ring_lock when removing elements from the txcmplq list.

Use the adapter SLI_REV to determine which lock should be used for
removing iocbqs from the els ring's txcmplq.

Note: future refactoring will address this so that we don't have
this ugly type-based lock code.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
---
 drivers/scsi/lpfc/lpfc_sli.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 1229f58bdd09..c1c7df607604 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -2732,7 +2732,8 @@ lpfc_sli_process_unsol_iocb(struct lpfc_hba *phba, struct lpfc_sli_ring *pring,
  *
  * This function looks up the iocb_lookup table to get the command iocb
  * corresponding to the given response iocb using the iotag of the
- * response iocb. This function is called with the hbalock held.
+ * response iocb. This function is called with the hbalock held
+ * for sli3 devices or the ring_lock for sli4 devices.
  * This function returns the command iocb object if it finds the command
  * iocb else returns NULL.
  **/
@@ -2828,9 +2829,15 @@ lpfc_sli_process_sol_iocb(struct lpfc_hba *phba, struct lpfc_sli_ring *pring,
 	unsigned long iflag;
 
 	/* Based on the iotag field, get the cmd IOCB from the txcmplq */
-	spin_lock_irqsave(&phba->hbalock, iflag);
+	if (phba->sli_rev == LPFC_SLI_REV4)
+		spin_lock_irqsave(&pring->ring_lock, iflag);
+	else
+		spin_lock_irqsave(&phba->hbalock, iflag);
 	cmdiocbp = lpfc_sli_iocbq_lookup(phba, pring, saveq);
-	spin_unlock_irqrestore(&phba->hbalock, iflag);
+	if (phba->sli_rev == LPFC_SLI_REV4)
+		spin_unlock_irqrestore(&pring->ring_lock, iflag);
+	else
+		spin_unlock_irqrestore(&phba->hbalock, iflag);
 
 	if (cmdiocbp) {
 		if (cmdiocbp->iocb_cmpl) {
-- 
2.13.1


* Re: [PATCH] lpfc: Fix hard lock up NMI in els timeout handling.
  2017-11-07 20:59 [PATCH] lpfc: Fix hard lock up NMI in els timeout handling James Smart
@ 2017-11-08 18:57 ` Ewan D. Milne
  2017-11-08 23:25 ` Martin K. Petersen
  1 sibling, 0 replies; 3+ messages in thread
From: Ewan D. Milne @ 2017-11-08 18:57 UTC (permalink / raw)
  To: James Smart; +Cc: linux-scsi, Dick Kennedy, James Smart

On Tue, 2017-11-07 at 12:59 -0800, James Smart wrote:
> From: Dick Kennedy <dick.kennedy@broadcom.com>
> 
> System crashed due to a hard lockup at lpfc_els_timeout_handler+0x128.
> 
> The els ring's txcmplq list is corrupted: the last element in the list
> does not point back to the head, causing a loop. The issue is that the
> els processing path for sli4 hbas uses the hbalock instead of
> the ring_lock when removing elements from the txcmplq list.
> 
> Use the adapter SLI_REV to determine which lock should be used for
> removing iocbqs from the els ring's txcmplq.
> 
> Note: future refactoring will address this so that we don't have
> this ugly type-based lock code.
> 
> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
> Signed-off-by: James Smart <james.smart@broadcom.com>
> ---
>  drivers/scsi/lpfc/lpfc_sli.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
> index 1229f58bdd09..c1c7df607604 100644
> --- a/drivers/scsi/lpfc/lpfc_sli.c
> +++ b/drivers/scsi/lpfc/lpfc_sli.c
> @@ -2732,7 +2732,8 @@ lpfc_sli_process_unsol_iocb(struct lpfc_hba *phba, struct lpfc_sli_ring *pring,
>   *
>   * This function looks up the iocb_lookup table to get the command iocb
>   * corresponding to the given response iocb using the iotag of the
> - * response iocb. This function is called with the hbalock held.
> + * response iocb. This function is called with the hbalock held
> + * for sli3 devices or the ring_lock for sli4 devices.
>   * This function returns the command iocb object if it finds the command
>   * iocb else returns NULL.
>   **/
> @@ -2828,9 +2829,15 @@ lpfc_sli_process_sol_iocb(struct lpfc_hba *phba, struct lpfc_sli_ring *pring,
>  	unsigned long iflag;
>  
>  	/* Based on the iotag field, get the cmd IOCB from the txcmplq */
> -	spin_lock_irqsave(&phba->hbalock, iflag);
> +	if (phba->sli_rev == LPFC_SLI_REV4)
> +		spin_lock_irqsave(&pring->ring_lock, iflag);
> +	else
> +		spin_lock_irqsave(&phba->hbalock, iflag);
>  	cmdiocbp = lpfc_sli_iocbq_lookup(phba, pring, saveq);
> -	spin_unlock_irqrestore(&phba->hbalock, iflag);
> +	if (phba->sli_rev == LPFC_SLI_REV4)
> +		spin_unlock_irqrestore(&pring->ring_lock, iflag);
> +	else
> +		spin_unlock_irqrestore(&phba->hbalock, iflag);
>  
>  	if (cmdiocbp) {
>  		if (cmdiocbp->iocb_cmpl) {

The other callers of lpfc_sli_iocbq_lookup() use one of two different
locks, depending on the SLI-3/SLI-4 case.
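
For illustration only, the lock selection discussed above could be
factored into a single helper along the following lines. This is a
user-space sketch, not driver code: pthread mutexes stand in for the
kernel spinlocks, and every name prefixed with sketch_ or SKETCH_ is
hypothetical and does not exist in lpfc.

```c
/* User-space sketch only: pthread mutexes stand in for kernel spinlocks. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's SLI revision constants. */
#define SKETCH_SLI_REV3 3
#define SKETCH_SLI_REV4 4

/* Hypothetical, heavily reduced models of the hba and ring objects. */
struct sketch_hba {
	int sli_rev;
	pthread_mutex_t hbalock;	/* adapter-wide lock */
};

struct sketch_ring {
	pthread_mutex_t ring_lock;	/* per-ring lock */
	int txcmplq_cnt;		/* stands in for the txcmplq list */
};

/*
 * Return the lock that protects the ring's txcmplq for this adapter
 * type: the ring_lock for SLI-4, the hbalock otherwise.
 */
static pthread_mutex_t *sketch_txcmplq_lock(struct sketch_hba *hba,
					    struct sketch_ring *ring)
{
	assert(hba != NULL && ring != NULL);
	if (hba->sli_rev == SKETCH_SLI_REV4)
		return &ring->ring_lock;
	return &hba->hbalock;
}

/*
 * Remove one entry from the modelled txcmplq under the lock chosen
 * above. Returns 1 if an entry was removed, 0 if the queue was empty.
 */
static int sketch_txcmplq_remove(struct sketch_hba *hba,
				 struct sketch_ring *ring)
{
	pthread_mutex_t *lock = sketch_txcmplq_lock(hba, ring);
	int removed = 0;

	pthread_mutex_lock(lock);
	if (ring->txcmplq_cnt > 0) {
		ring->txcmplq_cnt--;
		removed = 1;
	}
	pthread_mutex_unlock(lock);
	return removed;
}
```

The point of the sketch is that every path touching the txcmplq asks
the same helper which lock to take, so two paths cannot end up
modifying the list under different locks.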

Reviewed-by: Ewan D. Milne <emilne@redhat.com>


* Re: [PATCH] lpfc: Fix hard lock up NMI in els timeout handling.
  2017-11-07 20:59 [PATCH] lpfc: Fix hard lock up NMI in els timeout handling James Smart
  2017-11-08 18:57 ` Ewan D. Milne
@ 2017-11-08 23:25 ` Martin K. Petersen
  1 sibling, 0 replies; 3+ messages in thread
From: Martin K. Petersen @ 2017-11-08 23:25 UTC (permalink / raw)
  To: James Smart; +Cc: linux-scsi, Dick Kennedy, James Smart


James,

> System crashed due to a hard lockup at lpfc_els_timeout_handler+0x128.
>
> The els ring's txcmplq list is corrupted: the last element in the list
> does not point back to the head, causing a loop. The issue is that the
> els processing path for sli4 hbas uses the hbalock instead of
> the ring_lock when removing elements from the txcmplq list.
>
> Use the adapter SLI_REV to determine which lock should be used for
> removing iocbqs from the els ring's txcmplq.

Applied to 4.15/scsi-queue. Thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

