* [PATCH] fnic: finish scsi_cmnd before dropping the spinlock to prevent abort race
From: David Jeffery @ 2022-03-11 18:43 UTC
  To: linux-scsi
  Cc: Satish Kharat, Sesidhar Baddela, Karan Tilak Kumar,
	Laurence Oberman, John Pittman, David Jeffery

When aborting a scsi command through fnic, there is a race with the fnic
interrupt handler which can result in the scsi command and its request
being completed twice. If the interrupt handler claims the command by
setting CMD_SP to NULL first, the abort handler assumes the interrupt
handler has completed the command and returns SUCCESS, causing the request
for the scsi_cmnd to be re-queued.

But the interrupt handler may not have finished the command yet. After it
drops the spinlock protecting CMD_SP, it does memory cleanup before
finally calling scsi_done to complete the scsi_cmnd. If the call to
scsi_done occurs after the abort handler finishes and re-queues the
request, the completion of the scsi_cmnd will advance and try to double
complete a request already queued for retry.

This patch fixes the issue by moving scsi_done and any other use of
scsi_cmnd to before the spinlock is released by the interrupt handler.
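
The ordering argument above can be sketched as a small user-space model.
This is only an illustration under stated assumptions, not driver code: the
step names, the two order arrays and count_race_windows() are hypothetical,
and the model reduces the handler to five steps. It counts the points at
which an abort handler could take the lock and find the command claimed
(CMD_SP set to NULL) but not yet completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Steps of a simplified completion handler (illustrative names only). */
enum step { STEP_LOCK, STEP_CLAIM, STEP_UNLOCK, STEP_CLEANUP, STEP_DONE };

/* Order before the patch: completion happens after the lock is dropped. */
static const enum step old_order[] = {
	STEP_LOCK, STEP_CLAIM, STEP_UNLOCK, STEP_CLEANUP, STEP_DONE
};

/* Order after the patch: completion happens while the lock is held. */
static const enum step new_order[] = {
	STEP_LOCK, STEP_CLAIM, STEP_DONE, STEP_UNLOCK, STEP_CLEANUP
};

/*
 * Walk the steps in order and count the points at which the lock is free
 * while the command is claimed but not yet completed. Each such point is
 * a window in which the abort path could wrongly assume completion.
 */
static int count_race_windows(const enum step *order, size_t n)
{
	bool locked = false, claimed = false, done = false;
	int windows = 0;

	for (size_t i = 0; i < n; i++) {
		switch (order[i]) {
		case STEP_LOCK:
			locked = true;
			break;
		case STEP_CLAIM:
			claimed = true;
			break;
		case STEP_UNLOCK:
			locked = false;
			break;
		case STEP_CLEANUP:
			break;
		case STEP_DONE:
			done = true;
			break;
		}
		/* The abort path can only inspect the state when the lock is free. */
		if (!locked && claimed && !done)
			windows++;
	}
	return windows;
}
```

In this model, count_race_windows(old_order, 5) finds two such points (after
the unlock and after the cleanup), while count_race_windows(new_order, 5)
finds none, since the command is already completed when the lock is released.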

Signed-off-by: David Jeffery <djeffery@redhat.com>
---
 drivers/scsi/fnic/fnic_scsi.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
index 88c549f257db..40a52feb315d 100644
--- a/drivers/scsi/fnic/fnic_scsi.c
+++ b/drivers/scsi/fnic/fnic_scsi.c
@@ -986,8 +986,6 @@ static void fnic_fcpio_icmnd_cmpl_handler(struct fnic *fnic,
 	CMD_SP(sc) = NULL;
 	CMD_FLAGS(sc) |= FNIC_IO_DONE;
 
-	spin_unlock_irqrestore(io_lock, flags);
-
 	if (hdr_status != FCPIO_SUCCESS) {
 		atomic64_inc(&fnic_stats->io_stats.io_failures);
 		shost_printk(KERN_ERR, fnic->lport->host, "hdr status = %s\n",
@@ -996,8 +994,6 @@ static void fnic_fcpio_icmnd_cmpl_handler(struct fnic *fnic,
 
 	fnic_release_ioreq_buf(fnic, io_req, sc);
 
-	mempool_free(io_req, fnic->io_req_pool);
-
 	cmd_trace = ((u64)hdr_status << 56) |
 		  (u64)icmnd_cmpl->scsi_status << 48 |
 		  (u64)icmnd_cmpl->flags << 40 | (u64)sc->cmnd[0] << 32 |
@@ -1021,6 +1017,12 @@ static void fnic_fcpio_icmnd_cmpl_handler(struct fnic *fnic,
 	} else
 		fnic->lport->host_stats.fcp_control_requests++;
 
+	/* Call SCSI completion function to complete the IO */
+	scsi_done(sc);
+	spin_unlock_irqrestore(io_lock, flags);
+
+	mempool_free(io_req, fnic->io_req_pool);
+
 	atomic64_dec(&fnic_stats->io_stats.active_ios);
 	if (atomic64_read(&fnic->io_cmpl_skip))
 		atomic64_dec(&fnic->io_cmpl_skip);
@@ -1049,9 +1051,6 @@ static void fnic_fcpio_icmnd_cmpl_handler(struct fnic *fnic,
 		if(io_duration_time > atomic64_read(&fnic_stats->io_stats.current_max_io_time))
 			atomic64_set(&fnic_stats->io_stats.current_max_io_time, io_duration_time);
 	}
-
-	/* Call SCSI completion function to complete the IO */
-	scsi_done(sc);
 }
 
 /* fnic_fcpio_itmf_cmpl_handler
-- 
2.31.1



* Re: [PATCH] fnic: finish scsi_cmnd before dropping the spinlock to prevent abort race
From: Laurence Oberman @ 2022-03-11 19:12 UTC
  To: David Jeffery, linux-scsi
  Cc: Satish Kharat, Sesidhar Baddela, Karan Tilak Kumar, John Pittman

On Fri, 2022-03-11 at 13:43 -0500, David Jeffery wrote:
> When aborting a scsi command through fnic, there is a race with the
> fnic
> interrupt handler which can result in the scsi command and its
> request
> being completed twice. If the interrupt handler claims the command by
> setting CMD_SP to NULL first, the abort handler assumes the interrupt
> handler has completed the command and returns SUCCESS, causing the
> request
> for the scsi_cmnd to be re-queued.
> 
> But the interrupt handler may not have finished the command yet.
> After it
> drops the spinlock protecting CMD_SP, it does memory cleanup before
> finally calling scsi_done to complete the scsi_cmnd. If the call to
> scsi_done occurs after the abort handler finishes and re-queues the
> request, the completion of the scsi_cmnd will advance and try to
> double
> complete a request already queued for retry.
> 
> This patch fixes the issue by moving scsi_done and any other use of
> scsi_cmnd to before the spinlock is released by the interrupt
> handler.
> 
> Signed-off-by: David Jeffery <djeffery@redhat.com>
> ---
> [...]

This patch was also presented to Ming, who agreed with David's changes.
It's been sent to a customer for full testing to see if it avoids the
panics.
The trigger is a sequence of the messages below, followed by the double
completion. While it's not easy to reproduce and not often seen, this
customer can apparently make it happen at will.

[1363787.139752] scsi host7: hdr status = FCPIO_DATA_CNT_MISMATCH
[1363787.139822] scsi host7: hdr status = FCPIO_DATA_CNT_MISMATCH
[1363787.139870] scsi host7: hdr status = FCPIO_DATA_CNT_MISMATCH
[1363787.139916] scsi host7: hdr status = FCPIO_DATA_CNT_MISMATCH
[1363787.139961] scsi host7: hdr status = FCPIO_DATA_CNT_MISMATCH
[1363787.140006] scsi host7: hdr status = FCPIO_DATA_CNT_MISMATCH

Reviewed-by: Laurence Oberman <loberman@redhat.com>

Thanks very much



* Re: [PATCH] fnic: finish scsi_cmnd before dropping the spinlock to prevent abort race
From: Ming Lei @ 2022-03-15  6:21 UTC
  To: David Jeffery
  Cc: linux-scsi, Satish Kharat, Sesidhar Baddela, Karan Tilak Kumar,
	Laurence Oberman, John Pittman

On Fri, Mar 11, 2022 at 01:43:59PM -0500, David Jeffery wrote:
> When aborting a scsi command through fnic, there is a race with the fnic
> interrupt handler which can result in the scsi command and its request
> being completed twice. If the interrupt handler claims the command by
> setting CMD_SP to NULL first, the abort handler assumes the interrupt
> handler has completed the command and returns SUCCESS, causing the request
> for the scsi_cmnd to be re-queued.
> 
> But the interrupt handler may not have finished the command yet. After it
> drops the spinlock protecting CMD_SP, it does memory cleanup before
> finally calling scsi_done to complete the scsi_cmnd. If the call to
> scsi_done occurs after the abort handler finishes and re-queues the
> request, the completion of the scsi_cmnd will advance and try to double
> complete a request already queued for retry.
> 
> This patch fixes the issue by moving scsi_done and any other use of
> scsi_cmnd to before the spinlock is released by the interrupt handler.

This is a simple fix for the race between normal completion and abort.
Looks fine:

Reviewed-by: Ming Lei <ming.lei@redhat.com>


Thanks, 
Ming



* Re: [PATCH] fnic: finish scsi_cmnd before dropping the spinlock to prevent abort race
From: Martin K. Petersen @ 2022-03-15 18:22 UTC
  To: David Jeffery, linux-scsi
  Cc: Martin K . Petersen, Laurence Oberman, Sesidhar Baddela,
	Karan Tilak Kumar, John Pittman, Satish Kharat

On Fri, 11 Mar 2022 13:43:59 -0500, David Jeffery wrote:

> When aborting a scsi command through fnic, there is a race with the fnic
> interrupt handler which can result in the scsi command and its request
> being completed twice. If the interrupt handler claims the command by
> setting CMD_SP to NULL first, the abort handler assumes the interrupt
> handler has completed the command and returns SUCCESS, causing the request
> for the scsi_cmnd to be re-queued.
> 
> [...]

Applied to 5.17/scsi-fixes, thanks!

[1/1] fnic: finish scsi_cmnd before dropping the spinlock to prevent abort race
      https://git.kernel.org/mkp/scsi/c/733ab7e1b5d1

-- 
Martin K. Petersen	Oracle Linux Engineering
